Thread: Extracting text from textbox in header/footer/Save html with header/footer

  1. #1
    VBAX Newbie
    Joined
    Jul 2021
    Posts
    3
    Location

    Extracting text from textbox in header/footer/Save html with header/footer

    Hello, I've been searching the internet for a way to copy or move the text from headers and footers into the correct section of the document, or to save the HTML with the headers and footers included. I have no real knowledge of VBA beyond simple replacements and template modifications.


    What I'm trying to do is convert PDFs to DOC and then to HTML. When Acrobat converts to Word, text that repeats on different pages ends up in the header or footer, and when I then convert to HTML I lose that text. I then have to go through each page of the DOC to see what text was in the header or footer. Each day I have 30 PDFs of at least 32 pages each, plus an archive of over 100,000 PDFs, so it's a terrible task. I've been trying to solve this problem for years, and for the past week all I've done is read different sites to understand what I can do.


    I've seen code on the internet that worked for others, but it did not work for me, even though I tried to adapt it; without the necessary knowledge it's impossible.


    What I learned from my docs is that the header text is in a text box. So far I've managed to access the text of each page (but it loses the styles/characters, which is not good) and to access the headers, but for each page it shows me all the headers, so something I did is wrong. I'm looking for a way, for each page, to put the header text before the page text (with the header's style) and the footer text after it (with the footer's style).


    Please help me with a solution, if there is one.


    Sub testPagini2()
    Dim oSection As Section
    Dim Shp As Shape, StrTmp As String
    Dim textPagina As String
    Dim numarShape As Integer

    For Each oSection In ActiveDocument.Sections
        textPagina = oSection.Range.Text
        ' Count only the shapes anchored in this header's range;
        ' Headers(...).Shapes covers all headers and footers in the document.
        numarShape = oSection.Headers(wdHeaderFooterPrimary).Range.ShapeRange.Count
        MsgBox "Page " & oSection.Index & " Shape count " & numarShape & ": " & textPagina

        For Each Shp In oSection.Headers(wdHeaderFooterPrimary).Range.ShapeRange
            With Shp
                ' Name and position of the shape
                StrTmp = "Name: " & .Name & vbCr & _
                    "Top: " & .Top & vbCr & _
                    "Left: " & .Left & vbCr & _
                    "Height: " & .Height & vbCr & _
                    "Width: " & .Width & vbCr
                ' Only shapes that contain text are reported
                If .TextFrame.HasText = True Then
                    StrTmp = StrTmp & "Text in shape: " & .TextFrame.TextRange.Text
                    MsgBox StrTmp
                End If
            End With
        Next
    Next

    End Sub
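
    The post above asks for the header text to keep its style when it is moved in front of the page text. A minimal sketch of one way to do that, assuming the Range.FormattedText property carries the text-box formatting across to the main document body (this has not been tried on the poster's files). The macro name CopyHeaderTextBoxesWithFormatting is hypothetical. It handles only the primary header of each section, inserts at the start of the section rather than at the start of a particular page, and leaves the original shapes in place:

```vba
' Sketch only: copies text-box text from each section's primary header
' to the start of that section, using FormattedText instead of .Text
' so that character and paragraph formatting come along.
Sub CopyHeaderTextBoxesWithFormatting()
    Dim oSection As Section
    Dim oShape As Shape
    Dim oDest As Range
    Dim i As Long

    For Each oSection In ActiveDocument.Sections
        With oSection.Headers(wdHeaderFooterPrimary).Range
            ' Walk backwards: every insert lands at the section start,
            ' so the first shape ends up first in the body text.
            For i = .ShapeRange.Count To 1 Step -1
                Set oShape = .ShapeRange(i)
                If oShape.TextFrame.HasText Then
                    ' Collapse a copy of the section range to its start
                    Set oDest = oSection.Range
                    oDest.Collapse Direction:=wdCollapseStart
                    ' Assigning FormattedText inserts text plus formatting
                    oDest.FormattedText = oShape.TextFrame.TextRange.FormattedText
                End If
            Next i
        End With
    Next oSection
End Sub
```

    If a section's header is linked to the previous one, the same text will probably be copied into both sections, so it is worth running this on a copy of the document first.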

  2. #2
    VBAX Regular
    Joined
    Jul 2020
    Location
    Sun Prairie
    Posts
    46
    I could be way off here, but it is my understanding that html does not have headers, footers, or pages.

  3. #3
    VBAX Newbie
    Joined
    Jul 2021
    Posts
    3
    Location
    Yes, HTML does not have that; DOC or DOCX does, and when I convert them to HTML I lose that text (which is not really a header or footer, just simple repeated text at the beginning of different pages that Acrobat turned into a header or footer when converting from PDF to DOC).

  4. #4
    VBAX Regular
    Joined
    Jul 2020
    Location
    Sun Prairie
    Posts
    46
    Quote: Originally posted by iorasuke
    it's just a simple repetitive text on the beginning of different pages
    That is the definition of a header.

  5. #5
    There are no pages in an HTML document?
    Graham Mayor - MS MVP (Word) 2002-2019
    Visit my web site for more programming tips and ready made processes
    http://www.gmayor.com

  6. #6
    VBAX Newbie
    Joined
    Jul 2021
    Posts
    3
    Location
    Reading other topics on this forum, this is the solution I have that works on all pages, but it's still not the best solution for me. In my 32-page document, converted from PDF to DOC by Acrobat, only pages 19, 20, 23 and 24 have text that I need to extract from the header; the other pages have no text in their headers.
    When I use the code, on page 19 I get "TextHeader 20, TextHeader 19 and the content from page 19", and on page 20 I get only the content. The same thing happens for pages 23 and 24. Clearly something is wrong in the code. I also tried to understand the difference in the section marks: some are Section (next page), others Section (continuous), but if I replace them before running the code, everything gets messed up.


    Sub ExtractFromHeader()
    Dim sShape As Shape
    Dim i As Integer
    Dim oSection As Section
    Dim oHeader As HeaderFooter

        For Each oSection In ActiveDocument.Sections
            ' Headers: put the text of each text box at the start of the section
            For Each oHeader In oSection.Headers
                If oHeader.Exists Then
                    ' Loop backwards because shapes are deleted inside the loop
                    For i = oHeader.Range.ShapeRange.Count To 1 Step -1
                        Set sShape = oHeader.Range.ShapeRange(i)
                        If sShape.TextFrame.HasText Then
                            oSection.Range.InsertBefore "TextHeader << " & sShape.TextFrame.TextRange.Text & " >>"
                            sShape.Delete
                        End If
                    Next i
                End If
            Next oHeader
            ' Footers: put the text of each text box at the end of the section
            For Each oHeader In oSection.Footers
                If oHeader.Exists Then
                    For i = oHeader.Range.ShapeRange.Count To 1 Step -1
                        Set sShape = oHeader.Range.ShapeRange(i)
                        If sShape.TextFrame.HasText Then
                            oSection.Range.InsertAfter "TextFooter << " & sShape.TextFrame.TextRange.Text & " >>"
                            sShape.Delete
                        End If
                    Next i
                End If
            Next oHeader
        Next oSection
    lbl_Exit:
        Set sShape = Nothing
        Exit Sub
    End Sub
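
    One possible explanation for "TextHeader 20, TextHeader 19" both landing on page 19, offered as a guess and not verified against these documents: pages 19 and 20 may belong to a single section that uses different odd and even page headers. In that case the macro would find one text box in the primary header and one in the even-pages header, and insert both at the start of the same section. A small diagnostic sketch to check this, which only prints to the Immediate window and changes nothing in the document (the macro name ListHeaderTextBoxes is hypothetical):

```vba
' Diagnostic sketch: for every section, print the page it starts on and
' which header type (primary / first page / even pages) holds text boxes.
Sub ListHeaderTextBoxes()
    Dim oSection As Section
    Dim oHeader As HeaderFooter
    Dim oStart As Range
    Dim sKind As String
    Dim i As Long

    For Each oSection In ActiveDocument.Sections
        ' Collapse a copy of the section range to find its first page
        Set oStart = oSection.Range
        oStart.Collapse Direction:=wdCollapseStart
        Debug.Print "Section " & oSection.Index & " starts on page " & _
            oStart.Information(wdActiveEndPageNumber)

        For Each oHeader In oSection.Headers
            Select Case oHeader.Index
                Case wdHeaderFooterPrimary: sKind = "primary"
                Case wdHeaderFooterFirstPage: sKind = "first page"
                Case wdHeaderFooterEvenPages: sKind = "even pages"
            End Select
            If oHeader.Exists Then
                For i = 1 To oHeader.Range.ShapeRange.Count
                    With oHeader.Range.ShapeRange(i)
                        If .TextFrame.HasText Then
                            Debug.Print "  " & sKind & " header, linked to previous = " & _
                                oHeader.LinkToPrevious & ": " & .TextFrame.TextRange.Text
                        End If
                    End With
                Next i
            End If
        Next oHeader
    Next oSection
End Sub
```

    If the output shows a section starting on page 19 with text in both the primary and the even-pages header, the text would need to be inserted at the start of the matching page, not at the start of the section.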