Extracting text from textbox in header/footer/Save html with header/footer



iorasuke
07-16-2021, 12:50 AM
Hello, I've been searching the internet for a way to copy or move the text from headers and footers into the correct section of the document body, or to save the HTML with the headers and footers included. I have no real knowledge of VBA beyond simple replacements and template modifications.


What I'm trying to do is convert PDFs to DOC and then to HTML. When Acrobat converts a PDF to Word, text that repeats on different pages ends up in the header or footer, and when I then convert to HTML I lose that text. I then have to go through each page of the DOC to see what text was in the header or footer. Every day I have 30 PDFs of at least 32 pages each, plus an archive of over 100,000 PDFs, so it's a terrible task. I've been trying to find a solution to this problem for years, and for the past week all I've done is read different sites to understand what I can do.


I have seen code on the internet that worked for others, but it did not work for me. I tried to adapt it, but without the necessary knowledge that is impossible.


What I learned from my documents is that the header text is inside a text box. So far I have managed to access the text of each page (but it loses the styles/characters, which is not good) and to access the headers, but for each page it shows me all the headers, so something I did is wrong. I'm trying to find a way, for each page, to put the header text before the page text (with the style from the header) and the footer text after the page text (with the style from the footer).


Please help me with a solution, if there is one.



Sub testPagini2()
    Dim oSection As Section
    Dim Shp As Shape, StrTmp As String
    Dim textPagina As String
    Dim numarShape As Integer

    For Each oSection In ActiveDocument.Sections
        ' Body text of the section (plain text only, so formatting is lost)
        textPagina = oSection.Range.Text
        numarShape = oSection.Headers(wdHeaderFooterPrimary).Shapes.Count
        MsgBox "Page " & oSection.Index & " Shape count " & numarShape & ": " & textPagina

        ' Report every shape found through this section's primary header range
        For Each Shp In oSection.Headers(wdHeaderFooterPrimary).Range.ShapeRange
            With Shp
                StrTmp = "Name: " & .Name & vbCr & _
                         "Top: " & .Top & vbCr & _
                         "Left: " & .Left & vbCr & _
                         "Height: " & .Height & vbCr & _
                         "Width: " & .Width & vbCr
                ' Only shapes that contain text are reported
                If .TextFrame.HasText Then
                    StrTmp = StrTmp & "Text in shape: " & .TextFrame.TextRange.Text
                    MsgBox StrTmp
                End If
            End With
        Next Shp
    Next oSection
End Sub

Chas Kenyon
07-22-2021, 03:40 PM
I could be way off here, but it is my understanding that html does not have headers, footers, or pages.

iorasuke
08-10-2021, 11:26 PM
Yes, HTML does not have that; DOC or DOCX does, and when I convert them to HTML I lose that text (which is not really a header or footer, it's just a simple repetitive text on the beginning of different pages that Acrobat converted to a header or footer when going from PDF to DOC).

Chas Kenyon
08-11-2021, 10:51 AM
"it's just a simple repetitive text on the beginning of different pages"

That is the definition of a header.

gmayor
08-11-2021, 08:54 PM
There are no pages in an HTML document?

iorasuke
08-15-2021, 11:29 PM
After reading other topics on this forum, this is the solution I have so far that works on all pages, but it's still not the best solution for me. In my 32-page document, converted from PDF to DOC with Acrobat, only pages 19, 20, 23 and 24, for example, have text that I need to extract from the header; the other pages have no text in their headers.
When I run the code, page 19 gets "TextHeader 20", "TextHeader 19" and then the content of page 19, while page 20 has only its content; the same thing happens for pages 23 and 24. Clearly something is wrong in the code. I also tried to understand the difference between the section breaks: some are Section (Next Page), others Section (Continuous). If I replace them before running the code, everything gets messed up.




Sub ExtractFromHeader()
    Dim oStory As Range
    Dim oRngAnchor As Range
    Dim sShape As Shape
    Dim strText As String
    Dim i As Integer
    Dim oSection As Section
    Dim oHeader As HeaderFooter

    For Each oStory In ActiveDocument.StoryRanges
        For Each oSection In ActiveDocument.Sections
            ' Headers: copy each text box's text to the start of the section
            For Each oHeader In oSection.Headers
                If oHeader.Exists Then
                    ' Loop backwards because shapes are deleted inside the loop
                    For i = oHeader.Range.ShapeRange.Count To 1 Step -1
                        Set sShape = oHeader.Range.ShapeRange(i)
                        If sShape.TextFrame.HasText Then
                            ' The anchor range is stored but not used below
                            Set oRngAnchor = sShape.Anchor.Paragraphs(1).Range
                            oSection.Range.InsertBefore "TextHeader << " & sShape.TextFrame.TextRange.Text & " >>"
                            sShape.Delete
                        End If
                    Next i
                End If
            Next oHeader
            ' Footers: copy each text box's text to the end of the section
            For Each oHeader In oSection.Footers
                If oHeader.Exists Then
                    For i = oHeader.Range.ShapeRange.Count To 1 Step -1
                        Set sShape = oHeader.Range.ShapeRange(i)
                        If sShape.TextFrame.HasText Then
                            Set oRngAnchor = sShape.Anchor.Paragraphs(1).Range
                            oSection.Range.InsertAfter "TextFooter << " & sShape.TextFrame.TextRange.Text & " >>"
                            sShape.Delete
                        End If
                    Next i
                End If
            Next oHeader
        Next oSection
    Next oStory
lbl_Exit:
    Set oStory = Nothing
    Set sShape = Nothing
    Exit Sub
End Sub