PDA

View Full Version : Extracting Data from a TXT/XML file



swaggerbox
05-23-2011, 05:04 AM
I want to extract the contents in FTR 2 until the content of the last FTR tag (designated as /FEATURE in Cell B20. I also want to extract the contents of FTR 1 in Cell B19 using the code below with changes made to the pattern.

The code below aims to do just that for FTR 2 but it is encountering an Invalid procedure call or argument in the line "ActiveSheet.Range("B20").Value = RegM(0)" . When I change the pattern to ".Pattern = "(<FTR>2.)(.+)(/FTR>)", it does not generate an error but extracts only the contents of FTR 2. I also want to extract the contents of FTR 3, FTR 4, and so on. When I check the source file (it's an XML file saved in TXT), there is a Hard Return character which I suspect causes the non-extraction. When I removed this character, it extracts the contents correctly but my problem now is that this modified source file causes the macro for FTR 1 to extract not only FTR 1 in Cell B19 but also the contents in FTR 2 and beyond.

I simply want to extract the contents in FTR 1 in B19, and the contents of FTR 2, FTR 3... FTR n??? in B20.

I have attached a sample file.

Anyone got ideas?



Sub ExtractFTR2_and_Beyond()
Dim wsh As Object
Dim RegEx As Object, RegM As Object
Dim FSO As Object, Fil As Object
Dim ts As Object, txtAll As String, xFil As String
Set FSO = CreateObject("Scripting.FileSystemObject")
Set RegEx = CreateObject("vbscript.regexp")

xFil = Range("A4").Value

With RegEx
.Pattern = "(<FTR>2.)(.+)(/FEATURE>)"

.Global = False
End With
Set Fil = FSO.GetFile(xFil)

Set ts = Fil.OpenAsTextStream(1)
txtAll = ts.ReadAll
Set RegM = RegEx.Execute(txtAll)

ActiveSheet.Range("B20").Value = RegM(0)

ts.Close


Set ts = Nothing
Set wsh = Nothing

Set FSO = Nothing
Set RegM = Nothing
Set RegEx = Nothing
End Sub