PDA

View Full Version : [SOLVED:] Restructuring data from html output to excel worksheet with vba



Beatrix
01-22-2024, 05:28 PM
Hi All,

I copied data from HTML document to Excel worksheet however data is not structured to use for insight. I need to apply to a list of rules to redesign the data structure in Excel. I would like to use vba to automate this as the row level data could be massive in some cases. I summarised the rules in attached doc with before/after HTML output but I don't know if possible to achieve all in one vba code, if it is possible then do I need to follow a specific order for the steps? I was wondering if anyone could help me with this?

Many Thanks
B.

jdelano
01-23-2024, 05:23 AM
Here is one possible way of doing it, you'll need to add the formatting you want the restructure to have.
Also, OnStart was listed twice in App in your example, you should be able add in whatever rule you need for that.

I've attached the file that I did this in. I hope this gets you started well enough.




Private Sub btnRestructure_Click()

' constants for the columns to write to
Const colScreenName = 1
Const colControl = 3
Const colProperty = 4
Const colPreviousFX = 5
Const colCurrentFX = 6
Const colControlType = 2
Const colSingleFXUniqueControl = 7
Const colCountOfSingleFXCurrentFX = 8

' vars to hold stuff
Dim fromSheetName As String
Dim toSheetName As String

Dim fromWSEndingRow As Integer

Dim currentScreenName As String
Dim currentControlType As String
Dim currentControlName As String

Dim fromWSCurrentRow As Integer
Dim toWSCurrentRow As Integer

Dim mergedCellRow As Integer
Dim mergedCellRowCount As Integer

Dim fromWS As Worksheet
Dim toWS As Worksheet
Dim mergedCell As Range

' put the stuff in the vars
fromSheetName = "HTML_Before"
toSheetName = "HTML_After2"

Set fromWS = ThisWorkbook.Worksheets(fromSheetName)
Set toWS = ThisWorkbook.Worksheets(toSheetName)

fromWS.Activate ' you have to activate the sheet to select the merged cell to then get the rows it spans


fromWSCurrentRow = 1 ' set the current row to where the loop should start reading
fromWSEndingRow = fromWS.Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row

toWSCurrentRow = 3 ' set the current row to where the loop should start writing

currentScreenName = ""
currentControlType = ""
currentControlName = ""

' loops the sheet with the input data in it
Do While fromWSCurrentRow <= fromWSEndingRow
Set mergedCell = fromWS.Cells(fromWSCurrentRow, 1)
mergedCell.Select
mergedCellRowCount = Selection.Rows.Count

currentScreenName = mergedCell.Value

' unless the screenName isn't App, grab the other descriptors
If currentScreenName <> "App" And currentScreenName <> "" Then
' set controlName
currentControlName = fromWS.Cells(fromWSCurrentRow, 2).Value

' no control name is needed if it matches the screenName
If currentScreenName = currentControlName Then
currentControlName = ""

' write the top row of the screen section
toWS.Cells(toWSCurrentRow, colScreenName).Value = currentScreenName
toWS.Cells(toWSCurrentRow, colControlType).Value = ""
toWS.Cells(toWSCurrentRow, colControl).Value = ""
toWS.Cells(toWSCurrentRow, colProperty).Value = fromWS.Cells(fromWSCurrentRow, 3).Value
toWS.Cells(toWSCurrentRow, colPreviousFX).Value = fromWS.Cells(fromWSCurrentRow, 4).Value
toWS.Cells(toWSCurrentRow, colCurrentFX).Value = fromWS.Cells(fromWSCurrentRow, 5).Value
toWS.Cells(toWSCurrentRow, colSingleFXUniqueControl).Value = ""
toWS.Cells(toWSCurrentRow, colCountOfSingleFXCurrentFX).Value = ""

toWSCurrentRow = toWSCurrentRow + 1

fromWSCurrentRow = fromWSCurrentRow + 1
End If


End If

For mergedCellRow = fromWSCurrentRow To (fromWSCurrentRow + mergedCellRowCount) - 2
currentControlName = fromWS.Cells(mergedCellRow, 2).Value

' set controlType - take the left characters of the name up to the first '_'
If InStr(currentControlName, "_") > 0 Then
currentControlType = Left(currentControlName, InStr(currentControlName, "_") - 1)
End If

toWS.Cells(toWSCurrentRow, colScreenName).Value = currentScreenName
toWS.Cells(toWSCurrentRow, colControlType).Value = currentControlType
toWS.Cells(toWSCurrentRow, colControl).Value = currentControlName
toWS.Cells(toWSCurrentRow, colProperty).Value = fromWS.Cells(mergedCellRow, 3).Value
toWS.Cells(toWSCurrentRow, colPreviousFX).Value = fromWS.Cells(mergedCellRow, 4).Value
toWS.Cells(toWSCurrentRow, colCurrentFX).Value = fromWS.Cells(mergedCellRow, 5).Value
toWS.Cells(toWSCurrentRow, colSingleFXUniqueControl).Value = ""
toWS.Cells(toWSCurrentRow, colCountOfSingleFXCurrentFX).Value = ""

toWSCurrentRow = toWSCurrentRow + 1
Next mergedCellRow

' move the row counter in the from worksheet equal to the rows used by the merged cell
fromWSCurrentRow = fromWSCurrentRow + mergedCellRowCount - IIf(currentScreenName = "App", 0, 1)

Loop


End Sub

Beatrix
01-24-2024, 02:40 PM
Thank you jdelano, the comments you made in your script are really useful to understand each step/action, definitely helped me to start.

Much appreciated
B.




Here is one possible way of doing it, you'll need to add the formatting you want the restructure to have.
Also, OnStart was listed twice in App in your example, you should be able add in whatever rule you need for that.

I've attached the file that I did this in. I hope this gets you started well enough.




Private Sub btnRestructure_Click()

' constants for the columns to write to
Const colScreenName = 1
Const colControl = 3
Const colProperty = 4
Const colPreviousFX = 5
Const colCurrentFX = 6
Const colControlType = 2
Const colSingleFXUniqueControl = 7
Const colCountOfSingleFXCurrentFX = 8

' vars to hold stuff
Dim fromSheetName As String
Dim toSheetName As String

Dim fromWSEndingRow As Integer

Dim currentScreenName As String
Dim currentControlType As String
Dim currentControlName As String

Dim fromWSCurrentRow As Integer
Dim toWSCurrentRow As Integer

Dim mergedCellRow As Integer
Dim mergedCellRowCount As Integer

Dim fromWS As Worksheet
Dim toWS As Worksheet
Dim mergedCell As Range

' put the stuff in the vars
fromSheetName = "HTML_Before"
toSheetName = "HTML_After2"

Set fromWS = ThisWorkbook.Worksheets(fromSheetName)
Set toWS = ThisWorkbook.Worksheets(toSheetName)

fromWS.Activate ' you have to activate the sheet to select the merged cell to then get the rows it spans


fromWSCurrentRow = 1 ' set the current row to where the loop should start reading
fromWSEndingRow = fromWS.Cells.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row

toWSCurrentRow = 3 ' set the current row to where the loop should start writing

currentScreenName = ""
currentControlType = ""
currentControlName = ""

' loops the sheet with the input data in it
Do While fromWSCurrentRow <= fromWSEndingRow
Set mergedCell = fromWS.Cells(fromWSCurrentRow, 1)
mergedCell.Select
mergedCellRowCount = Selection.Rows.Count

currentScreenName = mergedCell.Value

' unless the screenName isn't App, grab the other descriptors
If currentScreenName <> "App" And currentScreenName <> "" Then
' set controlName
currentControlName = fromWS.Cells(fromWSCurrentRow, 2).Value

' no control name is needed if it matches the screenName
If currentScreenName = currentControlName Then
currentControlName = ""

' write the top row of the screen section
toWS.Cells(toWSCurrentRow, colScreenName).Value = currentScreenName
toWS.Cells(toWSCurrentRow, colControlType).Value = ""
toWS.Cells(toWSCurrentRow, colControl).Value = ""
toWS.Cells(toWSCurrentRow, colProperty).Value = fromWS.Cells(fromWSCurrentRow, 3).Value
toWS.Cells(toWSCurrentRow, colPreviousFX).Value = fromWS.Cells(fromWSCurrentRow, 4).Value
toWS.Cells(toWSCurrentRow, colCurrentFX).Value = fromWS.Cells(fromWSCurrentRow, 5).Value
toWS.Cells(toWSCurrentRow, colSingleFXUniqueControl).Value = ""
toWS.Cells(toWSCurrentRow, colCountOfSingleFXCurrentFX).Value = ""

toWSCurrentRow = toWSCurrentRow + 1

fromWSCurrentRow = fromWSCurrentRow + 1
End If


End If

For mergedCellRow = fromWSCurrentRow To (fromWSCurrentRow + mergedCellRowCount) - 2
currentControlName = fromWS.Cells(mergedCellRow, 2).Value

' set controlType - take the left characters of the name up to the first '_'
If InStr(currentControlName, "_") > 0 Then
currentControlType = Left(currentControlName, InStr(currentControlName, "_") - 1)
End If

toWS.Cells(toWSCurrentRow, colScreenName).Value = currentScreenName
toWS.Cells(toWSCurrentRow, colControlType).Value = currentControlType
toWS.Cells(toWSCurrentRow, colControl).Value = currentControlName
toWS.Cells(toWSCurrentRow, colProperty).Value = fromWS.Cells(mergedCellRow, 3).Value
toWS.Cells(toWSCurrentRow, colPreviousFX).Value = fromWS.Cells(mergedCellRow, 4).Value
toWS.Cells(toWSCurrentRow, colCurrentFX).Value = fromWS.Cells(mergedCellRow, 5).Value
toWS.Cells(toWSCurrentRow, colSingleFXUniqueControl).Value = ""
toWS.Cells(toWSCurrentRow, colCountOfSingleFXCurrentFX).Value = ""

toWSCurrentRow = toWSCurrentRow + 1
Next mergedCellRow

' move the row counter in the from worksheet equal to the rows used by the merged cell
fromWSCurrentRow = fromWSCurrentRow + mergedCellRowCount - IIf(currentScreenName = "App", 0, 1)

Loop


End Sub

jdelano
01-24-2024, 03:02 PM
You're welcome. Happy to help. Good luck with your project.

Beatrix
02-09-2024, 12:30 PM
Hi jdelano,

I was wondering if you could help me with the part to repeat Current Control Name for each row ? I spent a couple of days with trial and error method but I couldn't figure it out which part in your script I need to change to achieve that:( I tried to replicate Do While loop which works for screen name However it didn't work for Control Name :think:
31336


Your help is much appreciated.
Thanks
B.

jdelano
02-09-2024, 01:07 PM
Sure, can you upload the Excel file you have, and I'll look it over.

Beatrix
02-09-2024, 01:43 PM
Thank you so much, please see attached VBA_question. Test tab is the output after your script runs:)

jdelano
02-10-2024, 12:41 AM
Give this change a go, I modified the section that sets the control name in the for mergedCellRow loop



' don't assign a blank to the control name as it needs to repeat
If fromWS.Cells(mergedCellRow, 2).Value <> "" Then
currentControlName = fromWS.Cells(mergedCellRow, 2).Value

' set controlType - take the left characters of the name up to the first '_'
If InStr(currentControlName, "_") > 0 Then
currentControlType = Left(currentControlName, InStr(currentControlName, "_") - 1)
End If
End If




edit: forgot the screenshot

Sorry about that.

Beatrix
02-11-2024, 04:43 PM
Thank you very much jdelano, that worked:yes