PDA

View Full Version : Imported dates



mdmackillop
04-04-2010, 05:13 AM
I'm downloading data from a website and the dates column is being entered either as text for unrecognised UK date values or swapped day /month values. ie 31/01/2008 should be followd by 01/02/2008.
As I have 20k plus mixed values, I'm looking for any suggestions to produce a column of correct dates.

SamT
04-04-2010, 06:45 AM
I'm not real sure what you are asking to solve, but you can modify this code to suit.

In fact, if you think of the variables "Red," "Blue," and "Green" as meaning "Month," "Day," and "Year" it will work as is if you rearrange Red, Blue and Green in the highlighted line at the bottom and do something with the "OOPS:" section below it.

Edit: Oh yeah, and change the return Type of the Function.

Function ColorValue(ByVal RGBString As String) As Long
' Accepts any string containing three seperate numeric values.
' Ex: 1,2,3; 1 2 3; RGB(1, 2, 3); R1, G2, B3; RGB(1 - 2 - 3)
' Returns the Color Value of RGBString as a Long
' irrespective of presence of nonnumeric characters.

Dim Red As Integer 'These hold the Calculated R, G, and B values.
Dim Green As Integer
Dim Blue As Integer

'********Get Red Value*********
'Strip leading nonnumeric characters
While Not IsNumeric(Left(RGBString, 1))
RGBString = Right(RGBString, Len(RGBString) - 1)
Wend
'Strip Trailing nonnumeric characters
While Not IsNumeric(Right(RGBString, 1))
RGBString = Left(RGBString, Len(RGBString) - 1)
Wend
'Check if RGBString is as long as shortest possible string ("n n n")
' where "n" is any number from 0 to 256
If Len(RGBString) < 5 Then GoTo OOPS
'Set Red = leading numeric characters
While IsNumeric(Left(RGBString, 1))
Red = Red & Left(RGBString, 1)
RGBString = Right(RGBString, Len(RGBString) - 1)
Wend
'Error Check
If Red > 256 Then GoTo OOPS
If Len(RGBString) < 3 Then GoTo OOPS 'Is not at least ("n n")
'*********Get Blue Value*************
While Not IsNumeric(Left(RGBString, 1))
RGBString = Right(RGBString, Len(RGBString) - 1)
Wend

While IsNumeric(Left(RGBString, 1))
Blue = Blue & Left(RGBString, 1)
RGBString = Right(RGBString, Len(RGBString) - 1)
Wend
If Blue > 256 Then GoTo OOPS
If Len(RGBString) < 2 Then GoTo OOPS 'Not =(" n")
'*************Get Blue Value**********
While Not IsNumeric(Left(RGBString, 1))
RGBString = Right(RGBString, Len(RGBString) - 1)
Wend

While IsNumeric(Left(RGBString, 1))
Green = Green & Left(RGBString, 1)
RGBString = Right(RGBString, Len(RGBString) - 1)
Wend
If Green > 256 Then GoTo OOPS
'***************Make RGB String Value*********
ColorValue = RGB(Red, Blue, Green)
Exit Function
OOPS:
ColorValue = RGB(256, 256, 256)
MsgBox ("Error! Check the RGB Values")
End Function

mdmackillop
04-04-2010, 07:53 AM
The issue is that Right returns the year from string values, but not from proper dates, and the proper dates are also wrong due to month/date being swapped.

GTO
04-04-2010, 08:33 AM
Hi Malcom,

If you haven't already thought of it, I have doubts that I would. That said, could you post an .xls format wb with some of the data?

Mark

SamT
04-04-2010, 08:44 AM
I'm just a barbarian Yank, what month is 1/2/2010?

GTO
04-04-2010, 08:49 AM
Greetings Sam,

English dates are in the same order as what we think of as Mil dates, ie d/m/y. So your example would be 1 FEB 2010.

Mark

mdmackillop
04-04-2010, 09:10 AM
Hi Mark,
Here's an xls copy. I' non't know if the dates will "self correct" in your Excel, hence the image in the first post.
Regards
Malcolm

SamT
04-04-2010, 10:13 AM
I very slightly modified and ran that macro above.

See if this helps

mdmackillop
04-04-2010, 10:51 AM
I'm afraid it's not catching the "wrong" dates. The following should be 2 successive dates.

SamT
04-04-2010, 11:05 AM
Are they all in the same country style?

Try changing the function type to String All it does is catch each set of numericals in order and concatenate them back together with a "-" in between each set.

1?02?2010 gets broken into 1 and 02 and 2010, then put back to 1-02-2010. The Function converts that string to a date at the {set function = to X}line. On my machine it converted all to a 4 digit number. I had to format the cells to Type = Date to see the output as a date.

GTO
04-04-2010, 11:09 AM
My 'logic' (stop laughing) was that regardless of whether the vals are changed to dates (albeit incorrect dates) or whether they are left as strings, maybe we could split them by the "/" or "-".

I was also thinking that I should be able to type some UK style dates in, and as we just have the 'd/m' 'm/d' reversed, if I tried against the UK dates and it works (converts to US), then hopefully this might work for you.


Option Explicit

Sub exa1()
Dim _
DataRange As Range, _
LastCell As Range, _
aryVar As Variant, _
aryDates() As Date, _
arySplit As Variant, _
i As Long, _
lYear As Long

'// If any two-digit year codes //
Const BREAKPOINT As Long = 35

'// Just an example range//
Set LastCell = RangeFound(Range("A2:A" & Rows.Count))

If LastCell Is Nothing Then Exit Sub

Set DataRange = Range(Range("A2"), LastCell)

aryVar = DataRange.Value

ReDim aryDates(1 To DataRange.Rows.Count, 1 To 1)

For i = 1 To UBound(aryVar)

'// No "safety's" are built in, presumes we'll reliably find slashes or //
'// hyphens //
If Not InStr(1, CStr(aryVar(i, 1)), "/") = 0 Then
arySplit = Split(CStr(aryVar(i, 1)), "/")
Else
arySplit = Split(CStr(aryVar(i, 1)), "-")
End If

'// I didn't see any two-digit year codes, so this part "just in case" //
lYear = arySplit(2)
If lYear <= BREAKPOINT Then
lYear = lYear + 2000
ElseIf lYear < 100 Then
lYear = lYear + 1900
End If

'// build array of dates//
aryDates(i, 1) = DateSerial(lYear, arySplit(1), arySplit(0))
Next

With DataRange
.ClearContents
.NumberFormat = "General"
.Value = aryDates
'// Change to suit, I think? Maybe? Hopefully?//
.NumberFormat = "m/d/yyyy"
End With
End Sub

Function RangeFound(SearchRange As Range, _
Optional FindWhat As String = "*", _
Optional StartingAfter As Range, _
Optional LookAtTextOrFormula As XlFindLookIn = xlValues, _
Optional LookAtWholeOrPart As XlLookAt = xlPart, _
Optional SearchRowCol As XlSearchOrder = xlByRows, _
Optional SearchUpDn As XlSearchDirection = xlPrevious, _
Optional bMatchCase As Boolean = False) As Range

If StartingAfter Is Nothing Then
Set StartingAfter = SearchRange(1)
End If

Set RangeFound = SearchRange.Find(What:=FindWhat, _
After:=StartingAfter, _
LookIn:=LookAtTextOrFormula, _
LookAt:=LookAtWholeOrPart, _
SearchOrder:=SearchRowCol, _
SearchDirection:=SearchUpDn, _
MatchCase:=bMatchCase)
End Function

I hope of help,

Mark

GTO
04-04-2010, 02:53 PM
ACK! Regarding my last (#11 I believe), I botched this:

'// build array of dates//
'***CHANGE
' aryDates(i, 1) = DateSerial(lYear, arySplit(1), arySplit(0))
'TO
aryDates(i, 1) = DateSerial(lYear, arySplit(0), arySplit(1))
Next


Also, not sure if you are facing any junk/non-printing characters, but thought that maybe a RegExp might do well and/or be faster...


Option Explicit

Sub exa2()
Dim _
DataRange As Range, _
LastCell As Range, _
aryVar As Variant, _
aryDates() As Date, _
arySplit As Variant, _
i As Long, _
REX As Object

Set LastCell = RangeFound(Range("A2:A" & Rows.Count))

If LastCell Is Nothing Then Exit Sub

Set DataRange = Range(Range("A2"), LastCell)

aryVar = DataRange.Value
ReDim aryDates(1 To DataRange.Rows.Count, 1 To 1)

Set REX = CreateObject("VBScript.RegExp")
With REX
.Global = True
.Pattern = "[^0-9\/]"

For i = 1 To UBound(aryVar)
arySplit = Split(.Replace(Replace(aryVar(i, 1), "-", "/"), vbNullString), "/")
aryDates(i, 1) = DateSerial(CLng(arySplit(2)), CLng(arySplit(0)), CLng(arySplit(1)))
Next
End With

With DataRange.Offset(, 1)
.ClearContents
.NumberFormat = "General"
.Value = aryDates
.NumberFormat = "d/m/yyyy"
End With
End Sub

Hope it works?

Mark

mdmackillop
04-04-2010, 03:12 PM
Thanks Mark,
I tweaked your code as follows, which seems to give the correct results.
Thanks for your assistance.


Sub exa1()
Dim _
DataRange As Range, _
LastCell As Range, _
aryVar As Variant, _
aryDates() As Date, _
arySplit As Variant, _
i As Long, _
lYear As Long


Set LastCell = RangeFound(Range("A2:A" & Rows.Count))
If LastCell Is Nothing Then Exit Sub
Set DataRange = Range(Range("A2"), LastCell)
aryVar = DataRange.Value

ReDim aryDates(1 To DataRange.Rows.Count, 1 To 1)

For i = 1 To UBound(aryVar)
If Not InStr(1, CStr(aryVar(i, 1)), "/") = 0 Then
arySplit = Split(CStr(aryVar(i, 1)), "/")
aryDates(i, 1) = DateSerial(arySplit(2), arySplit(0), arySplit(1))
Else
arySplit = Split(CStr(aryVar(i, 1)), "-")
aryDates(i, 1) = DateSerial(arySplit(2), arySplit(0), arySplit(1))
End If
Next

With DataRange.Offset(, 1)
.ClearContents
.NumberFormat = "General"
.Value = aryDates
.NumberFormat = "dd/mm/yyyy"
End With
End Sub

GTO
04-04-2010, 03:38 PM
Thanks Mark,
I tweaked your code as follows, which seems to give the correct results.
Thanks for your assistance.

You are of course most welcome and my pleasure.

Say Malcom, I imagine its plenty late there, but just if you get a chance later, could you see if the RegExp method works, and if so, is faster/slower? Just if you get a chance to test. ("cat-killin' curiousity" has stricken...)

Thank you,

Mark

GTO
04-05-2010, 03:45 AM
Had a chance at work and multiplied the test data out to 24K+ rows...


InStr: 0.2021484
RegExp: 0.34375

On a slower machine, the difference was more pronounced. That said, I think the RegExp might be more 'sure-footed', as it should strip anything 'extra'.

Best regards,

Mark

mdmackillop
04-05-2010, 06:06 AM
Hi Mark,
I'll try both this evening.
Happy Easter!
Regards
Malcolm