PDA

View Full Version : [SOLVED:] Help with excel macro



jawa
05-03-2005, 08:31 PM
Hi am new to this forum and am not an experienced user of visual basic. I am trying to write a macro in excel that identifies points that are within a certain distance of the point. Points that are within a certain distance of the point are then assigned the same group number.

The macro I have written reads in a list of data points (x,y,z), calculates the distance from each point to all other points and writes into an array called ?neighbours? the row positions of all events that are with the search distance of each point. It then assigns a Cluster or group ID to each point within the search radius, for each data point. To be assigned the same cluster ID, the current macro requires all points to be within the search radius of each other. This is not what I want. I would like an event to be included in the group ID if it is within 5m of any event of the group. Does that make sense? For example a series of points in a line with each point 5m apart. All of this points should be given the same group ID however the current macro assigns separate IDs to each point set with the exception of the last two points in the list.

Any help is much appreciated. Also are there any ways to speed the process up?

Regards
JA



Sub Clustering()
NumberEvents = Worksheets("DataBase").Cells(2, 1).Value
RefDistance = Worksheets("DataBase").Cells(14, 9).Value
'Declare Arrays
Dim EventX(1 To 17000) As Single
Dim EventY(1 To 17000) As Single
Dim EventZ(1 To 17000) As Single
Dim Neighbours(1 To 17000, 1 To 17000) As Integer
'Empty Arrays
For i = 1 To NumberEvents
EventX(i) = 0
EventY(i) = 0
EventZ(i) = 0
Next i
For i = 1 To NumberEvents
For j = 1 To 100
Neighbours(i, j) = 0
Next j
Next i
'Store Data
For i = 1 To NumberEvents
EventX(i) = CSng(Worksheets("DataBase").Cells(i + 2, 2).Value)
EventY(i) = CSng(Worksheets("DataBase").Cells(i + 2, 3).Value)
EventZ(i) = CSng(Worksheets("DataBase").Cells(i + 2, 4).Value)
Next i
'1) Determine Event Neighbours
For i = 1 To NumberEvents
k = 1
For j = 1 To NumberEvents
DeltaX = EventX(i) - EventX(j)
DeltaY = EventY(i) - EventY(j)
DeltaZ = EventZ(i) - EventZ(j)
Distance = Sqr(((DeltaX) * (DeltaX)) + ((DeltaY) * (DeltaY)) + ((DeltaZ) * (DeltaZ)))
If Distance < RefDistance Then
Neighbours(i, k) = j
k = k + 1
End If
Next j
Next i
'2) Assign Cluster_ID
CID = 1
For i = 1 To NumberEvents
If Worksheets("DataBase").Cells(i, 5).Value = "" Then
For j = 1 To 300
If Neighbours(i, j) > 0 Then
k = Neighbours(i, j)
Worksheets("DataBase").Cells(k + 2, 5).Value = CID
End If
Next j
CID = CID + 1
Next i
End Sub

acw
05-03-2005, 09:16 PM
HI

The last section of the routine is missing an End If between the lines

CID = CID + 1

Next i


Also, can you provide a sample of data, with the expected output for that data? Not too much - maybe 10 items.

Tony

jawa
05-03-2005, 09:32 PM
Hi Tony,
Thanks for that. Here a list of data that should be grouped into two clusters.
JA


Event_IDXYZCluster ID10 110100121010513101010141010151510102016505002750505285050102950501521050502 02

acw
05-03-2005, 09:48 PM
Hi

Want to try that again? Maybe attach a spreadsheet.


Tony

To attach, go to the advanced reply, scroll down and there is a Manage Attachments under the additional Options section.


Tony

jawa
05-03-2005, 09:54 PM
Hi Tony,

There should be a spreadsheet attached

JA

acw
05-03-2005, 11:02 PM
JA

Using the example you have given, lets see if I have what you are trying to do right. If there is a distance beteen the first point (row 3) and the second point (row4) that is less than 6, it will get an id of 1. If there is then a distance between point 2 and point3 that is also less than 6, it will get the same id. This will continue until there is a difference between 2 consecutive rows that does not have a distance that is viable. Once it reaches the next one (in this case between rows 6 and 7) it will increment the number (now 2) and continue with this number until there is a gap.

Associated with this the 2 rows that achieve the result will have to get the same id. So when the first match is made, it will put the id of 1 against both rows 3 and 4.

Is this what you are trying to achieve???


Tony

JA

Further to the above, if this is what you are trying to do then try this.



Sub bbb()
RefDistance = Worksheets("DataBase").Cells(14, 9).Value
CID = 1
For i = 1 To Worksheets("DataBase").Cells(2, 1).Value - 1
If Sqr((Cells(i + 2, 2) - Cells(i + 3, 2)) ^ 2 + (Cells(i + 2, 3) - Cells(i + 3, 3)) ^ 2 + (Cells(i + 2, 4) - Cells(i + 3, 4)) ^ 2) < RefDistance Then
Cells(i + 2, 5) = CID
Cells(i + 3, 5) = CID
Else
CID = CID + 1
End If
Next i
End Sub



Tony

jawa
05-04-2005, 06:22 PM
Hi Tony,

Sought of, except in reality the points are not in order and may be randomly spaced throughout the list eg. Your macro does not work when the points are out of order. See attached spreadsheet.

Regards
John A

acw
05-04-2005, 07:07 PM
John

Is there any reason that the data can't be sorted before running the code?


Tony

jawa
05-04-2005, 10:20 PM
Hi Tony,

I don't think sorting will help.

Maybe you could find the closest point to a point and if it is within the search radius assign it the same ID (Say ID = 1). You then find the closest point to that point and if it satisfies the search distance give it ID 1. You do this until all points have an ID. You are finding the closest point for all points with the same ID and then assigning that ID if they satisfy the search criteria.

It is difficult to explain. I have attached a real data set, which I have manually grouped as an example of what I need to do.

Does this make sense?

Regards
JA

acw
05-04-2005, 10:46 PM
John

Is the correct output, and the correct clustering in columns G:J. Also, what is the event spacing used to get this result.


Tony

jawa
05-04-2005, 11:04 PM
Tony,
Event spacing <= 50 m
Input x,y,z data in columns C:E. Cluster results in column B. The information in columns G:J is the data sorted on Cluster ID.
John

acw
05-05-2005, 06:47 PM
John

Different approach. Try this one. When I ran it with that last set of data, I got a different numbering, but the grouping was the same. I don't believe that the numbering per se is important, as long as the grouping is correct. If this is not so, please let me know.

Test this and see how it goes.



Sub Clustering()
Application.ScreenUpdating = False
Range("e3:e" & Worksheets("DataBase").Cells(2, 1).Value + 2).ClearContents
RefDistance = Worksheets("DataBase").Cells(14, 9).Value
lastrow = Range("b65536").End(xlUp).Row
CID = 1
Range("e3") = CID
i = 2
While WorksheetFunction.CountIf(Range("e3:e" & lastrow), "") > 0
Range("b3:e" & lastrow).Sort key1:=Range("e3"), order1:=xlAscending
Cells(i + 1, 5).Select
i = ActiveCell.Row
countmatch = 0
For j = ActiveCell.Row + 1 To lastrow
If Sqr((Cells(i, 2) - Cells(j, 2)) ^ 2 + (Cells(i, 3) - Cells(j, 3)) ^ 2 + (Cells(i, 4) - Cells(j, 4)) ^ 2) < RefDistance Then
countmatch = countmatch + 1
Cells(j, 5).Value = ActiveCell.Value
End If
Next j
If countmatch = 0 And IsEmpty(Cells(i + 1, 5)) Then
CID = CID + 1
ActiveCell.Offset(1, 0).Value = CID
End If
Wend
Range("a1").Select
Application.ScreenUpdating = True
End Sub




Tony

jawa
05-05-2005, 08:47 PM
Hi Tony,

I think this works. Thanks heaps. Do you know a way of speeding it up, as some of my datasets contain more 40000 events.

Regards
JA

acw
05-05-2005, 09:47 PM
John

I'll have to think on it. I was trying to get the thing to actually produce an accurate output before I went much further.

It is one of those things that it actually has to look at each and every item. One of the things I looked at was getting all the items that were within the range of a line, then skipping those as they had already been done. Unfortunately, where this came unstuck was that there were others that were in the range of the group, but not the one that selected the group.

You may be able to get some effect by not actually selecting the cells in the spreadsheet. This is always a slow option so working without actually selecting would be faster. Have to think on it (or have a go as an exercise now you have a working algorithm!!!!).

At least it is faster than doing it manually.


Tony