PDA

View Full Version : Need help for a macro to match data in excel!



zyffe
05-20-2013, 05:02 AM
Hi guys,

First of all, I use Excel 2007.

Now, this task I am stuck with might be easy for experienced VBA-programmers, but I as a rookie have been struggling to complete it. I really need this tool for my masters thesis, since I can’t proceed to next step before it Is done. Therefore I ask for your help.

I have attached a simplified excel-workbook with the relevant content (Please see the link in the bottom).

In this excel workbook there are 3 sheets. “PE_firms”, “Industry_firms” and “Matched_list”. “PE_firms” contains a list of 4 private equity firms, and the task I need VBA to perform is to match these with 4 companies from “Industry_firms” (based on the following matching criteria), and copy the full rows of the 8 matched companies and paste them into “Matched_list”:

Matching criteria:
• First, match with the 2-digit industry code (or as close as possible)
• Second, match with year of the deal (or as close as possible)
• Third, find the closest match on pre_performance (measured by ROA) within a reasonable range of company size (measured by total assets)

NB. If any of the requirements, steps or formats are creating problems, please let me know, and I will reply within a few minutes. Maybe we can loosen or alter them a bit.

Link to the uploaded workbook: http://www.speedyshare.com/Te3kR/Matching-Workbook.xlsx

As mentioned, this file I a simplified one. Really I need 500 PE firms matched with industry firms from a list of 4000, so manual matching is not really an option.

In advance, thanks a lot for any help!

enrand22
05-20-2013, 07:32 AM
I will try my best... just let me analyse it.

enrand22
05-20-2013, 08:15 AM
Zyffe, i attach the code

respecting the Matching criteria:
• First, match with the 2-digit industry code (or as close as possible)<- it was the same 2 digit industry code
• Second, match with year of the deal (or as close as possible)<- it was the same year of deal
• Third, find the closest match on pre_performance (measured by ROA) within a reasonable range of company size (measured by total assets) <- i have troubles with this one:

for the closest match of ROA, i write in range(h1) a value. if the roa is 0.10080044128612 it will look this number in the range +.1 and -.1 because h1 value is .1, you can modify this number.

please, reply it this is what you want.

zyffe
05-20-2013, 12:11 PM
First of all, thanks for the help, and sorry about the later reply.

I have the following comments to the macro:
• The macro has only matched 3 out of 4 PE firms. The last one, where the deal year is 2007 has not been matched.
• Also, I tried to delete the list of matching firms, and run the macro again, but it didn’t paste anything.
• If possible I would like the macro to paste only one industry firm for each PE firm in the Matched_list sheet. The best fitting one naturally.
• I’m not sure from reading the macro, but does it copy and paste the whole row of the Industry firms and not only the first few columns in which there is data in this test file? If not, I would like it to do so, as the data I need in my real file extents to about 50 cells in each row for each company.
• Finally I need to be able to identify which Industry firm has been matched with what PE firm.

I really appreciate your help :)

enrand22
05-20-2013, 04:40 PM
okey, i made some changes



I have the following comments to the macro:
• The macro has only matched 3 out of 4 PE firms. The last one, where the deal year is 2007 has not been matched. <- it didnt match because in the same year there were not any industry, fixed this with a selected range of 1 year (so we look 2006,2007,2008), the same was fixed for industry.


• Also, I tried to delete the list of matching firms, and run the macro again, but it didn’t paste anything. <- i dont know what happened, but i have made some tests and fixes, and it is working


• If possible I would like the macro to paste only one industry firm for each PE firm in the Matched_list sheet. The best fitting one naturally. <- i think it will not be useful (and very hard to made) because, the best fitting one for the computer, may be it is not for you or for me, if you order your information with the one with more weight I.E:

if it is more important year than ROA, order the information by ROA and then by year. you will have a scope of the firms ordered by year and it will be easier for you to chose one.


• I’m not sure from reading the macro, but does it copy and paste the whole row of the Industry firms and not only the first few columns in which there is data in this test file? If not, I would like it to do so, as the data I need in my real file extents to about 50 cells in each row for each company. <- yeah, actually it is pretty easy, only copy and paste with some filtering in the middle.

first.- take the information to filter (you can extend to "infinite" variables and columns)
second.- filter (you can extend to "infinite" columns)
third.- paste the ones matching.


• Finally I need to be able to identify which Industry firm has been matched with what PE firm. <- Done!

zyffe
05-21-2013, 02:33 AM
Hi again, and thanks for the changes!

The thing is, that I need to make this matching for 500 PE firms, and probably several times over the next few weeks, so if the code could choose only the best matching industry firm (based on the original matching critiera), cut it out of the list of Industry firms (to avoid it being matched with several PE firms, which is crucial) and paste it in the matched list, it would be great. It is okay, if not all PE firms get matched this way, as long as the maneuver reduce the number of firms I have to match manually significantly.

However you have to alter the criteria to make this work, is okay.
• “year” could for instance be ignored (or at least make it easy to alter the code to ignore it).
• If necessary just match on exactly the same industry.
• Again, the most important thing is that the code, cuts out the best matching industry company, puts it in the matched list and displays which particular PE firm it has been matched with, for as many PE firms as possible. I can match the rest of the PE firms manually.

NB. Also there seems to be a problem with the ROA criteria. It has to be put to exactly “0,05” for the macro to make any matches, and then regardless of how you alter the other criteria, it only matches company #2 (industry code 71) and #3 (industry code 25) and finds nothing for #1 and #4 despite the fact, that there are matching companies in the industry list, given the criteria.

Cheers

zyffe
05-21-2013, 02:49 AM
If this makes it easier, it could go something like “Exact match on industry, ignore “year” and match with the closest ROA within a certain range of Total Assets”.

enrand22
05-21-2013, 08:35 AM
let us see.


• “year” could for instance be ignored (or at least make it easy to alter the code to ignore it). <- ignored totally,

• If necessary just match on exactly the same industry. <- actually worked fine except for industry 15, there are not any in industry firms, in order to catch and industry, the code lights the criteria and take 14,15,16.

1st round = same industry
2nd round = lights for +-1 on the industry

• Again, the most important thing is that the code, cuts out the best matching industry company, puts it in the matched list and displays which particular PE firm it has been matched with, for as many PE firms as possible. I can match the rest of the PE firms manually. <- now i have these results

industry 8 = 1 matched (pretty good eh)
industry 71 = 5 matched, with the proper tweaks, it could narrow the search to may be 2 or 3

industry 12 = 12 matched, not as good as i thought, but if i tight the criteria, it will dissapear the industry 8

industry 15 = 3, may be good, but it was the second in the second round, matching industry 16.

the code already delete the firms that were transfered to Matched List.

may be, for this to work, you can light or tight different criterias for different firms (you can modify this by the control in the PE firms sheet). some requiere more or less effort to be chosen. but matching 21 out of 600 i think, it is pretty good.

zyffe
05-21-2013, 11:55 AM
I dont know if i am doing something wrong, or if something in my excel is not how it is supposed to be, but when i delete the list of firms in "matched_list" and run the macro again with the same settings, as you left there, the macro doesnt match anything. It only writes the ID number of the one company with industry 15 :S

zyffe
05-21-2013, 11:58 AM
I am using excel 2007, if that makes any difference.

enrand22
05-21-2013, 12:00 PM
ooooh, may be if you are using the same database, remember that all the information that already match, it is deleted. meaning that the second time (your first time because i already run it) you will not have the information.

if this is not the problem, i guess we could only debug... enter to vba y check after the filters, what excel is filtering and why...

i will check my excel workbook here to look up at it.

zyffe
05-21-2013, 12:00 PM
I must say i dont quite understand what the code does. If it is complicated to explain, maybe just quickly tell me some do and donts, in order to use the code, and not screw anything up.

enrand22
05-21-2013, 12:07 PM
the code is actually very simple...

for instance, if you want to know what excel is filtering (the matching ones) you must go here:

Sheets("Industry_firms").Select
lastrow2 = Range("a1048576").End(xlUp).Row + 1
Range("$A$2:$E$" & lastrow2).AutoFilter Field:=2, Criteria1:=">=" & data1 - Sheets("PE_firms").Range("L1").Value, Operator:=xlAnd, Criteria2:="<=" & data1 + Sheets("PE_firms").Range("L1").Value
Range("$A$2:$E$" & lastrow2).AutoFilter Field:=4, Criteria1:=">=" & data2 * (1 - Sheets("PE_firms").Range("o1").Value), Operator:=xlAnd, Criteria2:="<=" & data2 * (1 + Sheets("PE_firms").Range("O1").Value)
Range("$A$2:$E$" & lastrow2).AutoFilter Field:=5, Criteria1:=">=" & data3 - Sheets("PE_firms").Range("h1").Value, Operator:=xlAnd, Criteria2:="<=" & data3 + Sheets("PE_firms").Range("h1").Value
lastrow2 = Range("a1048576").End(xlUp).Row



stop the macro there, in the last line. and go back to excel and check what excel filtered.


let me, write some coments in the code, for your full understanding... may be, if at the end we can not debbug complety, at least you could learn how i try and make a relative similiar one.

zyffe
05-21-2013, 12:14 PM
I am using the full list of 692 industry firms, so that should not be the problem.
I delete the list og matched firms, and run the macro with ROA = 0,8 ; Industry = 0 and size = 50%, and the only output the code makes in the matched_list is the id number of PE firm with the industry = 15. I have tried to alter the criteria, but it doesnt help.

zyffe
05-21-2013, 12:19 PM
When i stopped the code to see what it had filtered, all rows with content in the "industry" sheet, were hidden. Dont know if they are supposed to be that?

What happens if you do the same as i describe in post #14 ?

I have to be elsewhere in a few moments. I'll be back here tomorrow morning.

enrand22
05-21-2013, 12:28 PM
if it was hidden, it is because it doesnt match anything. verify the filters manualy...

if you are using a plain new database of the 692 industries it is ok, but if not, i didnt delete rows, only clear them.

the last thing to check is forcing the code find something...write a industry dummy that you know it will find and check...

i use excel 2010 but i write my code to work on 2007....

may be you can send me the excel workbook (of course, delete anything confidential) and i can verify what happens... sometimes a code can crash because of the lenguage of the computer working on. or some other stuff.

Sub zyffe2()


' naming variables
Dim bcell As Range
Dim lastrow As String
Dim lastrow2 As String
Dim lastrow3 As String
Dim lastrow4 As String
Dim data0 As String
Dim data1 As String
Dim data2 As String
Dim data3 As Double


' clean information
Sheets("Industry_firms").AutoFilterMode = False

' detect the database and asign a variable for each filter
Sheets("PE_Firms").Select
lastrow = Range("a1048576").End(xlUp).Row
For Each bcell In Range("a3:a" & lastrow)
data0 = bcell.Value
data1 = bcell.Offset(0, 1).Value
data2 = bcell.Offset(0, 3).Value
data3 = bcell.Offset(0, 4).Value


' then, begin to match acording to the variables and the control panel
Sheets("Industry_firms").Select
lastrow2 = Range("a1048576").End(xlUp).Row + 1
Range("$A$2:$E$" & lastrow2).AutoFilter Field:=2, Criteria1:=">=" & data1 - Sheets("PE_firms").Range("L1").Value, Operator:=xlAnd, Criteria2:="<=" & data1 + Sheets("PE_firms").Range("L1").Value
Range("$A$2:$E$" & lastrow2).AutoFilter Field:=4, Criteria1:=">=" & data2 * (1 - Sheets("PE_firms").Range("o1").Value), Operator:=xlAnd, Criteria2:="<=" & data2 * (1 + Sheets("PE_firms").Range("O1").Value)
Range("$A$2:$E$" & lastrow2).AutoFilter Field:=5, Criteria1:=">=" & data3 - Sheets("PE_firms").Range("h1").Value, Operator:=xlAnd, Criteria2:="<=" & data3 + Sheets("PE_firms").Range("h1").Value


'if no industry matched, the next two lines tell the code to light the filters (only for industry)
lastrow2 = Range("a1048576").End(xlUp).Row
If lastrow2 > 1 Then


'if at least one matched it will copy first the first columns
Range("a3:a" & lastrow2 + 1).SpecialCells(xlCellTypeVisible).Copy
Sheets("Matched_list").Select
lastrow3 = Range("a1048576").End(xlUp).Row
Range("A" & lastrow3 + 1).PasteSpecial (xlPasteValuesAndNumberFormats)

' then it will copy all the next columns (we divide because, that is what you wanted because in Matched list you have the couble number between)
Sheets("Industry_firms").Select
Range("b3:e" & lastrow2 + 1).SpecialCells(xlCellTypeVisible).Copy
Sheets("Matched_list").Select
Range("c" & lastrow3 + 1).PasteSpecial (xlPasteValuesAndNumberFormats)

' here it will copy the firm whose data was used to match in each one
lastrow4 = Range("a1048576").End(xlUp).Row
Range("G" & lastrow3 + 1) = data0
Range("G" & lastrow3 + 1).Copy
Range("G" & lastrow3 + 1 & ":G" & lastrow4).PasteSpecial xlPasteValuesAndNumberFormats


' here the code deletes the industry that matched.
Sheets("Industry_firms").Select
Range("a3:e" & lastrow2 + 1).SpecialCells(xlCellTypeVisible).ClearContents

Sheets("PE_Firms").Select
Else


'second round... here it is the same as the round one except for the line 4 begining from here
Sheets("Industry_firms").Select
lastrow2 = Range("a1048576").End(xlUp).Row + 1
Range("$A$2:$E$" & lastrow2).AutoFilter Field:=2, Criteria1:=">=" & data1 - 1, Operator:=xlAnd, Criteria2:="<=" & data1 + 1
Range("$A$2:$E$" & lastrow2).AutoFilter Field:=4, Criteria1:=">=" & data2 * (1 - Sheets("PE_firms").Range("o1").Value), Operator:=xlAnd, Criteria2:="<=" & data2 * (1 + Sheets("PE_firms").Range("O1").Value)
Range("$A$2:$E$" & lastrow2).AutoFilter Field:=5, Criteria1:=">=" & data3 - Sheets("PE_firms").Range("h1").Value, Operator:=xlAnd, Criteria2:="<=" & data3 + Sheets("PE_firms").Range("h1").Value

lastrow2 = Range("a1048576").End(xlUp).Row
Range("a3:a" & lastrow2 + 1).SpecialCells(xlCellTypeVisible).Copy
Sheets("Matched_list").Select
lastrow3 = Range("a1048576").End(xlUp).Row
Range("A" & lastrow3 + 1).PasteSpecial (xlPasteValuesAndNumberFormats)


Sheets("Industry_firms").Select
Range("b3:e" & lastrow2 + 1).SpecialCells(xlCellTypeVisible).Copy
Sheets("Matched_list").Select
Range("c" & lastrow3 + 1).PasteSpecial (xlPasteValuesAndNumberFormats)

lastrow4 = Range("a1048576").End(xlUp).Row
Range("G" & lastrow3 + 1) = data0
Range("G" & lastrow3 + 1).Copy
Range("G" & lastrow3 + 1 & ":G" & lastrow4).PasteSpecial xlPasteValuesAndNumberFormats

Sheets("Industry_firms").Select
Range("a3:e" & lastrow2 + 1).SpecialCells(xlCellTypeVisible).ClearContents

Sheets("PE_Firms").Select
End If
Next bcell


Sheets("Matched_list").Select

'and that is all i guess...

End Sub





i will try my best to have a solution.

zyffe
05-22-2013, 05:50 AM
if it was hidden, it is because it doesnt match anything. verify the filters manualy...

I tried to stop the macro where you told me to with the default setting of matching roa = 0,8 ; matching industry = 0 and total assets = 50%. Here is what the filters says:
• Industry = 8 (which seems fine)
• Size from 17.609.504 to 52.828.512 (which seems odd as no companies in the sample has this size). Obviously the numbers in the sheet are in thousands, but I guess the filter should have made the range from 17.609 to 52.828 ?
• ROA from -0,69919955871388 to 0,90080044128612 (which seems to work fine too).
And again, all companies are in the industry sheet are hidden.

if you are using a plain new database of the 692 industries it is ok, but if not, i didnt delete rows, only clear them. <-- I tried to put in a new list of industry firms, 492 I think. It didn’t work either.

the last thing to check is forcing the code find something...write a industry dummy that you know it will find and check... <-- Tried that too, but got same result as before

may be you can send me the excel workbook (of course, delete anything confidential) and i can verify what happens... sometimes a code can crash because of the lenguage of the computer working on. or some other stuff. <-- I have attached the complete list of companies, but it will only make things more complicated for you I think: http://www.speedyshare.com/BvTQx/Matching-Workbook-complete.xlsx

Btw, what is the “hoja1” sheet about?

Just to emphasize. The most important thing is that the code, can find the best industiral firm match for as many PE firms as possible and somehow highlight which PE firm and industry firm has been matched, while making sure none of the industrial firms are matched with more than one PE firm.
Unfortunately, I cant use a list of several potential industrial firm matches for much. I might aswell just browse through the complete list and use the filters manually.

Don't get me wrong though, i am very gratefull for your effort!

p45cal
05-23-2013, 03:29 AM
I've been following this thread because it's a bit of a challenge. You want to match pairs of companies as best as possible without having any one company paired up more than once.

Algorithmically, I'd use a brute force method to score match quality (how good a match is) for every pair combination possible, then I'd sort by that score so that at the top would be the best possible matched pair, which could be from anywhere in the two lists.
I'd take that top score and put the pair of companies on your results sheet, at the same time removing all instances of either company from my score list. I'd then have a new top-scoring pair of companies etc.
Looking at your full list briefly, there would be 1.25 million combinations, too many to put in a single column in Excel but it would be possible to store such a long list as an array in memory while it's being processed. If you insisted on a perfect match for industry number then that list could be hugely reduced, but you'd probably end up with unmatched companies.
(I note you've taken out the year data in your complete list!)

Now to the crux of the matter; how to score a pair of companies for match quality.
You say "First, match with the 2-digit industry code (or as close as possible)"
Here. the 2 companies' industry codes could be compared and the closer they are the higher the score, say 100 for an exact match and 0 for industry codes which are as far away as they can be from each other.

A similar thing can be done for year data.

You say "find the closest match on pre_performance (measured by ROA) within a reasonable range of company size (measured by total assets)"
Again a similar thing could be done, but we have to include 'reasonable range of company size' which could be done asking whether the companies in question are within a certain percentile. (We could split this last into 2 separate score components: 1. what percentile the company is in with respect to its size and 2. how close the ROAs are.)

Each of the 3 or 4 scores determined above could be weighted according to the importance you attach to each (by, say, multiplying by a factor), which would then be combined (added?) to give you a single final match-quality score for each company to be used in the 'big list'.

I don't have infinite time to do this, but will have a go at it as and when I get spare time - what's your schedule?

zyffe
05-23-2013, 07:13 AM
p45cal: Thanks for the thorough suggestion! However a friends of mine have just managed to put together something rather simple which works at least for now.
Maybe i'll need a more dynamic macro later. In that case, ill bring the threat back to life.

Erand22: Like i wrote above, i now have a rather static macro which matches strictly on industry and finds the closest ROA within a range of size. This works for now, but i might need one later which can also take "year" into account. In that case I might ask for help again. Thank you very much for your effort so far!

p45cal
05-24-2013, 07:05 AM
p45cal: Thanks for the thorough suggestion! However a friends of mine have just managed to put together something rather simple which works at least for now.
Maybe i'll need a more dynamic macro later. In that case, ill bring the threat back to life.OK. Could we see the code of your solution (having put people to task)?

I've done some coding which takes into account year, ROA and Industry number, but not yet the 'reasonable range of company size' bit.

What matches does your solution give in the first file you attached in msg#1? I'd like to compare.
It also allows weighting and takes 4 seconds to come back with results.

p45cal
06-21-2014, 08:15 AM
OK. Could we see the code of your solution (having put people to task)?Clearly not.