PDA

View Full Version : [SOLVED:] Merging multiple worksheets into one



Rogue
01-12-2016, 02:19 PM
I'm looking to get the following data into an Excel format: data.charitycommission.gov.uk/
The files are .bcp and there's a table build script. I was planning to build the SQL database and filter the data (to only show 110 category charities). Then, I was planning on exporting the filtered results into an Excel document so I can use the data. But, I got stuck since I haven't used SQL before so I didn't really understand how everything worked.

So, I used a python script to convert the data into a CSV format (which might have worked). Those files as well as a Access database template I found are here: depositfiles.com/files/4sxigix1l
And I got the python script and access file from here: github.com/ncvo/charity-commission-extract
(https://github.com/ncvo/charity-commission-extract)
The extract_class field/file has the 'regno number' (the primary key) as well as a 'class ref'. I'm only interested in 'regno numbers' with a 'class ref' of 110. I was hoping to get a list of the 'regno numbers' with a 'class ref' of 110. Then, use that list to get information from the other worksheets for those 'regno numbers'. In the end, I'm looking to make one spreadsheet with information from all the other worksheets but only for 'regno numbers' with a 110 'class ref'. I tried to do that in Excel but apparently the CSV files are too big.

I'm not sure if I could use the Access template to make a database and import the CSV files somehow. Then, sort the data there and export to Excel. I've never used Access before so I'm not sure if that's even possible. Of course, this would probably be easier if I knew how to import to MySQL (since the data is from a database) rather than converting to CSV. Anyway, I'm not sure what the best way forward is. Whether to solve the Excel size problem, use Access or SQL. I would really appreciate some help. Thanks again!

SamT
01-12-2016, 04:38 PM
Forget what you did and where you go the code to do it.

What do you have now and what do you want to have.

I get that you have a mess of CSV files, but that's all I got.

Rogue
01-13-2016, 05:34 AM
Forget what you did and where you go the code to do it.

What do you have now and what do you want to have.

I get that you have a mess of CSV files, but that's all I got.

Sorry for the confusion. Ok, at the moment, I've got a bunch of CSV files. They're mostly in the following format:



Regno Number

Field 1

Etc.



200001

--

--



200002

--

--




I'm looking to merge them into one big table (using the regno number as the constant/primary key which links all the data together):



Regno Number

Field 1

Field 2

Field 3 (From Spreadsheet 2)

Etc.



200001

--

--

--

--



200002

--

--

--

--




Then, I want to filter the data. The extract_class file has the list of regno numbers alongside something called class ref. I want to only keep have data for regno numbers which have a class ref of 110:



Regno Number

Class Ref

Field 2

Field 3 (From Spreadsheet 2)

Etc.



200001

110

--

--

--



200005

110

--

--

--




As you can see, regno number 200002 has been deleted since it doesn't have a class ref of 110.


This doesn't seem that difficult to achieve. But, there are a few problems. Firstly, some of the CSV files have multiple entries under each regno number:



Regno Number

Field 1



200001

X



200001

Y



200001

Z



200002

--




So the final table will actually look like this:



Regno Number

Class Ref

Field 2

Field 3 (From Spreadsheet 2)

Etc.



200001

110

X

Y

--



[Merged cell with above]

[Merged cell with above]

[Merged cell with above]

Z




[Merged cell with above]

[Merged cell with above]

[Merged cell with above]

P




200005

110

N

Q

--



[Merged cell with above]

[Merged cell with above]

[Merged cell with above]

T




200006

110

E

R

--




As you can see with the 200001 entry, there are some merged cells. Field 3 has three rows of data but all that is under the same cell in the Regno Number column.

Also, another problem is that Excel doesn't load each CSV file fully (since they're too big) so I'm not sure how I can use VBA to do all this. That's why I thought I might be able to use Access to do it and export the result to Excel. What do you think?

PS: I'm not sure if that clears it up. If not, I'm more than happy to create a example using the actual data. Thanks again!

Rogue
01-13-2016, 10:56 AM
I've been able to filter the data so Excel opens each file fully (they aren't too big anymore). So, all I have to do is merge the data. The only problem is that some of the tables have multiple entries under one primary key. Will Lookup work? Or do I need to use VBA for this?

SamT
01-13-2016, 02:59 PM
So you have a bunch of CSV Files, The Primary key is the first Field. "Regno Number"
You have a Class Ref (Somewhere)
You have a Workbook somewhere that has a Sheet2 with some data.
You have a file of some sort called "extract_class."

Have I got that right so far?


Where does Class Ref come from?
How does does Field 3 (from Sheet2) get in there.
When some records are "Merged with field Above" What operations are done? Are some added, subtracted, Min'ed, Max'ed, what?
How does the "extract class" file fit in?


It is pretty easy to read all CSV files in a folder and Filter them on one or more Keys, but it's those unknowns that count.
The :devil2: is in the details

Don't worry, we go thru this discovery process a lot. We don't have expertise in your Domain and you don't speak our lingo.

Rogue
01-13-2016, 03:26 PM
Where does Class Ref come from?
How does does Field 3 (from Sheet) get in there.
When some records are "Merged with field Above" What operations are done? Are some added, subtracted, Min'ed, Max'ed, what?
It is pretty easy to read all CSV files in a folder and Filter them on one or more Keys, but it's those unknowns that count.
The :devil2: is in the details

I think I've done a terrible job explaining. Should we start again?

I think it would be easier to show you what I have so far: my.pcloud.com/publink/show?code=XZVlHLZVMUogqSrCqBubtcGwaCS4f7PBBNk

I've got tables in separate sheets at the moment. I want to combine everything into one big table. The thing that is constant is the regno field. Each table has it and it could be used as the primary key to match data together. What would be the best way to combine everything into one table?

SamT
01-13-2016, 05:28 PM
47 MEGA Bytes. Woweee!

@ All: I have attached a file with the Table headers from that file and a sampling of the Extract_Class Table, but I don't think it will be of much real use to us. Size = 12,940 bytes

@ Rogue,

It is obvious that you have put a lot of work into that 47MB file. It will take a lot of code to accomplish the final result you want from those tables. BTW, you did not provide an example of what the combined table that you want should look like.

Please don't ask us to do more free work than we have to.
Please let us decide the easiest way to accomplish what you need.

All we really need right now are the Field Names, (first line,) from the CSV file and the Column Headers for the final product, their correspondences, and the Criteria, (Class #,) used to select, (filter,) records from the CSV. That can be shown in 2 or three columns one on sheet

And the answers to the important questions in my post #5 above.

Note: If the .bcp files are text files, We might be able to work directly from them. Can you provide a small sample of one?

I have also attached what might be the CSV Field names.

Rogue
01-14-2016, 03:12 AM
-Snip-

I wish I could start another thread. I've explained everything extremely badly. From the OP to now, I've been able to do the majority of the work myself. So, I've already sorted the bcp files, the csv files and the filters for class ref 110. I've actually done mostly everything mentioned in the OP.

I've imported the final data (that's already been cleaned up and filtered) into the 47mb spreadsheet I just sent. The only thing I need to do now is merge the sheets (in the 47mb file) into one table.


So, let's forgot everything I've said so far.


Here's what I have at the moment: https://my.pcloud.com/publink/show?code=XZVlHLZVMUogqSrCqBubtcGwaCS4f7PBBNk
(That's the 47mb file I sent earlier)

As you can see, the data is spread across several sheets. I want everything to be in one sheet and in one table. That's absolutely all we need to do.

Here's what the final table will look like: 15177
(I've also added two examples in the table as well as noting which sheet each column comes from)

Does that clear everything up?

SamT
01-14-2016, 11:44 AM
Thank you, From that I was able to get very close to designing a short, simple procedure to accomplish your task.

There is only one task left for you. In the attached I created a table with Source Sheets names and source Column Headers. It only needs spell checking and the column numbers of the source columns.

Oh yeah. Would you please go thru that 47meg file and delete all but 3 rows of data on each sheet, so we have something to test our code with. That should leave a file of about 60kb that you can upload here.

Don't worry, we go thru this discovery process a lot. We don't have expertise in your Domain and you don't know VBA Coding.

@ All,

The Code Algorithm I'm thinking of is:

For i = LBound(TableArray) to UBound(TA)
With Sheets(TA(i))
For j = LB(TA(i)) to UB(TA(i))
.Columns(TA(i, j).Copy DestSht.Columns(c)
c = c + 1
Next
End With
Next

Paul_Hossler
01-14-2016, 12:09 PM
@SamT -- NOTHING is ever that easy


1. There are different number of rows on the source worksheets

2. Data in not 1-1, 1-N, maybe not even N-N between the sheets. For example,

regno=200044 is repeated 13 times in 'extract-financial' (apparently for 13 different dates) and 3 times in 'extract-class' (apparently for 3 different classes)

regno=200053 is repeated 13 times in 'extract-account-submit', each with a different 'arno'

regno=200053 has 13 different trustees in 'extract-trustee'

regno=200064 has 4 'sub-names' in 'extract-name' in the 'subno' field

3. Only regno's with class=110 from 'extract-class' are required (OP #1)

4. So I think that just 're-arranging' the columns from multiple sheets onto a single sheet by copying is not going to line up the rows even after sorting by regno






BUT ... then again I could be wrong

SamT
01-14-2016, 04:59 PM
@SamT -- NOTHING is ever that easyWell, #*(%^)#&%#@)##%^*(#*!!! I forgot the structure of that 47Mb file



From the OP to now, I've been able to do the majority of the work myself. So, I've already sorted the bcp files, the csv files and the filters for class ref 110. I've actually done mostly everything mentioned in the OP.

I've imported the final data (that's already been cleaned up and filtered) into the 47mb spreadsheet I just sent. The only thing I need to do now is merge the sheets (in the 47mb file) into one table.
First I did not download the cloud file in your last post. Not going to if it's another 47M.



The extract_class field/file has the 'regno number' (the primary key) as well as a 'class ref'. I'm only interested in 'regno numbers' with a 'class ref' of 110.
I am a bit confused. You want all Regnos that have at least one instance with a Class 110

The Collected Data Book (47Mb) is only those Regnos?



You want One Table with all Data from all Sheets? Can't be done with the data structures you have provided.

If you look at my Table Headers attachment, you will see that there is no common Secondary Key that would enable consolidating records. It may be different in either the csv files or the bcp files. I dunno, you won't discuss them with us.

So. I've been looking at everything to see what we can do. I know that (with several consolidation) we can provide a table that has most things on it, But... Listing Trustees and Volunteers (By Regno & Charity name) requires another Table. The financials - Part B,also require another table .Charity Objectives would be on final Table.

These are the tables I think ATT we can iteratively consolidate into one. There is a linkage of Columns that can be used as Secondary Keys:


extract_name as table_name


extract_acct_submit


extract_ar_submit


extract_charity_aoo


extract_registration


extract_financial


extract_main_charity





Please don't ask us to do more free work than we have to.
Please let us decide the easiest way to accomplish what you need.


To paraphrase a great programmer (IIRC, Spolski) Sometimes it is better to just start over. At least you learned something the fist time.

Rogue
01-16-2016, 12:50 PM
@SamT -- NOTHING is ever that easy


1. There are different number of rows on the source worksheets
2. Data in not 1-1, 1-N, maybe not even N-N between the sheets. For example,

regno=200044 is repeated 13 times in 'extract-financial' (apparently for 13 different dates) and 3 times in 'extract-class' (apparently for 3 different classes)
regno=200053 is repeated 13 times in 'extract-account-submit', each with a different 'arno'
regno=200053 has 13 different trustees in 'extract-trustee'
regno=200064 has 4 'sub-names' in 'extract-name' in the 'subno' field

3. Only regno's with class=110 from 'extract-class' are required (OP #1)
4. So I think that just 're-arranging' the columns from multiple sheets onto a single sheet by copying is not going to line up the rows even after sorting by regno

BUT ... then again I could be wrong

All those points are correct apart from #3. The Collected Data book (47mb) only has data for regno numbers with class=110. I filtered the original data between the OP and now. So, all the regnos from the Collected Data book are needed.


Well, #*(%^)#&%#@)##%^*(#*!!! I forgot the structure of that 47Mb file
First I did not download the cloud file in your last post. Not going to if it's another 47M.

I am a bit confused. You want all Regnos that have at least one instance with a Class 110

The Collected Data Book (47Mb) is only those Regnos?

You want One Table with all Data from all Sheets? Can't be done with the data structures you have provided.

If you look at my Table Headers attachment, you will see that there is no common Secondary Key that would enable consolidating records. It may be different in either the csv files or the bcp files. I dunno, you won't discuss them with us.

So. I've been looking at everything to see what we can do. I know that (with several consolidation) we can provide a table that has most things on it, But... Listing Trustees and Volunteers (By Regno & Charity name) requires another Table. The financials - Part B,also require another table .Charity Objectives would be on final Table.

These are the tables I think ATT we can iteratively consolidate into one. There is a linkage of Columns that can be used as Secondary Keys:


extract_name as table_name


extract_acct_submit


extract_ar_submit


extract_charity_aoo


extract_registration


extract_financial


extract_main_charity




You're right. The Collected Data Book is the regnos with class 110. So, all the data in the 47mb file is needed.

The cloud file in my last post is exactly the same as the Collected Data book. I just re-linked it to make sure we were on the same page.

Regarding the bcp and csv files. The original data was in bcp format (you can download from the first link in the OP). I converted that into CSV and imported it into a database. Then, I cleaned up the data and, after that, filtered it to only show data for class=110. I exported this into an Excel format. There were multiple files. Then, I merged these files into the Collected Data file (47mb) which I sent to you.

Of course, one table would be better. But, if that's not possible, then I suppose we'll have to make multiple tables.

This might not work. But, we've got a list of regno numbers in the first sheet. Can't we use a search function to find each regno number in the other sheets. When it finds a regno number, it would copy the contents of the X nearby columns and paste that data into the first sheet. Would that work? Obviously, we'll have to code for blank cells and when there's multiple instances of a regno number.


PS: I'm working on what you requested in the last post. I'll send it over soon. Sorry about the delay.

Edit: Could this help: http://sitestory.dk/excel_vba/merge-data.htm The 'Combine 2 Tables Part'? I also found this: https://superuser.com/questions/196565/how-to-merge-data-from-two-differently-structured-excel-files

SamT
01-16-2016, 03:59 PM
Edit: Could this help: http://sitestory.dk/excel_vba/merge-data.htm The 'Combine 2 Tables Part'? I also found this: https://superuser.com/questions/1965...ed-excel-files (https://superuser.com/questions/196565/how-to-merge-data-from-two-differently-structured-excel-files)Those are common problems. The issue with your request is that only those tables listed in my previous post have two common columns. Having at least two common columns is a requirement for combining tables.


This might not work. But, we've got a list of regno numbers in the first sheet. Can't we use a search function to find each regno number in the other sheets. When it finds a regno number, it would copy the contents of the X nearby columns and paste that data into the first sheet. Would that work? Obviously, we'll have to code for blank cells and when there's multiple instances of a regno number.

That would be the same as manually pasting each table, one below the other, (In proper column order,) then sorting the new table on Regno. You wind windup with a series of Stair stepped rows, with each table's entries below and to the right of the previous table, for each regno.

We know that you have the raw data converted to CSV files, We have see the impressive amount of wrok you have already done with that data, We are very sorry that that work was for naught.

I do not believe that one big table is your ulitmate goal. You have never told us what you ultimately want to do with the data.

Please tell us the ultimate goal, let us see the CSV files, and let us recommend a way to reach your ultimate goal.

NOTE: If you let us see the CSV Files, only share the top 5 (Five) lines in each. That is all we need to see. Some of us are already limited to the first 65K lines.

Rogue
01-17-2016, 03:11 PM
Those are common problems. The issue with your request is that only those tables listed in my previous post have two common columns. Having at least two common columns is a requirement for combining tables.



That would be the same as manually pasting each table, one below the other, (In proper column order,) then sorting the new table on Regno. You wind windup with a series of Stair stepped rows, with each table's entries below and to the right of the previous table, for each regno.

We know that you have the raw data converted to CSV files, We have see the impressive amount of wrok you have already done with that data, We are very sorry that that work was for naught.

I do not believe that one big table is your ulitmate goal. You have never told us what you ultimately want to do with the data.

Please tell us the ultimate goal, let us see the CSV files, and let us recommend a way to reach your ultimate goal.

NOTE: If you let us see the CSV Files, only share the top 5 (Five) lines in each. That is all we need to see. Some of us are already limited to the first 65K lines.

My ultimate goal to analyse the data as well as open source the information. I want it collected into one table so the information can be easily compared and related to each charity/regno number.

Here are the filtered files (data only for class ref=110). You can just rename the extensions to .csv: https://my.pcloud.com/publink/show?code=XZC3YLZo0wK9PXBX6XTeMjsyxBeRyDpG6CV

Do you also need the original data (for all class refs)?

SamT
01-17-2016, 05:09 PM
A 18Mb zip file and I have to rename all the files? Open each one and truncate it to five lines, then import each one into Excel, then copy the sheet from each book into one book?

When all I need are the top five lines from each? No thanks.

Please don't ask us to do more free work than we have to.

Rogue
01-20-2016, 02:04 PM
A 18Mb zip file and I have to rename all the files? Open each one and truncate it to five lines, then import each one into Excel, then copy the sheet from each book into one book?

When all I need are the top five lines from each? No thanks.

The 'Collected Data' file is the .CSV files imported into Excel. But, I'm working on cutting the data down.

So, to clarify do you need it in .CSV format or in Excel format? Secondly, should all the data be for the same regno numbers or can it be for any regno numbers? (The first 5 lines aren't always for the same regno numbers. Is that fine?) And finally, if you need CSV files, do you need the original data or the filtered data? The latter is the original but I deleted everything that isn't class=110



Also, I know that you said working with the Excel file wouldn't be possible? But, I just saw this: http://thinketg.com/how-to-return-multiple-match-values-in-excel-using-index-match-or-vlookup/ It uses an Array formula. Could we use something similar to search for regno numbers and copy the nearby columns into a separate spreadsheet.


PS: Sorry for the delay in responding. I've been incredibly busy for the last few days. But, hopefully, now I'll have more time to spend on this. And thanks again for bearing with me! And for the help so far!

SamT
01-20-2016, 06:28 PM
The 'Collected Data' file is the .CSV files imported into Excel. But, I'm working on cutting the data down.I thought that was not all the data files.


So, to clarify do you need it in .CSV format or in Excel format?Either is fine as long as it's only the first <= 5 lines from each file. All combined in one Workbook, one sheet per file would be easiest for us, but a zipped folder of truncated CSV's would work.


The first 5 lines aren't always for the same regno numbers. Is that fine?) That is preferred.


do you need the original data or the filtered data Not filtered.

What we must have is all the headers (first line) from all the raw files. Any data will just be used for familiarization and testing.

The most efficient way for use might be working directly from the CSV's. I can't tell without knowing what they look like in raw form. Which reminds me. If you attach a workbook instead of a zipped folder, what character is used as a separator in the CSV's?


I've been incredibly busy for the last few days
And I won't be here tomorrow :) But don't worry. We have some real experts watching this thread.

Rogue
01-23-2016, 12:28 PM
Just wanted to give an update on this. I was able to solve the problem and merge everything into one table.

I used this add-in: https://www.ablebits.com/excel-lookup-tables/howto-merge-data.php

It has a free trial and was able to use the 'Collected Data' file and merge everything as per the 'Final Table' layout. But, thank you anyway for the help!

SamT
01-23-2016, 03:57 PM
:thumb