PDA

View Full Version : Newbie help: advanced table manipulation



calvink
07-17-2007, 09:40 PM
Hi,

I actually posted this in the Word forum:

vbaexpress.com/forum/showthread.php?p=106983 (http://vbaexpress.com/forum/showthread.php?t=13814)

But the helpful guys there suggested that I try here as well because I can output the results into Excel too. Word or Excel they're both similar in concept, so I'm curiuos to know if anyone here can check out my above posting and let me know what you think...

Thanks in advance :)

rbrhodes
07-18-2007, 01:21 AM
Hi Calvink,

I checked out the Word thread and I think you're better off with Excel. As fumei says Excel is for numbers.

Now as to whether its po$$ible...Please do a scan to Excel and post it.

By which I mean this may be more work than free time allows <g> But we'll certainly try!

Cheers,

dr

calvink
07-18-2007, 02:57 AM
hi rbrhodes,

Thanks for your input. Here is the excel file.

One thing I'm curious, is it possible to produce an application (.exe) to do all this via VBA? This is because if the macro can only be executed after opening the document, I'm afraid that if the document gets too large then it'll crash or takes forever to open...definitely not ideal for automation (we plan to run the application overnight).

rbrhodes
07-18-2007, 04:58 AM
Wow. The excel file looks so much better than Word file. But Wow.

It's late so I'll be looking at it tommorrow...One hell of a problem tho!

And yes there are ways to create an Exe out of VBA. Or it could be done in VB as well.

Cheers,

dr

calvink
07-18-2007, 08:48 PM
thanks rbrodes, looking forward to what you can come up with!

rbrhodes
07-19-2007, 04:46 AM
Hi calvink,

Here's a taste of what's possible.

I didn't get too much time today to look at it (as I'm on vacation from work...so we're really talking "free" time here <g>) but this cleans up the example a little bit and shows what Excel and VBA could do.

I alslo?ked into tranxlati0n from Malay to help b0t the data is so ga%&rbled that reaMly didxn"t help much. <G>

If you have a few more examples of your data into Excel it might provide a little more info about the possibles.

Anyway this is only an bare-bones example, not very efficient or even well thought out. I have a lot more ideas about what to do with it. This is just to give you hope. <ouch!>

When will I learn that humour doesn't travel the net!

Anyways, open the Cleanit file, click the "Do it" button then select the example data file to see it do it's thing.

Cheers,

dr

Edit: Added exit for file open = cancel

calvink
07-19-2007, 09:49 PM
Thanks rbrhodes! Can I ask what have you done with the very 1st page, did you simply ignore it because it's different format from the subsequent 'data' pages? Well I would like to capture certain data on the 1st page itself as well, as it contains important info like the document identifiers (series number, page number, company name etc).

Also due to the ocr/scanning, if you look at Page 3 in your cleaned excel (staring with 32/YIP LEAN FUNG/etc), the columns have been offset ('pushed') to the right by 1 column. Would it be possible, after having run the clean up, to parse the entire file again to look for offset columns, and move it back left by that number of columns so to align it back with the rest of the properly aligned data?

This way, all the data should be properly aligned in the same column as per its heading - throughout all the pages - and I suppose data extraction after this will be much easier/simpler? This so we can do the search on the Amount column and be confident that the values we're searching are really the amount values and not other columns' values due to mis-alignment.

Hope you get what I mean...I realize all the help here are rendered voluntarily and I sincerely thank you and everyone else who have suggested solutions! :)

rbrhodes
07-23-2007, 05:30 AM
Hi.

A little more work on it. The first page I just threw away (It's back...)

The Column alignment works for this example, perhaps others. No guarantees! I would have to see more examples I think.


I put the numbers as non - decimal so they are probably inflated by a factor of 100 i e 3289 is probably 32.89 but it will appear as 3289.00 ! Thats me being tired right now, I've at this for hours.

Cheers,

dr

rbrhodes
07-24-2007, 12:59 AM
New version.

- Added Status bar info

- Handles multiple decimals (.....)

- Handles multiple cell entries ie: 106.00 56.78

- Formatting is better handled

- Columns set to width (not AutoFit)

- Took out smart*ss msgbox at end (nag screen)

- Misc. other stuff...

Cheers,

dr

calvink
07-24-2007, 03:31 AM
Hi rbrhodes,

Sorry for the lack of updates, I've been chasing some deadlines lately.

Anyway wow im truly impressed!! I haven't had much time yet to thoroughly verify the results against the original pdfs, but your cleaining up and formatting are superb!

I will definitely spend more time on this, as this is and will remain a major priority for me for the next few months.

I'm heading off home now, so just a quick question before I come back with more...how flexible is your script to process other similar formats? By similar I mean the overall big table is there, just that some columns might change position (example Amount column), perhaps even new columns as well. Also what about the number of columns and rows in the big table, will that matter?

Remember I mentioned in my first post, this is just one of several formats that need processing...

Anyway thanks alot for your hard work again! :)

rbrhodes
07-24-2007, 06:09 PM
Hi Calvink,

Like I said, the code works for this example. It has some hard coded stuff based on the one example provided. The basic logic is there for cleaning (it kills everything except alphanumeric and ".") and thats a start.

The Column formatting, Headings, etc are hard coded but it's for human consumption only and easy to change or even leave out. Since the final result is simply to extract data the formatting could be done then. I only put in the StatusBar and formatting for the example results to be human readable.

As for multiple formats how many are we talking here? NOTE: This is FAR from being a 'program'. It's just some example code based on one file...

Cheers,

dr