Thread: Macro to take data from a Word doc and transfer it to Excel

  1. #1

    Hello!
    I was wondering if someone could help me. I have used macros before but have never really written one myself for this kind of task. As my subject title states, I am basically trying to get a macro to take all the data from a Word doc and transfer/convert it to an Excel file. I have attached a copy of a test doc that I am going to be using; this is the format of all the docs that will be used in the near future. The only thing that I guess would be important is that when it's transferring the information in the Attorney section, the macro should be able to tell which litigants each attorney represents and label it Defendant or Plaintiff. I am not sure if that is even possible. If so, how exactly would that work? Would there need to be a loop for this? I am not sure where to start. Can someone please help? Thanks!

    P.S. The attached example is only page 1; the original doc is 2,000+ pages.

  2. #2
    Whoever created the format has made it as difficult as it could possibly be to extract data from, so you have a mountain to climb when you haven't yet mastered the skills to climb a few steps.

    The document begins with a section of formatted text between two lines, and I can almost guarantee that the last line of that block will not appear in every document. That part, however, is relatively straightforward to handle.
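    Handling a header whose last line is sometimes missing mostly comes down to padding. The Python sketch below (Python rather than VBA, for brevity) assumes the paragraphs between the two lines have already been extracted as a list of strings, that there are at most five of them, and that only the last can be absent. The field names and the five-line count are placeholders, not the document's real labels.

```python
# Placeholder labels: the real header fields are unknown, and five lines
# is an assumption about the header's length.
HEADER_FIELDS = [f"Header line {i}" for i in range(1, 6)]


def parse_header(paragraphs):
    """Map the paragraphs found between the two header lines to fields.

    Assumes the lines always appear in the same order and that only the
    last one may be missing; a missing line is recorded as "".
    """
    expected = len(HEADER_FIELDS)
    if not expected - 1 <= len(paragraphs) <= expected:
        raise ValueError(f"unexpected header length: {len(paragraphs)}")
    padded = list(paragraphs) + [""] * (expected - len(paragraphs))
    return dict(zip(HEADER_FIELDS, padded))


if __name__ == "__main__":
    # A header whose optional last line is absent.
    print(parse_header(["Line A", "Line B", "Line C", "Line D"]))
```

    Raising an error for any other length, rather than guessing, makes a document with an unexpected header fail loudly instead of silently shifting values into the wrong Excel columns.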

    Then you have a series of nested tables, which are a pain to deal with, but as they have a fixed format, the issues they create are not insurmountable.

    Then there is the Attorney/Litigants table. Even in your example the format differs between plaintiff and defendant with regard to spacing, the cells hold the details of more than one attorney, and each attorney's data runs to a different number of lines. That demands no end of error handling to ensure that you get the information you want. The adjacent column indicates who those attorneys represent, so the issue you have identified is the least of your problems.
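    Reading the role from the adjacent column can be sketched in a few lines of Python (Python rather than VBA, for brevity). The sketch assumes the table's rows have already been read into (attorney cell, litigant cell) text pairs and, hypothetically, that each litigant cell contains the word "Plaintiff" or "Defendant"; the function names are illustrative only. Because one attorney cell may hold several attorneys, the role is attached to the whole cell, not to individual names.

```python
import re

# Hypothetical assumption: each litigant cell contains the word
# "Plaintiff(s)" or "Defendant(s)" somewhere in its text.
ROLE_PATTERN = re.compile(r"\b(plaintiff|defendant)s?\b", re.IGNORECASE)


def litigant_role(litigant_text):
    """Return 'Plaintiff', 'Defendant' or 'Unknown' for one litigant cell."""
    found = {word.lower() for word in ROLE_PATTERN.findall(litigant_text)}
    if found == {"plaintiff"}:
        return "Plaintiff"
    if found == {"defendant"}:
        return "Defendant"
    # No role word, or both words: flag the row for manual checking.
    return "Unknown"


def label_rows(rows):
    """Loop over (attorney_cell, litigant_cell) pairs and attach a role.

    The whole attorney cell gets the role, because a single cell may hold
    the details of more than one attorney.
    """
    return [(attorney, litigant_role(litigant)) for attorney, litigant in rows]


if __name__ == "__main__":
    sample = [
        ("Attorney A\nFirm A", "Litigant X - Plaintiff"),
        ("Attorney B", "Litigant Y, Defendant"),
        ("Attorney C", "Litigant Z"),
    ]
    for attorney, role in label_rows(sample):
        print(role, "|", attorney.splitlines()[0])
```

    The list comprehension in label_rows is the loop asked about in the original question; splitting each attorney cell into individual attorneys is the hard part that this sketch deliberately leaves out.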

    Then we get to the 'proceeding text' and perhaps 2000+ pages of who knows what that you may somehow wish to handle, and we have not yet begun to explore how you want this to be formatted in Excel.

    This is a major task that promises hours of work and is therefore one that no-one is going to undertake lightly.
    Graham Mayor - MS MVP (Word) 2002-2019
    Visit my web site for more programming tips and ready made processes
    http://www.gmayor.com

  3. #3
    macropod (Knowledge Base Approver, VBAX Guru); joined Jul 2008, 4,435 posts
    I agree with Graham that your document's overall structure is not conducive to data extraction.

    If you're not already committed to the format you've submitted here, I'd strongly recommend reformatting it with a structure that is conducive to data extraction. You might, for example, create a 5-row by 1-column table for the 'header', and replace the nested 2nd table with a simple 8-row by 2-column table. In your 3rd table there should be a separate row for each item you want recorded in the Excel workbook (e.g. a separate row for the attorney name, for each line of the attorney address, and for phone, fax, email, etc.). Every attorney should have the same number of rows, used the same way regardless of whether they're all filled in, with each row always holding the same kind of content (i.e. don't bunch the data up when there are missing lines). The same applies to litigants. If you don't do that, it will be well-nigh impossible to automate the data extraction in a way that ensures consistent output.
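    The benefit of such a fixed layout is that position alone identifies each field, so extraction needs no guesswork. A minimal Python sketch (again Python rather than VBA, for brevity) follows, assuming the 3rd table's cell texts have already been read into a flat list and, hypothetically, that every attorney occupies exactly six rows in the order given by ATTORNEY_ROWS. It writes CSV, which Excel can open, rather than a native .xlsx workbook.

```python
import csv
import io

# Hypothetical fixed layout per attorney: every attorney uses exactly these
# rows, in this order, even when some of them are empty.
ATTORNEY_ROWS = ["Name", "Address 1", "Address 2", "Phone", "Fax", "Email"]


def attorneys_to_csv(cells):
    """Group a flat list of cell texts into fixed-size attorney records
    and return them as CSV text (one header line, one line per attorney)."""
    size = len(ATTORNEY_ROWS)
    if len(cells) % size:
        raise ValueError(f"{len(cells)} cells is not a multiple of {size}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(ATTORNEY_ROWS)
    for start in range(0, len(cells), size):
        writer.writerow(cells[start:start + size])
    return out.getvalue()


if __name__ == "__main__":
    cells = ["Attorney A", "1 Main St", "", "555-0100", "", "a@example.com"]
    print(attorneys_to_csv(cells), end="")
```

    Empty cells are kept as empty strings, so a missing fax line still occupies its slot and the email lands in the right column; that is exactly why bunching up the data breaks automated extraction.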

    As for your Proceedings table, I can only guess what the 'Proceedings text' column is supposed to contain; if the cells contain multiple paragraphs, that just adds to the data extraction woes...
    Last edited by macropod; 02-20-2018 at 10:31 PM.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]

  4. #4
    Would it be possible to pull out only two things from the doc and just ignore the rest of the info?

    For example, the attorney's name from the Attorney side, and whether they are acting for a plaintiff or a defendant from the Litigants side?

  5. #5
    Not unless you modify the layout so the macro will know where to look for those data. In your existing document, there appear to be attorney names in multiple different paragraphs in the first two cells. It would be impossible for a macro to work with a structure in which one never knows which paragraphs will contain the data.
    Cheers
    Paul Edstein
    [Fmr MS MVP - Word]
