Consulting

Results 1 to 12 of 12

Thread: Converting 1 million Rows of data to new Tables and Columns

  1. #1

    Converting 1 million Rows of data to new Tables and Columns

    Hi, thank you for reading this. I have a database that is not at all structured correctly but is the only way we can receive the data. The table has over a million rows and consists of the following columns:

    dataID - just a unique ID; not used by us
    system_id - the system the question is valid for
    siteName - name of the site
    siteAddress - address of site
    siteType - type of site
    QLevel - level of question
    fieldQuestion - question label
    fieldAnswer - answer
    questionID - question number

    dataID | system_id | siteName | siteAddress | siteType | QLevel | fieldQuestion | fieldAnswer | questionID
    1234 | 12649 | ABC Location | 123 Main St | Building | 1 | This is a question | This is their answer to77 | 77
    1234 | 12649 | ABC Location | 123 Main St | Building | 2 | This is a child question1 | This is their answer to78 | 78
    1234 | 12649 | ABC Location | 123 Main St | Building | 3 | This is a child question2 | This is their answer to79 | 79

    Each question has been grouped together, so for the above example 77,78,79 would go into a specific table but 80-99 may be in another table and so on. I did this to clean-up the tables and also because I didn't want to put hundreds of questions into a single table. So what I would like when all is said and done are tables that look like this:

    system_id | question_77 | question_78 | question_79
    12649 | This is their answer to77 | This is their answer to78 | This is their answer to79

    (for reference; system_id is a number, and all question fields are memo. Added pipes just to show separation of data)

    The other tables would look similar but with their own set of questions.

    What I first did was to create all of the new tables with their corresponding "question_#" fields. Each questionID corresponds to a specific table and not all are sequential.

    Now I need to move the data from the original table to the new recipient tables. Below is the VBA. However, when I run it I tend to get different issues at different times. I have never successfully gotten all the way through to the end. Sometimes I can get 100K+ through and then it errors out other times only a couple thousand and it errors out. The error is typically 'Run-time error '3049': Cannot open database ". It may be a database that your application recognizes, or the file may be corrupt.' (again almost at random). Below is the VBA that I wrote to move everything to its correct table:

    Private Sub Command75_Click()
    Dim dbs As DAO.Database
    Dim myOriginalTable As Recordset
    Dim myRecipientTable As Recordset
    Dim newQuestionNum As String
    '
    Set myOriginalTable = CurrentDb.OpenRecordset("originaltable", dbOpenDynaset)
    '
    Do Until myOriginalTable.EOF = True
      '
      If myOriginalTable![questionID] = 77 Then Set myRecipientTable = CurrentDb.OpenRecordset("tableA", dbOpenDynaset)
      If myOriginalTable![questionID] = 78 Then Set myRecipientTable = CurrentDb.OpenRecordset("tableA", dbOpenDynaset)
      If myOriginalTable![questionID] = 79 Then Set myRecipientTable = CurrentDb.OpenRecordset("tableA", dbOpenDynaset)
      If myOriginalTable![questionID] = 80 Then Set myRecipientTable = CurrentDb.OpenRecordset("tableB", dbOpenDynaset)
      If myOriginalTable![questionID] = 85 Then Set myRecipientTable = CurrentDb.OpenRecordset("tableC", dbOpenDynaset)
      If myOriginalTable![questionID] = 87 Then Set myRecipientTable = CurrentDb.OpenRecordset("tableB", dbOpenDynaset)
      If myOriginalTable![questionID] = 92 Then Set myRecipientTable = CurrentDb.OpenRecordset("tableD", dbOpenDynaset)
      '
      ' The above continues for every single question # so that each one will be placed
      ' in the correct table; there are a couple hundred and not all sequential
      '
      ' Convert number to a field name
      '
      newQuestionNum = "question_" & myOriginalTable![questionID]
      '
      ' Determine whether this systemID exists in new table
      '
      myRecipientTable.FindFirst ("[system_id] = " & myOriginalTable![system_id])
      '
      If myRecipientTable.NoMatch Then
        '
        ' systemID does not exist in new table, create it
        '
        If Len(myOriginalTable![fieldAnswer]) > 1 Then
          myRecipientTable.AddNew
          myRecipientTable![system_id] = myOriginalTable![system_id]
          myRecipientTable(newQuestionNum).Value = myOriginalTable![fieldAnswer]
          myRecipientTable.Update
        End If
      Else
        '
        ' systemID exists in new table, update it
        '
        If Len(myOriginalTable![fieldAnswer]) > 1 Then
          myRecipientTable.Edit
          myRecipientTable(newQuestionNum) = myOriginalTable![fieldAnswer]
          myRecipientTable.Update
        End If
      End If
      '
      myOriginalTable.MoveNext
    End If
     
    Loop
     
    MsgBox "Complete!"
    GoTo Cleanup
     
    Cleanup:
      myOriginalTable.Close
      myRecipientTable.Close
      Set myRecipientTable = Nothing
      Set dbs = Nothing
       
    End Sub
    I added a status bar update but removed it in the above code. It is useful because it tells me whether the process is still running (but does sometimes freeze although it appears to still be running UGH).

    The above basically checks to see if the systemID exists; if it does it overwrites the answer for that question with the new data. If it does not it simply adds the new systemID with the corresponding answer in the appropriate question column.

    This is also a slow process. I'm hoping someone could tell me whether I'm doing this right or perhaps there is a simpler and quicker way to do this. Since the data is all over the place I figured the above would be the best way to do it. I'm also getting this large file on a weekly basis so I want to have a consistent method to process it and hopefully a reliable and fast way to process it.

    PLEASE LET ME KNOW YOUR THOUGHTS!!! I really like to learn from examples and other people's experiences. If I'm doing it wrong or maybe it just needs a tweek, I appreciate any and all feedback.

    Jon
    Last edited by collector00; 02-07-2015 at 07:10 AM.

  2. #2
    VBAX Expert
    Joined
    Oct 2012
    Posts
    726
    Location
    IMHO, the original table design looks better.

    Data should be stored in more rows, less columns, not vice versa.

  3. #3
    Quote Originally Posted by jonh View Post
    IMHO, the original table design looks better. Data should be stored in more rows, less columns, not vice versa.
    I don't necessarily disagree. I guess my issue is that when we go to create reports we want them in columns not individual rows. A single person may be responsible for hundred of IDs. Keeping them in rows there could be hundred of rows just for a single ID. Having them in columns allows a person to scan hundreds of IDs at a time rather than hundreds of questions for the same ID. Hopefully that makes sense. Perhaps there is a way to keep the data as is and just create a report/spreadsheet that is user friendly.

  4. #4
    VBAX Expert
    Joined
    Oct 2012
    Posts
    726
    Location
    Do what you need to do to get the job done. But (good) database design is usually based on normalising the data. The only thing i see wrong with your example data is that the data in the first five fields is duplicated.

    In your design, if you decide certain questions are no longer required, are you going to delete those fields and lose the existing data? Or are you going to leave them in and have surplus fields taking up valuable space in your file and possibly slowing down queries / reports?

    If somebody wants to see answers to questions spanning multiple tables what do you do?

    Neither of those are issues in the first design.

    The first priority of table design is ease of data retrieval, and the changes you propose don't appear to accomplish that.

    Not to mention, the last thing you want to add to an Access database is a memo field. They are notorious for corrupting databases and the more you have the more likely you'll encounter problems.

    Best option is to post up an example of the kind of report you want and we'll help you create it.

  5. #5
    Quote Originally Posted by jonh View Post
    Do what you need to do to get the job done. But (good) database design is usually based on normalising the data. The only thing i see wrong with your example data is that the data in the first five fields is duplicated.
    Understood. The goal is really for MS Access to be a repository for the data so that we can create a report from that data. As I said before, it is must easier for someone to look at a spreadsheet with several columns than it would to look vertical. In other words, if you were only interested in reviewing question_77, you would simply span that column. In your proposed method it would need to be sorted, etc. The end users will not be using MS Access to view the data.

    Quote Originally Posted by jonh View Post
    If somebody wants to see answers to questions spanning multiple tables what do you do?
    That is simple. A query from multiple tables all linked back by the system_id.

    SELECT [Site List].[system_id], tableA.question_77, tableC.question_223 (and so on)

    We know which questions we would want and we know which table they exist in.

    Quote Originally Posted by jonh View Post
    Neither of those are issues in the first design.
    Perhaps. The first design does not place them into columns as we want them. It just comes down to that and placing them into the new tables does accomplish this (but perhaps you have a solution or recommendation that does not consist of remarking about the poor design; I get that).

    Quote Originally Posted by jonh View Post
    Not to mention, the last thing you want to add to an Access database is a memo field. They are notorious for corrupting databases and the more you have the more likely you'll encounter problems.
    May be true but one must try and fail to learn.

    Quote Originally Posted by jonh View Post
    Best option is to post up an example of the kind of report you want and we'll help you create it.
    I thought I had already but essentially we're looking for the report to be column-based to be used in Excel.

    system_id | question_77 | question_78 | question_79
    12649 | This is their answer to77 | This is their answer to78 | This is their answer to79

  6. #6
    VBAX Expert
    Joined
    Oct 2012
    Posts
    726
    Location
    This will try to use any table where the name begins "table".
    So if you have other tables called "tablewhatever" you'll need to put this in a new db and link in the ones you want to process.

    Private Sub Command0_Click()
        Dim db As DAO.Database
        Dim rs As DAO.Recordset
        Dim t As DAO.TableDef
        Dim f As DAO.Field
        Dim qid As Integer
        
        Set db = CurrentDb
        
        Set rs = db.OpenRecordset("select name from msysobjects where name like 'table*' and type=1")
        
        Do Until rs.EOF
            db.Execute "delete * from " & rs(0)
            db.Execute "INSERT INTO " & rs(0) & " ( system_id ) " & _
                "SELECT distinct originaltable.system_id FROM originaltable"
            Set t = db.TableDefs(rs(0))
            For Each f In t.Fields
                If f.Name <> "system_id" Then
                    qid = CInt(Mid(f.Name, InStr(1, f.Name, "_") + 1))
                    db.Execute "UPDATE " & rs(0) & " INNER JOIN originaltable " & _
                        " ON originaltable.system_id = " & rs(0) & ".system_id SET " & _
                        rs(0) & ".question_" & qid & " = [fieldanswer] WHERE originaltable.questionID=" & qid
                End If
            Next
            rs.MoveNext
        Loop
        rs.Close
        Set rs = Nothing
        Set db = Nothing
    End Sub
    Last edited by jonh; 02-12-2015 at 04:27 AM.

  7. #7
    First of all, thank you again for your assistance.

    I ran it a few times now and thought it was running without issue but I'm getting the Type Mismatch error on this line:

    qid = CInt(Mid(f.Name, Instr(1, f.Name, " ") +1))
    I'm not familiar with this function so I apologize in advance. The first time that it did run with success it created all of the system_IDs within the tables but no answers were added. So I have a few tables with system_IDs that are essentially blank. I'm going to keep digging into it but if you have any suggestions please let me know. Thanks again
    Last edited by collector00; 02-12-2015 at 06:57 AM.

  8. #8
    VBAX Expert
    Joined
    Oct 2012
    Posts
    726
    Location
    That bit is trying to get the question number from the field name.
    Do you have other fields in your tables besides system_id and the question fields (question_#) ?

    edit+
    try changing
    If f.Name <> "system_id" Then
    to
    If Left(f.Name, 9) = "question_" Then

    edit2+
    the code you've posted has a space
    qid = CInt(Mid(f.Name, Instr(1, f.Name, " ") +1))

    should be an underscore ??
    qid = CInt(Mid(f.Name, InStr(1, f.Name, "_") + 1))
    Last edited by jonh; 02-12-2015 at 07:20 AM.

  9. #9
    Quote Originally Posted by jonh View Post
    That bit is trying to get the question number from the field name.
    Do you have other fields in your tables besides system_id and the question fields (question_#) ?

    edit+
    try changing
    If f.Name <> "system_id" Then
    to
    If Left(f.Name, 9) = "question_" Then

    edit2+
    the code you've posted has a space
    qid = CInt(Mid(f.Name, Instr(1, f.Name, " ") +1))

    should be an underscore ??
    qid = CInt(Mid(f.Name, InStr(1, f.Name, "_") + 1))
    The question number is actually obtained in its own field, labeled questionID. So I think that is what I need to use rather than trying to extract the number from the question. The questions are just that.

    I left the qid line the same but now that you explain that it is extracting the number may be an issue.

  10. #10
    VBAX Expert
    Joined
    Oct 2012
    Posts
    726
    Location
    It gets the number from the field NAME. You created the tables and set up the columns already yes?

    so it gets say, 77 from the field name question_77 and builds a query around that.

  11. #11
    Gotcha, my misunderstanding. From the new tables yes, it would be question_77 for example. You are correct. I was able to successfully re-run the script and it seems to be fine now. I may have missed something on the initial programming. Thanks again for your patience and assistance.

  12. #12
    VBAX Expert
    Joined
    Oct 2012
    Posts
    726
    Location
    Glad to help

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •