Consulting

Page 1 of 2 1 2 LastLast
Results 1 to 20 of 28

Thread: search a column of keywords in a folder of .docs and match file names and string

  1. #1
    VBAX Regular
    Joined
    Dec 2017
    Posts
    21
    Location

    search a column of keywords in a folder of .docs and match file names and string

    I have a bunch of *.doc documents (around 1000 plus) with keyword (4 digits) almost at the end of the document. All these keywords start with "SD/# and four digits and ends with /". I want to find dcouments matching the 4 digits of keyword from A column of excel and fill the column B with file name and column C and D with the folder names (documents can be in 2 subfolders inside the folder(subfolders names are to identify the type of documents) with the matching keyword because there can be many documents with the same keyword. There are almost 300 keywords of (4 digits, eg: 0001, 8684, 0456, 0022, etc.) in column A of the worksheet. Will this be possible with VBscript. Your valuable help appreciated.

    PC information: 64 bit i3 with win 7, office 2007

  2. #2
    VBAX Guru Kenneth Hobs's Avatar
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Location
    So you want just the 1st DOC file even if there are more than one with a match? I don't know what TYPE means for columns C and D. Maybe if you attached a simple example file it would show what you want more clearly.

  3. #3
    VBAX Regular
    Joined
    Dec 2017
    Posts
    21
    Location
    Quote Originally Posted by Kenneth Hobs View Post
    So you want just the 1st DOC file even if there are more than one with a match? I don't know what TYPE means for columns C and D. Maybe if you attached a simple example file it would show what you want more clearly.
    Thank you for showing my interest and prompt reply. Want all the documents to listed one by one in column B with matching string in C and the foldernames which these documents are located in the next two colums. Extremely sorry for not being clear and misguiding...

  4. #4
    VBAX Guru Kenneth Hobs's Avatar
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Location
    That does not match with post #1. So, if there is a parent folder like c:\Excel, with subfolders \Sub1 and \Sub2, then for each row, the file might come from one of 3 locations.

    To attach a file(s), click the Go Advanced button in lower right of a reply, and then the Manage Attachments button below the reply box.

    I will make up an example using what I think was meant by post #1.

  5. #5
    VBAX Regular
    Joined
    Dec 2017
    Posts
    21
    Location
    I will post an example which I have manually done. Plese look into it.
    Attached Files Attached Files
    Last edited by macropod; 12-10-2017 at 12:58 PM. Reason: Erroneous cross-post discussion

  6. #6
    VBAX Regular
    Joined
    Dec 2017
    Posts
    21
    Location
    Quote Originally Posted by Kenneth Hobs View Post
    That does not match with post #1. So, if there is a parent folder like c:\Excel, with subfolders \Sub1 and \Sub2, then for each row, the file might come from one of 3 locations.
    exactly. thank you for your patience and interest[/QUOTE]

    Quote Originally Posted by Kenneth Hobs View Post
    To attach a file(s), click the Go Advanced button in lower right of a reply, and then the Manage Attachments button below the reply box.
    I have attached one in above reply.

    Quote Originally Posted by Kenneth Hobs View Post
    I will make up an example using what I think was meant by post #1.
    sorry for misleading, it would be great if you could make up an example matching the sample

  7. #7
    VBAX Guru Kenneth Hobs's Avatar
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Location
    You said this in the file.
    Please note there are one main folder and total 4 subfolders, VR, VR-EX, NVR, NVR-EX. Files are grouped into these foldes under the main folder to easily identify. Can there be a serial number for the list of be also added in another colum (column structure can be changed)
    I don't know why column C has a 4 digit number. I thought your goal was to MATCH the 4 digit number in column A? If not, what is the purpose of column A.

    Without clear goals, full solutions will be nearly impossible. Think about what could happen. e.g. All files contain SD/#7301/ with 7301 in column A: f1.doc, f2.doc, \VR\f1.doc, \VR\f3.doc, \NVR\f1.doc, \NVR\Ken.doc. In that scenario, f1.doc with be in column B, nothing would be in Column C as it came from the parent folder. What would go into column D is a mystery. I guess if one kept the same filename then VR would go into column D.

    Since the goals are not clear enough to handle all scenarios, the best I can do for you is to show you how to solve some scenarios.

  8. #8
    VBAX Regular
    Joined
    Dec 2017
    Posts
    21
    Location
    hi, the problem is the there are multple files containing the same 4 digit number in it. when file list is popluated I think we would not be able to identify the which file belong to which number. The document file names cannot be saved with the neither the folder details nor the 4 digit number to identify it. so if the number again is written in the column next to it we can identify the file easily with the 4 digit matching number. Hope, I am clear

  9. #9
    VBAX Regular
    Joined
    Dec 2017
    Posts
    21
    Location
    the numbers in column C is actually the search result showning match with the file number. I have done it manually to show you how the end result has to be. I think by doing that I have created confusion.

  10. #10
    VBAX Guru Kenneth Hobs's Avatar
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Location
    That was what I was saying. So, column A serves no purpose?

    If you want all files with at least one match from a list, I would break that list out separately. e.g. Sheet2 column A or a named range somewhere else so there was no confusion.

    Rather than multiple columns, I would list the found match in Sheet1 with found match in column A, base filename, in column B, and then the subfolder path in Column C if not in the parent folder.

    I would probably make it more simple and include the full filename in a hyperlink with the base filename as the hypertext.

    If you were to match in a file by a list, could a file have more than one match?

  11. #11
    VBAX Regular
    Joined
    Dec 2017
    Posts
    21
    Location
    Column A is the keywords to be searched. The output is in Column B which will be having the file number alone and there are multiple files with the same number. So when the result populates how will we identify which all the files have a particular keyword as the keyword is inside the file not a part of the file number.

  12. #12
    VBAX Regular
    Joined
    Dec 2017
    Posts
    21
    Location
    Quote Originally Posted by Kenneth Hobs View Post
    That was what I was saying. So, column A serves no purpose?

    If you want all files with at least one match from a list, I would break that list out separately. e.g. Sheet2 column A or a named range somewhere else so there was no confusion.



    Rather than multiple columns, I would list the found match in Sheet1 with found match in column A, base filename, in column B, and then the subfolder path in Column C if not in the parent folder.
    if it can happen in the same sheet instead of breaking it, would be great, if no other go, then it is okay with multiple sheets.

    Quote Originally Posted by Kenneth Hobs View Post
    I would probably make it more simple and include the full filename in a hyperlink with the base filename as the hypertext.
    hypertext is okay but not necessary.

    Quote Originally Posted by Kenneth Hobs View Post
    If you were to match in a file by a list, could a file have more than one match?
    no there is only one match inside the file..(there are no other keywords than the ones in Column A)..

  13. #13
    VBAX Regular
    Joined
    Dec 2017
    Posts
    21
    Location
    below is an example of my working sheet with explanations, hope that will help you to make out the senario involved
    Attached Files Attached Files

  14. #14
    VBAX Guru Kenneth Hobs's Avatar
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Location
    Don't expect this to be fast as you wanted a many to many search. There might be something faster than Instr().

    Sub Main()  
      Dim p$, fn$, i As Long, j As Long, r As Long, c As Integer
      Dim a, b, e, rr As Range, cc As Range
      Dim ws As Worksheet, o As Object, s$
      Dim fso As Object 'New Scripting.FileSystemObject
      
    '******************* INPUTS **********************************
      p = ThisWorkbook.Path & "\" 'Parent folder
      Set ws = Worksheets(1)
    '******************* END INPUTS ******************************
      
      'List of 4 digit numbers. 'e.g. SD/#7301/, SD/#0231/
      Set rr = ws.Range("A2", ws.Cells(Rows.Count, "A").End(xlUp))
      
      On Error GoTo EndSub
      Application.DisplayAlerts = False
      
      a = aFFs(p & "*.doc", , True)
      If Not IsArray(a) Then Exit Sub
      
      Set fso = CreateObject("Scripting.FileSystemObject")
      ReDim b(1 To Rows.Count, 1 To 4)
      
      For Each e In a
        Set o = GetObject(e)
        s = o.Content
        For Each cc In rr
          i = InStr(s, "SD/#" & cc.Text & "/")
          If i > 0 Then
            j = j + 1
            b(j, 1) = fso.GetFile(CStr(e)).Name
            b(j, 2) = cc.Text
            fn = fso.GetParentFolderName(CStr(e))
            If Len(fn) > Len(p) Then b(j, 3) = Right(fn, Len(fn) - Len(p))
            b(j, 4) = WorksheetFunction.Round(Len(s) / 65, 0)
          End If
        Next cc
        o.Close False
      Next e
      
      Set fso = Nothing
      If j = 0 Then Exit Sub
      
      b = Application.Index(b, Evaluate("row(1:" & j & ")"), Application.Transpose([row(1:4)]))
      ws.[B2].Resize(j, 4).Value = b
      ws.UsedRange.Columns.AutoFit
      
    EndSub:
      Set fso = Nothing
      Application.DisplayAlerts = True
    End Sub
    
    
    'MyDir should end in a "\" character unless searching by wildcards, e.g. "x:\test\t*
    'Command line switches for the shell's Dir, http://ss64.com/nt/dir.html
    Function aFFs(myDir As String, Optional extraSwitches = "", _
      Optional tfSubFolders As Boolean = False) As Variant
      
      Dim s As String, a() As String, v As Variant
      Dim b() As Variant, i As Long
      
      If tfSubFolders Then
        s = CreateObject("Wscript.Shell").Exec("cmd /c dir " & _
          """" & myDir & """" & " /b /s " & extraSwitches).StdOut.readall
        Else
        s = CreateObject("Wscript.Shell").Exec("cmd /c dir " & _
          """" & myDir & """" & " /b " & extraSwitches).StdOut.readall
      End If
      
      a() = Split(s, vbCrLf)
      If UBound(a) = -1 Then
        Debug.Print myDir & " not found.", vbCritical, "Macro Ending"
        Exit Function
      End If
      ReDim Preserve a(0 To UBound(a) - 1) As String 'Trim trailing vblfcr
      
      For i = 0 To UBound(a)
        If Not tfSubFolders Then
          s = Left$(myDir, InStrRev(myDir, "\"))
          'add the folder name
          a(i) = s & a(i)
        End If
      Next i
      aFFs = sA1dtovA1d(a)
    End Function
    
    
    Function sA1dtovA1d(strArray() As String) As Variant
      Dim varArray() As Variant, i As Long
      ReDim varArray(LBound(strArray) To UBound(strArray))
      For i = LBound(strArray) To UBound(strArray)
        varArray(i) = CVar(strArray(i))
      Next i
      sA1dtovA1d = varArray()
    End Function

  15. #15
    VBAX Regular
    Joined
    Dec 2017
    Posts
    21
    Location
    I placed the excel sheet with macro in the main folder and tried to run it. When I tries to run it, it hangs the

    pc and excel. It took some time to renistate the pC back into the previous condition. On third attempt I could run

    the macro but it gave a message and stopped after reading through all the documents "the document could not be

    registered. It will not be possible create links from other documents to this doctument". It did not print

    anything on the active sheet and did not create a new sheet, just plain excel sheet with the keywords in column A.

    The word doc has a macro in it which utilizes the activx to take data from webpage and creates the document name. Can you please remove the hyperlinking section from the code.

    The macro is installed in C Drive not in office folder and cannot be removed from the system as it is vital to the document. Can this be the reason? if so can the macro be killed during the running of the excel macro?.

  16. #16
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,645
    The word doc has a macro in it which utilizes the activx to take data from webpage and creates the document name



    Please post a sample Word document.

  17. #17
    VBAX Regular
    Joined
    Dec 2017
    Posts
    21
    Location
    Here is the sample folder with documents
    Attached Files Attached Files

  18. #18
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,645
    I put the files in G:\test_snb\mainfolder, containing subfolders G:\test_snb\mainfolder\NVR, G:\test_snb\mainfolder\NVR-EX,G:\test_snb\mainfolder\VR and G:\test_snb\mainfolder\VR-EX

    Then run this code:

    Sub M_snb()
      sn = ThisWorkbook.Sheets(1).Columns(1).SpecialCells(2)
        
      with CreateObject("wscript.shell")
        For j = 2 To UBound(sn)
          MsgBox .Exec("cmd /c findstr /m/s SD/#" & Format(sn(j, 1), "0000") & "/ G:\test_snb\mainfolder\*.doc").StdOut.ReadAll
        Next
      end with
    End Sub
    Last edited by snb; 12-10-2017 at 09:28 AM.

  19. #19
    VBAX Guru Kenneth Hobs's Avatar
    Joined
    Nov 2005
    Location
    Tecumseh, OK
    Posts
    4,956
    Location
    snb's example was to show how findstr can help. Change MsgBox.Exec to MsgBox .Exec in post #21.

    I can not duplicate your system restore issue. When a macro "hangs" and ESC key presses or Break key does not abort it, the 3 finger solute (Alt+Ctrl+Del, Task Manager, Kill) is the usual method to Kill the Excel instance. It is always best to run a macro like mine by itself with no other Excel files open. That is, at least until you know what to expect.

    I can not duplicate your timing problem for those files. I did say it took a "long time" to run. My run with your files took about 44 seconds. Anything over 5 seconds is a long time to me. I like to see under one second but you wanted to get the character count / 65 for each match.

    snb or I could show you how to modify his code to do what mine does, less the character count / 65.

    I added an ESC key option. When you abort a macro like mine, it can leave an instance of Word that needs to be Killed via 3 finger solute before another run or you could get an OLE error.

    Sub Main()    
        Dim p$, fn$, i As Long, j As Long, r As Long, c As Integer
        Dim a, b, e, rr As Range, cc As Range
        Dim ws As Worksheet, o As Object, s$
        Dim fso As Object 'New Scripting.FileSystemObject
        Dim d#
        
        d = Timer
         
         '******************* INPUTS **********************************
        p = ThisWorkbook.Path & "\" 'Parent folder
        Set ws = Worksheets(1)
         '******************* END INPUTS ******************************
         
         'List of 4 digit numbers. 'e.g. SD/#7301/, SD/#0231/
        Set rr = ws.Range("A2", ws.Cells(Rows.Count, "A").End(xlUp))
         
        On Error GoTo EndSub
        Application.EnableCancelKey = xlErrorHandler
        Application.DisplayAlerts = False
         
        a = aFFs(p & "*.doc", , True)
        If Not IsArray(a) Then Exit Sub
         
        Set fso = CreateObject("Scripting.FileSystemObject")
        ReDim b(1 To Rows.Count, 1 To 4)
         
        For Each e In a
            Set o = GetObject(e)
            s = o.Content
            For Each cc In rr
                i = InStr(s, "SD/#" & cc.Text & "/")
                If i > 0 Then
                    j = j + 1
                    b(j, 1) = fso.GetFile(CStr(e)).Name
                    b(j, 2) = cc.Text
                    fn = fso.GetParentFolderName(CStr(e))
                    If Len(fn) > Len(p) Then b(j, 3) = Right(fn, Len(fn) - Len(p))
                    b(j, 4) = WorksheetFunction.Round(Len(s) / 65, 0)
                End If
            Next cc
            o.Close False
        Next e
         
        Set fso = Nothing
        If j = 0 Then Exit Sub
         
        b = Application.Index(b, Evaluate("row(1:" & j & ")"), Application.Transpose([row(1:4)]))
        ws.[B2].Resize(j, 4).Value = b
        ws.UsedRange.Columns.AutoFit
         
    EndSub:
        Set fso = Nothing
        Application.DisplayAlerts = True
        Application.EnableCancelKey = xlInterrupt
        Debug.Print Timer - d
    End Sub

  20. #20
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,645
    @KH

    Sub M_snb() 
      sn = ThisWorkbook.Sheets(1).Columns(1).SpecialCells(2) 
         
      With CreateObject("wscript.shell") 
        For j = 2 To UBound(sn) 
           c00=c00 & vbcrlf & .Exec("cmd /c findstr /m/s SD/#" & Format(sn(j, 1), "0000") & "/ G:\test_snb\mainfolder\*.doc").StdOut.ReadAll
        Next 
      End With 
      with createobject("scripting.filesystemobject")
         st=split(c00,vbcrlf)
         for j=1 to ubound(st)
            
         next
      end with
    End Sub

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •