Consulting

Results 1 to 14 of 14

Thread: Sleeper: Extraction of key entities using VBA

  1. #1

    Sleeper: Extraction of key entities using VBA

    My csv file has next structure
    16-09-2018 15-52-11.jpgMy csv file has next structure
    16-09-2018 15-52-11.jpg
    The body of text is names of products from check. (GOODS_NAME)
    I want to group any similar names.

    some phrases have the same and either root-key, key patterns
    For example
    word MAKFA Makar (it is Ukraine words)
    I found 7 rows with

    "root or key word MAKFA Makar"
    Pasta Makfa snail flow-pack 450 g.
    MAKFA Macaroni feathers like. in/ with
    2013077 MAKFA Makar.RAKERS 450g
    2013077 MAKFA Makar.RAKERS 450g
    6788 MAKFA Makar.perya 450g
    2049750 MAKFA Makar.SHIGHTS 450g
    2049750 MAKFA Makar.SHIGHTS 450g
    
    and so on. There are many phares. So we extract similar root-key pattern
                                                    Initially                 similar pattern
    1                       Pasta Makfa snail flow-pack 450 g.          MAKFA Makar.
    2                  MAKFA Macaroni feathers like. in/ with          MAKFA Makar.
    32013077 MAKFA Makar.RAKERS 450g          MAKFA Makar.
    42013077 MAKFA Makar.RAKERS 450g          MAKFA Makar.
    56788 MAKFA Makar.perya 450g          MAKFA Makar.
    62049750 MAKFA Makar.SHIGHTS 450g          MAKFA Makar.
    72049750 MAKFA Makar.SHIGHTS 450g          MAKFA Makar.
    8*3398012 DD Kolb.SERV.OKHOTN in/ to v / y0.35                  kolb
    9*3014084 D.Dym.Spikachki DEREVEN.MINI 1kg             Spikachki
    10809 Bananas 1kg              Bananas 
    11                                              Lemons 55+                Lemons
    12                           Napkins paper color 100pcs PL        Napkins paper 
    13                         SOFT Cotton sticks 100 PE (BELL         Cotton sticks
    14                     SHEBEKINSKIE Macaroni Butterfly №40 SHEBEKINSKIE Macaroni
    15*3426789 WH.The corn rav guava / yagn.d / Cat SEED 85g              CAT seed
    16                        FetaXa Cheese product 60% 400g (               Cheese 
    173491144 LIP.NAP.ICE TEA green yellow 0.5 liter                  TEA 
    182030918 MARIA TRADITIONAL Biscuit 180g              Biscuit 
    19197 Onion 1 kg                 Onion
    20                          TOBUSsteering-wheel 0.5kg flow        steering-wheel
    21                     Package "Magnet" white (Plastiktre) Package  (Plastiktre)
    22*2108609 SLOB.Mayon.OLIVK.67% 400ml                 Mayon
    23                            TENDER AGE Cottage cheese 10        Cottage cheese
    
    


    here 7 patterns is root-key( MAKFA Makar.) another products in this list are unique, but it can be that in whole list
    3491144 LIP.NAP.ICE TEA green yellow 0.5 liter TEA
    3491144 LIP.NAP.ICE TEA BLACK yellow 0.5 liter TEA

    same root-key pattern is TEA.

    How to get all unique root-key patterns. Here 17 uninque patterns.

    MAKFA Makar., kolb , Spikachki, Bananas, Lemons, Napkins paper, Cotton sticks, HEBEKINSKIE Macaroni, CAT seed, Cheese ,TEA, Biscuit, Onion, steering-wheel, Package (Plastiktre), Mayon,Cottage cheese

    csv file has some million rows and path C:/myfold/goods.csv (or goods.txt)


    In code tagm can't arrange pattern, so provide it as image



    SIMILAR PATTERN
    16-09-2018 16-35-37.jpg





    Attached Files Attached Files

  2. #2
    VBAX Mentor
    Joined
    Aug 2012
    Posts
    367
    Location
    Hi Synthex,
    I'm not sure if I understand you correctly, but here goes

    You can use the Instring (instr? I forget the exact spelling) to test if one string is contained in another. for instance, "Instring("Test", "Test Cricket") would return "1", so

    If instring(.....) > 0 then
       'string exists inside second string
    end if
    I think you need to use application.worksheetfunction.instring to get this one to work

    You can use a nested loop to test for unique/duplicated items. this should get you started
    for lRow = 1 to lLastRow
      sString1 = range("A" & lrow).value
      bFlag = false
      for lRow2 = lrow+1 to lLastRow
         sString2 = range("A" & lrow2).value
         if sstring1 = sstring2 then
            bflag = true 'values match
            'do something with matching values
            exit for
         end if
       next lrow2
    next lrow
    Remember: it is the second mouse that gets the cheese.....

  3. #3
    hm. i got ther compile error
    here the screen, what i did wrong18-09-2018 12-56-13.jpg

  4. #4
    VBAX Mentor
    Joined
    Aug 2012
    Posts
    367
    Location
    Hi Synthex

    you have to put the code in a sub routine
    I would also strongly recommend that you go to the VBA editor settings and tick 'Option Explicit', this forces you to declare all your variables, and helps prevent typing mistakes.

    also, you need to define what you need to do when you find a duplicated Value - all I did was put a comment at this place.

    anyway - try this

    option explicit
    
    Public Sub FindDuplicates ()
    
    dim lRow as long
    dim lRow2 as long
    dim lLastRow as long
    dim sString as long
    'This use of capitalisation is called 'camel case' - and is good for lots of reasons
    ' using lots of comments like this helps you to understand your code later
    
    lastrow = activesheet.usedrange.rows.count
    for lRow = 1 to lLastRow
      sString1 = activesheet.range("A" & lrow).value
      if sstring1 <> "" then 'cell is not blank - proceed with test
       for lRow2 = lrow+1 to lLastRow
         sString2 = range("A" & lrow2).value
         if sstring1 = sstring2 then
            bflag = true 'values match
            'do something with matching values
            activesheet.range("A" & lrow2).interior.colorindex = 3   'mark cells with red background fill
            'you could also activesheet.range("A" & lrow2).clearcontents 'to delete values
         end if
    
        next lrow2
       end if
    next lrow
    
    
    if bflag = false then msgbox("no duplicate values found")
    
    end sub
    Remember: it is the second mouse that gets the cheese.....

  5. #5
    VBAX Mentor
    Joined
    Aug 2012
    Posts
    367
    Location
    I've not tested this code, so you probably have to debug it.
    press f8 to step through the code line by line.

    the VBE will attempt to compile the code, and highlight the line associated with any compile errors.
    once it will compile, F8 lets you see how each line works/what it does

    good luck
    Werafa
    Remember: it is the second mouse that gets the cheese.....

  6. #6
    dear, werafa
    when i run you code, i got the error, see the screen
    19-09-2018 12-11-30.jpg


    also, you need to define what you need to do when you find a duplicated Value - all I did was put a comment at this place.

    duplicate value put in column
    to be more clearly, here simple example
    makar makfa 500 makar makfa
    onion onion
    makfa makar circle makar makfa

    makar makfa it is common pattern for row 1 and 3 so opposite it, pattern placed.

    How to do it?

  7. #7
    Note, in the first post I showed on the screen the entities that I self identified. (similar patter)I want that vba independently identified these entities without me. I do not ask the initial patterns.
    i just provide example what i want vba to do without me.
    For this purpose i need vba
    let's this similar pattern it finds without me and not just comparing my patterns with the available data.
    that's the main goal)))

  8. #8
    VBAX Mentor
    Joined
    Aug 2012
    Posts
    367
    Location
    this is a 'compile error', and excel is telling you that there is a typo. (this is exactly why you use the 'option explicit' at the top)
    The typo is in the text that is highlighted. you will see that the dim statement is lLastRow - but the highlighted text is 'lastrow'. Spell it exactly as written in the dim statement near the top (type it in lowercase, and excel will convert it to the correct capitalisation when it recognises it correctly).

    to write a value in a cell, you can use:
    activesheet.range("B" & lRow2).value = "some text"
    if lRow2 = 5, then this resolves to cell B5. Change "B" to be whatever column you need, and change "Some text" to be something useful. eg.
    activesheet.range("B" & lRow2).value = activesheet.range("A" & lRow2).value
    You seem to have a couple of choices to make, so you might like to search the internet for "Excel VBA If Then"
    I think you would also do well to search for "Excel VBA Instring"

    I think these two commands will let you identify the patterns you are looking for
    Remember: it is the second mouse that gets the cheese.....

  9. #9
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,645
    Keep it simple

    sub M_snb()
     sn=split("MAKFA Makar.,kolb,Spikachki,Bananas,Lemons,Napkins paper,Cotton sticks,HEBEKINSKIE Macaroni,CAT seed,Cheese,TEA,Biscuit,Onion,steering-wheel,Package  (Plastiktre),Mayon,Cottage cheese",",")
    
     sp=split(createobject("scripting.filesystemobject).opentextfile("G:\OF\23.csv").readall,vbcrlf)
    
     for each it in sn
       msgbox join(fiilter(sp,it),vblf)
     next
    End Sub

  10. #10
    Quote Originally Posted by snb View Post
    Keep it simple

    sub M_snb()
     sn=split("MAKFA Makar.,kolb,Spikachki,Bananas,Lemons,Napkins paper,Cotton sticks,HEBEKINSKIE Macaroni,CAT seed,Cheese,TEA,Biscuit,Onion,steering-wheel,Package  (Plastiktre),Mayon,Cottage cheese",",")
    
     sp=split(createobject("scripting.filesystemobject).opentextfile("G:\OF\23.csv").readall,vbcrlf)
    
     for each it in sn
       msgbox join(fiilter(sp,it),vblf)
     next
    End Sub

    snb, a little be wrong. Why?
    MAKFA Makar.,kolb,Spikachki,Bananas,Lemons,Napkins paper,Cotton sticks,HEBEKINSKIE Macaroni,CAT seed,Cheese,TEA,Biscuit,Onion,steering-wheel,Package (Plastiktre),Mayon,Cottage cheese"
    these classes i extract manualle and provide as example of outpur i wanted, so my question was about how to do that VBA self extracts these classe
    so as input, i just have




    I.E. only 0ne column with data on input

    1.jpg

    I.E, two column where class was defined by vba on the basis of root-key,key words



    2.jpg

    Of course, I understand that it is not worth waiting for absolute precision. any accuracy is valuable.
    Do you understand my question? If no, let me know, i 'll try to explain

  11. #11
    Knowledge Base Approver VBAX Wizard
    Joined
    Apr 2012
    Posts
    5,645
    You only need to adapt the first code line.

  12. #12
    Can't understand you. Can you show what do you mean

  13. #13
    VBAX Newbie
    Joined
    Sep 2018
    Location
    Pakistan
    Posts
    1
    Location

    Extraction of Key Entitites using VBA

    I have extracted key entities by VBA and also Tried ZetExcel SDK.

  14. #14
    DanielLee2
    just provide example how can i extract key entities by VBA in mu data.
    I have any difficulties.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •