My CSV file has the following structure (screenshot: 16-09-2018 15-52-11.jpg).
The text column (GOODS_NAME) holds product names taken from receipts.
I want to group similar names.
Some phrases share the same root or key pattern, for example the words MAKFA Makar (these are Ukrainian words).
I found 7 rows with the root or key word "MAKFA Makar":

Pasta Makfa snail flow-pack 450 g.
MAKFA Macaroni feathers like. in/ with
2013077 MAKFA Makar.RAKERS 450g
2013077 MAKFA Makar.RAKERS 450g
6788 MAKFA Makar.perya 450g
2049750 MAKFA Makar.SHIGHTS 450g
2049750 MAKFA Makar.SHIGHTS 450g

and so on. There are many such phrases, so for each one we extract the similar root-key pattern:

| # | Initial name | Similar pattern |
|---|---|---|
| 1 | Pasta Makfa snail flow-pack 450 g. | MAKFA Makar. |
| 2 | MAKFA Macaroni feathers like. in/ with | MAKFA Makar. |
| 3 | 2013077 MAKFA Makar.RAKERS 450g | MAKFA Makar. |
| 4 | 2013077 MAKFA Makar.RAKERS 450g | MAKFA Makar. |
| 5 | 6788 MAKFA Makar.perya 450g | MAKFA Makar. |
| 6 | 2049750 MAKFA Makar.SHIGHTS 450g | MAKFA Makar. |
| 7 | 2049750 MAKFA Makar.SHIGHTS 450g | MAKFA Makar. |
| 8 | *3398012 DD Kolb.SERV.OKHOTN in/ to v / y0.35 | kolb |
| 9 | *3014084 D.Dym.Spikachki DEREVEN.MINI 1kg | Spikachki |
| 10 | 809 Bananas 1kg | Bananas |
| 11 | Lemons 55+ | Lemons |
| 12 | Napkins paper color 100pcs PL | Napkins paper |
| 13 | SOFT Cotton sticks 100 PE (BELL | Cotton sticks |
| 14 | SHEBEKINSKIE Macaroni Butterfly №40 | SHEBEKINSKIE Macaroni |
| 15 | *3426789 WH.The corn rav guava / yagn.d / Cat SEED 85g | CAT seed |
| 16 | FetaXa Cheese product 60% 400g ( | Cheese |
| 17 | 3491144 LIP.NAP.ICE TEA green yellow 0.5 liter | TEA |
| 18 | 2030918 MARIA TRADITIONAL Biscuit 180g | Biscuit |
| 19 | 197 Onion 1 kg | Onion |
| 20 | TOBUSsteering-wheel 0.5kg flow | steering-wheel |
| 21 | Package "Magnet" white (Plastiktre) | Package (Plastiktre) |
| 22 | *2108609 SLOB.Mayon.OLIVK.67% 400ml | Mayon |
| 23 | TENDER AGE Cottage cheese 10 | Cottage cheese |

Here 7 rows share one root-key pattern (MAKFA Makar.). The other products in this list are unique, but in the whole list there can be rows such as
3491144 LIP.NAP.ICE TEA green yellow 0.5 liter TEA
3491144 LIP.NAP.ICE TEA BLACK yellow 0.5 liter TEA
Both rows have the same root-key pattern, TEA.
How can I get all unique root-key patterns? In the example above there are 17 unique patterns:
MAKFA Makar., kolb, Spikachki, Bananas, Lemons, Napkins paper, Cotton sticks, SHEBEKINSKIE Macaroni, CAT seed, Cheese, TEA, Biscuit, Onion, steering-wheel, Package (Plastiktre), Mayon, Cottage cheese
The CSV file has several million rows and is located at C:/myfold/goods.csv (or goods.txt).
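
To make the expected result concrete, here is a minimal Python sketch of the mapping I am after. It assumes the key list is already known, which is not the case for my real data (finding the keys is the actual question). The names `KNOWN_KEYS` and `assign_pattern` are hypothetical and used only for illustration, and the plain substring matching is a simplification.

```python
# Hypothetical illustration: the key list is hard-coded here, whereas in the
# real task it has to be discovered from the data.
KNOWN_KEYS = ["MAKFA Makar.", "TEA", "Bananas"]


def assign_pattern(name, keys=KNOWN_KEYS):
    """Return the first key contained in the product name
    (case-insensitive substring match), or None if no key matches."""
    lowered = name.lower()
    for key in keys:
        if key.lower() in lowered:
            return key
    return None


# A few product names copied from the example rows above.
rows = [
    "2013077 MAKFA Makar.RAKERS 450g",
    "6788 MAKFA Makar.perya 450g",
    "3491144 LIP.NAP.ICE TEA green yellow 0.5 liter",
    "3491144 LIP.NAP.ICE TEA BLACK yellow 0.5 liter",
    "809 Bananas 1kg",
]

patterns = [assign_pattern(r) for r in rows]

# dict.fromkeys keeps first-seen order while dropping duplicates.
unique_patterns = list(dict.fromkeys(p for p in patterns if p is not None))
print(unique_patterns)
```

Plain substring matching is not enough for the real data: a row like "Pasta Makfa snail flow-pack 450 g." should also fall under MAKFA Makar., yet it does not contain that exact text, and a short key such as TEA could also match inside unrelated words.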