synthex
09-16-2018, 06:36 AM
My CSV file has the following structure.
The text column (GOODS_NAME) holds product names taken from receipts.
I want to group similar names together.
Some names share the same root-key pattern.
For example, take the pattern MAKFA Makar (these are Ukrainian words).
I found 7 rows with the root-key pattern "MAKFA Makar":
Pasta Makfa snail flow-pack 450 g.
MAKFA Macaroni feathers like. in/ with
2013077 MAKFA Makar.RAKERS 450g
2013077 MAKFA Makar.RAKERS 450g
6788 MAKFA Makar.perya 450g
2049750 MAKFA Makar.SHIGHTS 450g
2049750 MAKFA Makar.SHIGHTS 450g
and so on. There are many such phrases, so I want to extract the root-key pattern that similar names share.
No. | Initial name | Similar pattern
1 | Pasta Makfa snail flow-pack 450 g. | MAKFA Makar.
2 | MAKFA Macaroni feathers like. in/ with | MAKFA Makar.
3 | 2013077 MAKFA Makar.RAKERS 450g | MAKFA Makar.
4 | 2013077 MAKFA Makar.RAKERS 450g | MAKFA Makar.
5 | 6788 MAKFA Makar.perya 450g | MAKFA Makar.
6 | 2049750 MAKFA Makar.SHIGHTS 450g | MAKFA Makar.
7 | 2049750 MAKFA Makar.SHIGHTS 450g | MAKFA Makar.
8 | *3398012 DD Kolb.SERV.OKHOTN in/ to v / y0.35 | kolb
9 | *3014084 D.Dym.Spikachki DEREVEN.MINI 1kg | Spikachki
10 | 809 Bananas 1kg | Bananas
11 | Lemons 55+ | Lemons
12 | Napkins paper color 100pcs PL | Napkins paper
13 | SOFT Cotton sticks 100 PE (BELL | Cotton sticks
14 | SHEBEKINSKIE Macaroni Butterfly №40 | SHEBEKINSKIE Macaroni
15 | *3426789 WH.The corn rav guava / yagn.d / Cat SEED 85g | CAT seed
16 | FetaXa Cheese product 60% 400g ( | Cheese
17 | 3491144 LIP.NAP.ICE TEA green yellow 0.5 liter | TEA
18 | 2030918 MARIA TRADITIONAL Biscuit 180g | Biscuit
19 | 197 Onion 1 kg | Onion
20 | TOBUSsteering-wheel 0.5kg flow | steering-wheel
21 | Package "Magnet" white (Plastiktre) | Package (Plastiktre)
22 | *2108609 SLOB.Mayon.OLIVK.67% 400ml | Mayon
23 | TENDER AGE Cottage cheese 10 | Cottage cheese
Here 7 rows share the root-key pattern "MAKFA Makar."; the other products in this list are unique, but the whole file may contain further matches, such as:
3491144 LIP.NAP.ICE TEA green yellow 0.5 liter TEA
3491144 LIP.NAP.ICE TEA BLACK yellow 0.5 liter TEA
Both rows have the same root-key pattern, TEA.
How can I get all unique root-key patterns? This sample has 17 unique patterns:
MAKFA Makar., kolb, Spikachki, Bananas, Lemons, Napkins paper, Cotton sticks, SHEBEKINSKIE Macaroni, CAT seed, Cheese, TEA, Biscuit, Onion, steering-wheel, Package (Plastiktre), Mayon, Cottage cheese
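To make the expected result concrete, here is a minimal sketch of the counting step only. It assumes the root-key pattern of each row is already known (it is typed in by hand below, copied from the 23 sample rows); extracting the pattern from the raw name is the part I do not know how to do.

```python
from collections import Counter

# Root-key patterns of the 23 sample rows, entered by hand (not extracted).
patterns = ["MAKFA Makar."] * 7 + [
    "kolb", "Spikachki", "Bananas", "Lemons", "Napkins paper",
    "Cotton sticks", "SHEBEKINSKIE Macaroni", "CAT seed", "Cheese",
    "TEA", "Biscuit", "Onion", "steering-wheel",
    "Package (Plastiktre)", "Mayon", "Cottage cheese",
]

# Counter keeps first-seen order, so its keys are the unique patterns.
counts = Counter(patterns)
unique_patterns = list(counts)

print(len(patterns))           # 23 rows
print(len(unique_patterns))    # 17 unique patterns
print(counts["MAKFA Makar."])  # 7 rows share this pattern
```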
The CSV file has several million rows; its path is C:/myfold/goods.csv (or goods.txt).
The code tag does not preserve the table layout, so I have also attached it as an image: SIMILAR PATTERN.
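As a starting point, here is a rough sketch of one possible first step, not a full solution: strip the leading product code, split each name into letter-only tokens, and count in how many rows each token occurs. Tokens that occur in more than one row are only candidates for a root-key pattern. The helper names (`tokens`, `iter_names`, `token_counts`), the 3-letter minimum, the UTF-8 encoding and the assumption that the file has a header row with a GOODS_NAME column are all my own guesses for illustration.

```python
import csv
import re
from collections import Counter

CODE_PREFIX = re.compile(r"^\*?\d+\s+")  # optional '*' plus numeric product code
LETTERS = re.compile(r"[^\W\d_]+")       # runs of letters only (Unicode-aware)


def tokens(name):
    """Lowercased letter-only tokens of a name, without the leading code."""
    stripped = CODE_PREFIX.sub("", name.strip())
    # Assumption: tokens shorter than 3 letters ("g", "in") carry no meaning.
    return [t.lower() for t in LETTERS.findall(stripped) if len(t) >= 3]


def iter_names(path, column="GOODS_NAME"):
    """Stream one column of the CSV row by row, so millions of rows fit."""
    with open(path, newline="", encoding="utf-8") as handle:
        for record in csv.DictReader(handle):
            yield record[column]


def token_counts(names):
    """Count, for every token, the number of rows it appears in."""
    counts = Counter()
    for name in names:
        counts.update(set(tokens(name)))  # set(): count each row once
    return counts


sample = [
    "Pasta Makfa snail flow-pack 450 g.",
    "MAKFA Macaroni feathers like. in/ with",
    "2013077 MAKFA Makar.RAKERS 450g",
    "2013077 MAKFA Makar.RAKERS 450g",
    "6788 MAKFA Makar.perya 450g",
    "2049750 MAKFA Makar.SHIGHTS 450g",
    "2049750 MAKFA Makar.SHIGHTS 450g",
    "3491144 LIP.NAP.ICE TEA green yellow 0.5 liter",
    "3491144 LIP.NAP.ICE TEA BLACK yellow 0.5 liter",
]
counts = token_counts(sample)
candidates = {tok: n for tok, n in counts.items() if n > 1}
print(counts["makfa"], counts["makar"], counts["tea"])  # 7 5 2

# For the real file this would be:
# counts = token_counts(iter_names("C:/myfold/goods.csv"))
```

On the sample, "makfa" is found in all 7 MAKFA rows and "tea" in both TEA rows, but noise tokens such as "yellow" and "liter" are also shared, so the candidates still need filtering before they match the patterns I listed.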