PDA

View Full Version : Frequently bought together



efz
05-29-2012, 02:28 PM
Hello everyone,

I have a large excel file that has many receipt product lines.
This excel is formed as follows

A | B | C
Receipt Code | SKU Code| SKU Title
955151151 | 205010110 | Apple Iphone 4 Black
955151151 | 205010112 | Sony Ericsson
955151151 | 205010117 | Nokia 4SX
955151152 | 205010116 | Samsnung 4SX
955151153 | 205010118 | Casio 4BM
955151154 | 205010125 | Nokia 4SX
955151154 | 205010112 | Sony Ericsson
955151154 | 205010110 | Apple Iphone 4 Black

So what I would like to extract out of the above set would be
Sony Ericsoon (205010112) & Apple Iphone 4 Black (205010110)| 2

So it is similar logic to what e-commerce sites do online "Frequently bought together" style.

The data set is quite large (5000 lines) so memory buffer limits should be taken into account

Thanks in advance

GTO
05-29-2012, 02:41 PM
Maybe: =SUMPRODUCT(--(B2:B9=205010112),--(C2:C9="Sony Ericsson"))

efz
05-29-2012, 02:58 PM
No this doesnt work like that because this way you cant tell if they are in the same receipt and there are like 3000 unique products in the file

GTO
05-29-2012, 03:08 PM
Okay, I missed the IPhone. As to this line:
Sony Ericsoon (205010112) & Apple Iphone 4 Black (205010110)| 2
What does the 2 represent?

efz
05-29-2012, 03:16 PM
2 represents that the combination of Sony Ericsoon (205010112) & Apple Iphone 4 Black (205010110) under same receipt code occurs 2 times.

GTO
05-29-2012, 03:22 PM
Maybe?
=SUM(SUMPRODUCT(--(A2:A9=955151154),--(B2:B9=205010112),--(C2:C9="Sony Ericsson")),SUMPRODUCT(--(A2:A9=955151154),--(B2:B9=205010110),--(C2:C9="Apple Iphone 4 Black")))

efz
05-29-2012, 03:33 PM
Nope I think it cant be done with a formula needs to be coded on vba:S Imagine the fie has 5.000 lines and we need all posible combinations that occur more than 1 time and we dont have just this combination but many many combinations

mikerickson
05-29-2012, 04:01 PM
The OP has your data base.
What is the question?
Is it "what is frequently bought with an iPhone"?
or is it "what items are frequently bought together"?

In either case a logic like this could be used:

Filter database to only receipts for iPhone.
What is the second most frequently bought product from those (filtered) records (presumably "iPhone" would be most frequent)

For question 2 one could substitute "most frequently bought product" for "iPhone" in the logic.

efz
05-30-2012, 12:33 AM
The question is "what items are frequently bought together". And the OP has just a small portion of the database. Yep the logic of filtering is correct but I would like to know how I can automate this process since the database has many unique prdouct that needs to be examined so filtering one by one would take ages:)

Bob Phillips
05-30-2012, 01:33 AM
A few questions

Why isn't it ?

Sony Ericsoon (205010112) & Apple Iphone 4 Black (205010110) & Nokia 4SX (205010117) | 3

Do you just want each receipt listed, or all combinations under each and every receipt?

efz
05-30-2012, 04:42 AM
Xld you are right about this combination (there are multiple combinations not just one)

Sony Ericsoon (205010112) & Apple Iphone 4 Black (205010110) & Nokia 4SX (205010117) | 3 (the number is not 3 thought its 2) because this combination occurs twice:

Once with the receipt code 955151151
Once with the receipt code 955151154

Another combination is the one I listed
Sony Ericsoon (205010112) & Apple Iphone 4 Black (205010110)| 2 which happens twice as well

Once with the receipt code 955151151
Once with the receipt code 955151154

So in another words i need all combinations under each and every receipt

mikerickson
06-01-2012, 04:40 PM
I've been thinking about this problem.
The size of the problem isn't how many rows of data that you have, the size of the problem is the number of individuals SKU's.

In the attached, I added another receipt and created a chart. So that K4 holds the number of people who bought both a Nokia 4SX (E4) and an Orange (K1)

This approach uses names.
Name: Receipts RefersTo: =Sheet1!$A$2:$A$13
Name: Titles RefersTo: =Sheet1!$C$2:$C$13

Name: FilterOne RefersTo: =--(Titles=Sheet1!F$1)*Receipts
Name: FilterTwo RefersTo: =--(Titles=Sheet1!$E2)*Receipts

(The relative addresses in FilterOne and FilterTwo are relative to F2)

Then the CSE formula
=COUNT(IF(FilterOne<>0,(MATCH(FilterOne,FilterTwo,0))))

returns the count of common receipts.