Log in

View Full Version : [SOLVED:] Finding Top Unique Distinct Pairs of Items per Guest Check



JackChang
11-04-2024, 02:59 PM
All-

I work at a restaurant and we're trying to find the top combination menu items for suggestive selling. Attached is a sample dataset with 2 columns, check ID (items bought by the customer), and the menu item name.

I'm trying to get the top pairs of items most purchased by customers, this would be 2 unique items per check. For example, guest checks 2118880065, 2118885146, 2118876523, 2118882225 all purchased "FF Chili Cheese Fries" and "FF Korean Chicken".

The first guest check 2118880065, customer purchased exactly these two items, but check 2118885146, customer purchased 10 items total, but within those, they purchased "FF Chili Cheese Fries" and "FF Korean Chicken".

Basically, each item can be paired with any other item, creating an endless combination of pairs, but we're trying to find the top selling pairs. I know Plyon can do it, and i've been told a data science methodology is the best way, but I don't have those skills, so I'm wondering how I could do it in Excel with formulas.

Thank you.

jdelano
11-05-2024, 11:26 AM
I have a spreadsheet where I create a matrix of each recipe as a row and add a column. Then for each ticket it loops each recipe and increments the total each recipe is on the ticket.
Then I'll create a list of the top 10 pairs.

I'll finish in the morning

edit: It'll look something like this

JackChang
11-05-2024, 03:40 PM
Hi jdelano, okay, thank you for your help! I really appreciate it.

arnelgp
11-05-2024, 08:31 PM
here is a demo, on result sheet, click "Show Combi" button.
then you need to Sort "Combi Count" column "Largest to smallest"

//Edit: i change the code on the xlsm file.

jdelano
11-06-2024, 02:59 AM
Hi jdelano, okay, thank you for your help! I really appreciate it.

Looks like arnelgp has got you covered; my only question is how to handle when a ticket has a duplicate recipe (2 guests 1 ticket). Like Korean Chicken & Chil here? I assume you'd want it counted just once.



2118885146,FF Korean Chicken
2118885146,FF Fruit Punch large
2118885146,FF Orange Bang large
2118885146,FF Diet Pepsi large
2118885146,FF Chicken Katsu Mini
2118885146,FF Chili Cheese Fries
2118885146,FF Korean Chicken & Chil
2118885146,FF Saimin
2118885146,FF Korean Chicken & Chil
2118885146,FF Fruit Punch reg

Dave
11-06-2024, 10:31 AM
Hi all. I messed around with this and am interested in your outcomes. With the Sample data, I found 216 menu items with 46411 possible duplicate menu combinations. The top combinations: Bacon and Egg with Coffee (5); Clubhouse Sandwich Com with Coffee(4); Chili Cheese Fries with Korean Chicken(4); and Glaze Yeast Donut with Choc Center Yeast Donu(4). I'm not going to post the code/wb as it's fairly inefficient and takes some time to run with XL becoming non-responsive until code completion. Must be lunch time as I seem kind of hungry for some reason. Dave
ps. I excluded counting duplicates on the same order

CharlotteMu
11-06-2024, 08:24 PM
Hi all. I messed around with this and am interested in your outcomes. With the Sample data, I found 216 menu items with 46411 possible duplicate menu combinations. The top combinations: Bacon and Egg with Coffee (5); Clubhouse Sandwich Com with Coffee(4); Chili Cheese Fries with Korean Chicken(4); and Glaze Yeast Donut with Choc Center Yeast Donu(4). I'm not going to post the code/wb as it's fairly inefficient and takes some time to run with XL becoming non-responsive until code completion. Must be lunch time as I seem kind of hungry for some reason. Dave
ps. I excluded counting duplicates on the same order
I have similar results to you.

arnelgp
11-06-2024, 11:59 PM
i changed the code and now i have this as top 5"



Order1
Order2
Combi Count


B Choc Center Yeast Donu
B Glaze Yeast Donut
5


R Bacon and Egg
R Coffee
5


FF Chili Cheese Fries
FF Korean Chicken
4


$ deluxe - FF
FF Hamburger
4


R Clubhouse Sandwich Com
R Coffee
4



the code took 2.65 seconds to complete on I5 11gen cpu.

jdelano
11-07-2024, 07:26 AM
This is what I went and did. It is overkill and can be optimized a bunch.
It allows for selecting a CSV file (as long as it matches the example), builds the matrix of combos, then displays the top 10 pairings.

edit: reattach excel file

Dave
11-07-2024, 09:10 AM
@arnelgp the $$ deluxe - FF with FF Hamburger were ordered together on the same receipt twice so your additional results do make sense. I had excluded duplicate results as ordering 1 item twice on the same menu artificially doubles the number of times the remaining items would be paired. I misstated the number of possible duplicate menu items as my output displayed every item and possible pairing... in effect doubling the number of possible pairings. There are actually only 23220 possible outcomes ( I excluded Promo $$ off and Delivery Markup as menu items). I'm guessing there's some math that could calculate that number (216 menu items and all possible pairings). 2.65 seconds! That's impressive compared to my approx. 17mins. I'm going to trial your wizardry later.
@ jdalano Wow! That looks impressive. I think I'll also give it a whirl later. Interesting challenge. Hope JackChang receives some benefit. Dave

Dave
11-07-2024, 11:15 AM
Those are both some impressive pieces of coding utilizing different approaches. I'm not familiar with restaurant speak, so I have no idea what the order prefixes are about (ie. R, FF, B etc.) but I did notice that some menu items are the same only the prefix is changed (ie. R Pepsi Large and FF Pepsi Large). I removed the prefixes and bubble sorted the orders to find unique menu items to start. It seems erroneous results will be produced without determining which items are similar to begin with. However, everyone seemed to have arrived at similar results (other than the duplicate order thing). Both @arnelgp and @ jdelano versions are amazingly fast... I've still got quite a bit of learning to do. Dave

JackChang
11-07-2024, 05:38 PM
Wow, thank you so much everyone! This really helps! I did try all of the files and it works to some extent, at least we can get an idea of our top item combinations purchased by our existing members which we wanted to use for suggestive selling. The macros work good on smaller datasets. When I loaded in 1 million rows I got an overflow message and the other one, it returned the first 2000 rows I believe. Still way better than what I was doing, which was manually selecting pairs, putting them in a pivot table, using a distinct count and then counting any rows that had 2 or more. This has all been a learning experience and I really appreciate the help arnelgp, jdelano, and dave!

I just finished closing the books for the month, so now I have more time to look at the results and try to make sense of all of it. Thank you again everyone, will keep you posted.

arnelgp
11-07-2024, 08:08 PM
it might help to replace Integer with Long variable declaration on the code to avoid overflow.

jdelano
11-08-2024, 05:45 AM
You're welcome, happy to lend a hand. If you want to have something parse through that much data, I'd suggest a desktop app. Using Excel here is basically adding a huge overhead to something that can be produced after the fact, if you need this information in a workbook.

If you're interested, I'd be happy to create a simple app.

georgiboy
11-08-2024, 06:31 AM
Thought I would have a little play with this today, I have not used VBA nor formula, but instead 'Power Query' which is built into most of the used versions of Excel nowadays.

To run it, you would:
Paste your data into the table that resides on the tab named: Data
Go to 'Data' tab in the ribbon of Excel
Press the 'Refresh all' button

It will spit the results out on the tab named: Query

It will display any pairings that appear more than once

File attached.

For reference, the query code is below:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
GroupedData = Table.Group(Source, {"guestCheckId"}, {{"AllRows", each _}}),
RecipePairs = Table.AddColumn(GroupedData, "RecipePairs", each
let
AllRows = [AllRows],
RecipeNames = Table.Column(AllRows, "RecipeName"),
Pairs = List.Transform(List.Positions(RecipeNames), (i) =>
let
RemainingItems = List.RemoveFirstN(RecipeNames, i + 1)
in
List.Transform(RemainingItems, (j) =>
let
SortedPair = Text.Combine(List.Sort({RecipeNames{i}, j}), ",")
in
[SortedPair = SortedPair]
)
),
UniquePairs = List.Distinct(List.Combine(Pairs))
in
UniquePairs
),
ExpandedPairs = Table.ExpandListColumn(RecipePairs, "RecipePairs"),
ExpandedPairsTable = Table.ExpandRecordColumn(ExpandedPairs, "RecipePairs", {"SortedPair"}),
FilteredRows = Table.SelectRows(ExpandedPairsTable, each ([SortedPair] <> null)),
DuplicatedColumn = Table.DuplicateColumn(FilteredRows, "SortedPair", "SortedPair - Copy"),
SplitColumnbyDelimiter = Table.SplitColumn(DuplicatedColumn, "SortedPair - Copy", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Recipe1", "Recipe2"}),
ConColumn = Table.AddColumn(SplitColumnbyDelimiter, "Custom", each if [Recipe1] = [Recipe2] then 1 else null),
FilteredRows1 = Table.SelectRows(ConColumn, each ([Custom] = null)),
GroupedPairs = Table.Group(FilteredRows1, {"Recipe1", "Recipe2"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
SortedPairs = Table.Sort(GroupedPairs, {{"Count", Order.Descending}}),
FilteredRows2 = Table.SelectRows(SortedPairs, each ([Count] <> 1))
in
FilteredRows2

arnelgp
11-09-2024, 09:10 AM
just in case your interested, i made an ms access database.
it will calculate the usual top combinations, plus the best sellers
from the list. it is very fast on the sample csv you posted.