Market Basket Analysis (MBA) - Collaborative Filtering

Nishant Kumar
4 min read · May 18, 2020

To increase sales and decide which products to bundle into combo offers, stores often use market basket analysis. Good product placement is one of the key factors in retail businesses such as Walmart, Best Buy, Reliance and Big Bazaar.
MBA is an association technique that groups related products based on the transactions in which they appear together.

(Figure: a market basket)

Objectives:

✔ To understand the customer's mindset when buying products together, such as milk + butter, beer + potato chips, or balloons + candles, we need association logic that groups related products.

✔ Find frequent patterns, associations, correlations, or causal structures among sets of items in transaction databases.

✔ Understand customer buying habits by finding associations and correlations between the different items that customers place in their “shopping basket”.

One of the main advantages of market basket analysis is that it is well suited to undirected data mining; the technique is useful when we do not know where to begin with a large dataset.

👀Association rule mining
Association rule mining is a technique for identifying frequent patterns and associations among a set of items. It is commonly used in shopping-behaviour analysis and can serve as the basis for marketing decisions such as promotional pricing or product placement. An association rule is expressed in the form X → Y, where X and Y are two disjoint itemsets (they have no items in common), i.e. X ∩ Y = ∅. The strength of an association rule is measured in terms of its support and confidence.

The main concept of association rules is to examine all possible rules between items and express them as ‘if-then’ statements: the ‘if’ part is X, the antecedent, while the ‘then’ part is Y, the consequent. In other words, X implies Y.

Antecedent → consequent [support, confidence]

Support: denotes how frequently the rule occurs in the transactions. A high value means that the rule covers a large part of the database. support(A → B [s, c]) = P(A ∪ B), i.e. Support(X) = (number of transactions in which X appears) / (total number of transactions).

Confidence: denotes the percentage of transactions containing A that also contain B. It is an estimate of the conditional probability. confidence(A → B [s, c]) = P(B|A) = sup(A ∪ B) / sup(A).

Lift (also called improvement or impact) is a measure that overcomes the limitations of support and confidence. It is the ratio between the confidence of a rule and the expected confidence. For the rule "if A then B", lift is defined as P(B|A) / P(B) = P(A ∪ B) / (P(A) · P(B)).
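To make these metrics concrete, here is a minimal sketch that computes support, confidence and lift by hand for a toy rule {milk} → {butter}; the transactions and item names are invented purely for illustration.

```python
# A minimal sketch: computing support, confidence and lift by hand
# for the rule {milk} -> {butter} on a toy (made-up) transaction list.

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "butter"},
    {"bread", "beer"},
    {"milk", "bread"},
    {"beer", "potato chips"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

antecedent, consequent = {"milk"}, {"butter"}

sup_rule = support(antecedent | consequent, transactions)   # P(A and B)
confidence = sup_rule / support(antecedent, transactions)   # P(B|A)
lift = confidence / support(consequent, transactions)       # P(B|A) / P(B)

print(f"support={sup_rule:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
# support=0.40, confidence=0.67, lift=1.67
```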

Apriori Algorithm

The classic approach for generating frequent itemsets is the Apriori algorithm (Rakesh Agrawal and Ramakrishnan Srikant, 1994). The Apriori property states: ‘All subsets of a frequent itemset must also be frequent’. If an itemset X has been found to be infrequent, there is no need to investigate its supersets, as they must be infrequent too. For example, if the itemset {hair conditioner, shampoo, hair dye} is frequent, then the itemset {hair conditioner, shampoo} must also be frequent.

a) The Apriori algorithm assumes that any subset of a frequent itemset must itself be frequent.

b) It uses prior knowledge of frequent itemset properties.

c) If an itemset is infrequent, all its supersets will also be infrequent, so they can be pruned (see the pruning sketch below).
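
To illustrate the pruning idea in point (c), here is a small sketch of the Apriori pruning step in plain Python; the frequent 2-itemsets are assumed rather than mined, just to show how a candidate with an infrequent subset gets dropped.

```python
from itertools import combinations

# A toy sketch of the Apriori pruning step (not the full algorithm):
# a candidate k-itemset is kept only if every (k-1)-subset of it is frequent.
# The frequent 2-itemsets below are assumed purely for illustration.

frequent_2_itemsets = {
    frozenset({"milk", "bread"}),
    frozenset({"milk", "butter"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "beer"}),
}

def prune(candidates, frequent_prev):
    """Drop candidates that have an infrequent (k-1)-subset."""
    kept = []
    for cand in candidates:
        subsets = combinations(cand, len(cand) - 1)
        if all(frozenset(s) in frequent_prev for s in subsets):
            kept.append(cand)
    return kept

candidates_3 = [
    frozenset({"milk", "bread", "butter"}),  # all 2-subsets frequent -> kept
    frozenset({"milk", "bread", "beer"}),    # {milk, beer} infrequent -> pruned
]

print(prune(candidates_3, frequent_2_itemsets))
# [frozenset({'milk', 'bread', 'butter'})]  (element order may vary)
```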

With all of the above in mind, let's look at a Python code example.
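As one possible implementation, here is a short end-to-end sketch, assuming the mlxtend library is installed (`pip install mlxtend`) and using a made-up transaction list; the support and confidence thresholds are arbitrary choices for illustration.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Toy transactions, made up for illustration
transactions = [
    ["milk", "bread", "butter"],
    ["milk", "butter"],
    ["bread", "beer"],
    ["milk", "bread", "butter"],
    ["beer", "potato chips"],
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
onehot = te.fit(transactions).transform(transactions)
df = pd.DataFrame(onehot, columns=te.columns_)

# Frequent itemsets with support >= 40%
frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)

# Association rules filtered by confidence
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```

The resulting rules table can then be sorted by lift or confidence to pick candidate product combinations for combo offers or placement decisions.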

You can find my complete code here

Happy Coding!

Facebook | Instagram
