Wednesday, 2 November 2016

Recommendation Engine - Market Basket Analysis

Market basket analysts examine the buying habits of customers based on which products they are most likely to purchase together with other products; e.g., a customer who buys a particular brand of shampoo may be more likely to buy the same brand of hair conditioner at the same time. This information is useful to companies that wish to target their marketing and advertising dollars at specific customers, or that wish to pursue cross-promotional opportunities between two or more products.
Products are simply the items in a basket (transaction). Here are some terms related to market basket analysis.

We can represent the set of all items as follows:
I = {i1, i2, …, in}
A transaction is then a subset of these items:
tn = {ij, ik, …, in}
Rules mined from the transactions take the following form:
{i1, i2} => {ik}
This can be read as “if a user buys the items in the item set on the left-hand side, then the user will likely buy the item on the right-hand side too”. A more human-readable example is:
{coffee,sugar} => {milk}
If a customer buys coffee and sugar, then they are also likely to buy milk.
With this we can understand three important ratios: support, confidence, and lift. Their significance is described in the following bullet points; if you are interested in a formal mathematical definition, you can find it on Wikipedia.
  • Support: The fraction of transactions in the dataset that contain the item set.
  • Confidence: The probability that a new transaction containing the items on the left also contains the items on the right.
  • Lift: The ratio of the confidence of a rule to its expected confidence, i.e. the support of the items on the right.
  • Note: A lift of 1 indicates that the items on the left and right are independent; a lift above 1 means they occur together more often than expected.
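As a rough illustration of the three measures above, here is a minimal base-R sketch that computes them by hand for the rule {coffee,sugar} => {milk}. The five transactions and the helper name item_support are made up for this example and are not part of the post's data or of any package.

```r
# Hypothetical toy transactions, for illustration only
toy_transactions <- list(
  c("coffee", "sugar", "milk"),
  c("coffee", "sugar", "milk"),
  c("coffee", "sugar"),
  c("bread", "milk"),
  c("bread")
)

# Hypothetical helper: fraction of transactions containing every item in `items`
item_support <- function(items, transactions) {
  mean(vapply(transactions, function(t) all(items %in% t), logical(1)))
}

lhs <- c("coffee", "sugar")
rhs <- "milk"

rule_support    <- item_support(c(lhs, rhs), toy_transactions)    # both sides together
rule_confidence <- rule_support / item_support(lhs, toy_transactions)
rule_lift       <- rule_confidence / item_support(rhs, toy_transactions)

print(c(support = rule_support, confidence = rule_confidence, lift = rule_lift))
```

On this toy data the lift comes out slightly above 1, which suggests that coffee and sugar buyers pick up milk a little more often than customers in general.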
1) Reading data in transaction form
library(arules)
trans_basket_data <- read.transactions("data.csv", format = "single", header = TRUE, sep = ",", cols = c("order_number", "sku")) # header = TRUE: the first line of data.csv holds the column names

2) Apriori algorithm in R
apriori(data, parameter = NULL, appearance = NULL, control = NULL)
data: object of class transactions, or any data structure which can be coerced into transactions (e.g., a binary matrix or data.frame).
parameter: object of class APparameter or a named list. The default behavior is to mine rules with a minimum support of 0.1, a minimum confidence of 0.8, a maximum of 10 items (maxlen), and a maximal time for subset checking of 5 seconds (maxtime).
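To make the call above more concrete, here is a small sketch that runs apriori on a handful of made-up baskets and overrides the default thresholds through a named list. The baskets and the threshold values are assumptions chosen only so that this tiny data set yields some rules; real data will usually need different settings.

```r
library(arules)

# Hypothetical toy baskets, for illustration only
baskets <- list(
  c("coffee", "sugar", "milk"),
  c("coffee", "sugar", "milk"),
  c("coffee", "sugar"),
  c("bread", "milk"),
  c("bread")
)

# A list of item vectors can be coerced into a transactions object
trans <- as(baskets, "transactions")

# Thresholds picked for this tiny example; minlen = 2 skips rules with an empty left-hand side
rules <- apriori(trans,
                 parameter = list(supp = 0.3, conf = 0.6, minlen = 2),
                 control = list(verbose = FALSE))

# Show the mined rules, strongest lift first
inspect(sort(rules, by = "lift"))
```

Passing parameter as a named list is usually enough; any value that is left out keeps its default.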

3) Creating a function market_basket_analysis to run and visualize rules for any transaction-type data

library(arules)
library(arulesViz) # provides the plot() methods for rules

market_basket_analysis <- function(trans_data) {
  # bar chart of the 20 most frequent items, by absolute count
  itemFrequencyPlot(trans_data, topN = 20, type = "absolute", support = 100, xlab = "SKU IDs of items")

  transactionLevel_rules <- apriori(trans_data, parameter = list(supp = 0.0001, conf = 0.8, maxlen = 10), control = list(verbose = TRUE))
  # patterns of up to 7 items were shown even after changing maxlen

  transactionLevel_subrules <- head(sort(transactionLevel_rules, by = "lift"), 100) # keep only the 100 highest-lift rules
  plot(transactionLevel_rules) # visualizing all rules

  plot(transactionLevel_subrules, method = "graph", control = list(type = "items", main = "")) # affinity of rules
  plot(transactionLevel_subrules, method = "matrix", measure = "lift") # heat map
}


Visualization of all rules satisfying the criteria of the Apriori algorithm:
Heat map (correlation between rules based on lift)
The heat map relates to rule numbers as follows (the image below is for rules with multiple items).
