Market basket analysis examines the buying habits of customers based on the types of products they are most likely to purchase in conjunction with other products; e.g. a customer who buys a particular brand of shampoo may be more likely to buy the same brand of hair conditioner at the same time. This information is useful to companies that wish to target their marketing and advertising dollars to specific customers, or that wish to pursue cross-promotional opportunities between two or more products.
Products are simply the items in a basket (transaction). Here are some terms related to market basket analysis.
We can represent our items as an item set as follows:
I = { i_{1},i_{2},…,i_{n} }
A transaction is then a subset of these items, represented as follows:
t_{n} = { i_{j},i_{k},…,i_{n} }
This gives us our rules which are represented as follows:
{ i_{1},i_{2}} => { i_{k}}
This can be read as “if a user buys the items in the item set on the left-hand side, then the user will likely buy the item on the right-hand side too”. A more human-readable example is:
{coffee,sugar} => {milk}
If a customer buys coffee and sugar, then they are also likely to buy milk.
With this we can understand three important ratios: the support, confidence and lift. We describe the significance of these in the following bullet points, but if you are interested in a formal mathematical definition you can find it on Wikipedia.
 Support: The fraction of transactions in our dataset that contain the item set.
 Confidence: The probability that a new transaction containing the items on the left also contains the item on the right.
 Lift: The ratio by which the confidence of a rule exceeds the expected confidence, i.e. the confidence divided by the support of the right-hand side.
 Note: if the lift is 1 it indicates that the items on the left and right are independent.
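To make the three ratios concrete, here is a small sketch in base R that computes them by hand for the rule {coffee,sugar} => {milk}. The five transactions and the helper function support() are made up for illustration and are not part of any package.

```r
# Toy transactions (made-up data, for illustration only)
transactions <- list(
  c("coffee", "sugar", "milk"),
  c("coffee", "sugar", "milk"),
  c("coffee", "sugar"),
  c("bread"),
  c("bread", "milk")
)

# Hypothetical helper: fraction of transactions containing every item in `items`
support <- function(items, transactions) {
  mean(vapply(transactions, function(tx) all(items %in% tx), logical(1)))
}

lhs <- c("coffee", "sugar")
rhs <- "milk"

# Support of the whole rule: transactions containing coffee, sugar and milk
rule_support <- support(c(lhs, rhs), transactions)              # 2 / 5

# Confidence: of the transactions containing the left side,
# the share that also contain the right side
rule_confidence <- rule_support / support(lhs, transactions)    # (2/5) / (3/5)

# Lift: confidence relative to how often the right side occurs anyway
rule_lift <- rule_confidence / support(rhs, transactions)       # (2/3) / (3/5)

print(c(support = rule_support,
        confidence = rule_confidence,
        lift = rule_lift))
```

Here the lift is slightly above 1, so in this toy dataset milk appears somewhat more often with coffee and sugar than it would if the two sides were independent.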
1) Reading data in transaction form
library(arules)
# data.csv needs a header row containing the order_number and sku columns
trans_basket_data <- read.transactions("data.csv",
    format = "single", sep = ",",
    cols = c("order_number", "sku"))
2) Apriori algorithm in R
apriori(data, parameter = NULL,
appearance = NULL, control = NULL)
data: object of class transactions or any data structure which can be coerced into transactions (e.g., a binary matrix or data.frame).
parameter: object of class APparameter or named list. The default behavior is to mine rules with minimum support of 0.1, minimum confidence of 0.8, maximum of 10 items (maxlen), and a maximal time for subset checking of 5 seconds (maxtime).
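As a quick illustration of overriding those defaults, the sketch below mines rules from the Groceries dataset that ships with the arules package. The thresholds (supp = 0.01, conf = 0.5) are arbitrary choices for this example, not recommendations.

```r
library(arules)

# Groceries is an example transactions dataset bundled with arules
data("Groceries")

# Override the default support and confidence through a named list
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.5),
                 control = list(verbose = FALSE))

# Show the five rules with the highest lift
inspect(head(sort(rules, by = "lift"), 5))
```

Lowering supp or conf generally produces more rules, so it is worth starting with stricter thresholds and relaxing them gradually.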

3) Creating function market_basket_analysis to run and visualize rules for any transaction-type data
library(arules)     # apriori(), itemFrequencyPlot()
library(arulesViz)  # plot() methods for rules
market_basket_analysis <- function(trans_data)
{
  # dev.off()
  # bar plot of the 20 most frequent items (absolute counts)
  itemFrequencyPlot(trans_data, topN = 20, type = "absolute", support = 100,
                    xlab = "SKU IDs of items")
  # mine rules; rules with up to 7 items are shown even after changing maxlen
  transactionLevel_rules <- apriori(trans_data,
                                    parameter = list(supp = 0.0001, conf = 0.8, maxlen = 10),
                                    control = list(verbose = TRUE))
  # taking only high lift rules
  transactionLevel_subrules <- head(sort(transactionLevel_rules, by = "lift"), 100)
  # visualizing rules
  plot(transactionLevel_rules)
  # affinity of rules
  plot(transactionLevel_subrules, method = "graph",
       control = list(type = "items", main = ""))
  # heat map
  plot(transactionLevel_subrules, method = "matrix", measure = "lift")
}
Visualization of all rules satisfying the criteria in the Apriori algorithm
Read another blog post to learn about text classification algorithms:
http://machinelearningstories.blogspot.in/2016/08/documentclassificationortext.html
Read another blog post to learn about the relation between time series and simple regression analysis:
http://machinelearningstories.blogspot.in/2016/08/timeseriesandfittingregressionon.html