Market Basket Analysis By Apriori -Yogeshwar

Keywords: Apriori, Association rules, Holoviews, Support

Table of Contents

  1. Overview
  2. Import libraries
  3. Getting the data
  4. Pre-processing
  5. EDA
  6. Association Rule Mining with Apriori Algorithm
  7. Visualizing the results
  8. Conclusion

1. Overview

Introduction

We have reached an era in which the computer seems to know us better than we know ourselves. Our devices are powerful enough to know what we are doing right now and to anticipate what we are going to do next. One such application of AI is called Market Basket Analysis. It is widely used in retail stores, where it predicts the closely associated items we are likely to buy along with the product we purchased.

Project Detail

In this project, I use Groceries_dataset, which contains 38765 rows of purchase orders made by people at grocery stores. The dataset consists of a single CSV file.

CSV name: Groceries_dataset.csv

With this dataset, I have analysed the data with visualizations and performed association rule mining with the help of the Apriori algorithm. I had never questioned why certain items are kept close together in the supermarket. I thought it was for the customer's convenience, but little did I know that it had a business impact.

Goal of this notebook

  • Getting and cleaning the data
  • Understanding the data using EDA techniques
  • Performing association rule mining
    • using Apriori algorithm
  • Visualizing the results of association between items

2. Import libraries

3. Getting the data

According to dataset information, it has the following features :

  • Member_number: A customer ID assigned to the customer after a purchase transaction
  • Date: The date on which the purchase/transaction was made
  • itemDescription: Name of the item which was purchased

📌 It is important to look at the number of non-null records and their data types. Often the date column is not in datetime format.

From the information we can identify that:

  • We don't have any null records in the dataset. BAM!
  • The Date column has the object data type, not datetime. Small bam!
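
The check described above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's original cell: the three rows are invented stand-ins for Groceries_dataset.csv, and only the column names come from the dataset description.

```python
import io

import pandas as pd

# Tiny stand-in for Groceries_dataset.csv; the rows are invented.
sample_csv = io.StringIO(
    "Member_number,Date,itemDescription\n"
    "1001,21-07-2015,whole milk\n"
    "1002,05-01-2014,rolls/buns\n"
    "1003,19-09-2015,soda\n"
)
df = pd.read_csv(sample_csv)

# Number of missing values per column; all zeros means no null records.
null_counts = df.isna().sum()
print(null_counts)

# Column data types; a date read from CSV without parsing stays a
# text column rather than becoming datetime64.
print(df.dtypes)
```

On the real file, `df.info()` prints the same non-null counts and data types in a single summary.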

4. Pre-processing

Renaming column

Date information
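
A possible way to turn the text dates into proper timestamps is sketched below. The rows are invented, the day-first `DD-MM-YYYY` format is an assumption about how the dates are written, and the derived column names (`year`, `month`, `weekday`) are hypothetical names chosen for this illustration.

```python
import pandas as pd

# Invented rows; the day-first "DD-MM-YYYY" format is an assumption.
df = pd.DataFrame(
    {
        "Member_number": [1001, 1002, 1001],
        "Date": ["21-07-2015", "05-01-2014", "19-09-2015"],
        "itemDescription": ["whole milk", "rolls/buns", "soda"],
    }
)

# Parse the text dates into real timestamps.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Derive the parts needed for yearly, monthly and weekday summaries
# (hypothetical column names).
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month
df["weekday"] = df["Date"].dt.day_name()

print(df[["Date", "year", "month", "weekday"]])
```

With the date parts available as columns, the yearly, monthly and weekday breakdowns in the EDA section reduce to simple group-by counts.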

5. EDA

No. of items sold in 2014 and 2015

Cumulative day transactions in 2014 & 2015

Monthly quantity purchased from grocery store

Quantity purchased across weekdays

Top and bottom 10 Fast moving products

Top Customers in 2014 and 2015

6. Association Rule Mining with Apriori Algorithm

What is market basket analysis?

Have you ever wandered around a supermarket and wondered why the sections and racks are arranged so that related products sit together? You can find bread and butter on nearby racks, and toothbrushes and toothpaste on the same rack. These products are associated: if you buy a toothbrush, the likelihood of you buying toothpaste is high. This is a marketing tactic to make you fill up your basket with products and their associated items, thereby increasing sales revenue. Some businesses offer a discount on the associated item, or bundle both products and sell them at a lower price, in order to make you buy the item together with the item associated with it.

What is association rule mining?

Association rule mining is a technique used to unveil the association between items, where the relationship between the items we purchase is denoted as X->Y.

Here X is the item we buy and Y is the item we are most likely to buy along with it (much like if->then). They are also called:

  • X: Antecedent
  • Y: Consequent

Association rule mining helps in designing the rules for the association of items. These rules are formed with the help of three metrics:

1. Support: It signifies the popularity of an item, measured as the fraction of transactions that contain it. If an item is bought infrequently, it is ignored in the association.

2. Confidence: It tells the likelihood of purchasing Y when X is bought. Sounds like a conditional probability? In fact, it is: P(Y | X). But it does not account for the popularity (frequency) of Y, and to overcome that we have lift.

3. Lift: It is the confidence divided by the support of Y, so it corrects for how popular Y already is. A lift greater than 1 suggests that the presence of the antecedent increases the chances that the consequent will occur in a given transaction. A lift below 1 indicates that purchasing the antecedent reduces the chances of purchasing the consequent in the same transaction.

For example, assume there are 100 customers, of whom 10 bought milk, 8 bought butter and 6 bought both. We need to check the association bought milk => bought butter:

  • support = P(Milk & Butter) = 6/100 = 0.06
  • confidence = support/P(Milk) = 0.06/0.10 = 0.60
  • lift = confidence/P(Butter) = 0.60/0.08 = 7.5
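
The milk => butter example can be checked with a few lines of plain Python. The variable names are chosen for this sketch only. Confidence is computed as the conditional probability of butter given milk, P(Milk & Butter)/P(Milk), and lift divides that confidence by P(Butter).

```python
# Counts from the example: 100 customers, 10 bought milk,
# 8 bought butter, 6 bought both.
n_customers = 100
n_milk = 10
n_butter = 8
n_both = 6

p_milk = n_milk / n_customers
p_butter = n_butter / n_customers

# Support of the rule: share of customers who bought both items.
support = n_both / n_customers
# Confidence of milk => butter: P(butter | milk).
confidence = support / p_milk
# Lift: confidence relative to how often butter is bought anyway.
lift = confidence / p_butter

print(round(support, 2), round(confidence, 2), round(lift, 2))
```

Lift is symmetric, so the rule butter => milk has the same lift, while its confidence differs because the denominator becomes P(Butter).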

What is Apriori?

The Apriori algorithm uses frequent itemsets to generate association rules, based on the principle that

  • All subsets of a frequent itemset must be frequent
  • Similarly, if a subset is infrequent, every parent set that contains it is infrequent too

The algorithm works by setting a minimum support value and iterating over the frequent itemsets. Itemsets whose support is below the threshold are discarded, along with any larger itemsets built from them, until no further removals are possible.

Later, the lift of the rules derived from these selected itemsets is calculated, and rules whose lift is below the threshold are eliminated, since the algorithm may take a long time to run if we keep all the rules.
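
The level-wise search can be sketched in plain Python as below. This is a minimal, unoptimized illustration of the idea rather than the implementation used in this notebook. The function name `apriori_frequent_itemsets` and the five toy baskets are invented for the example.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(current)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Invented toy baskets.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["milk", "yogurt"],
    ["bread", "butter", "yogurt"],
]
frequent = apriori_frequent_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this toy run the candidate {bread, butter, milk} is pruned without counting its support, because its subset {butter, milk} is already known to be infrequent.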

Preparing the data

Before proceeding with Apriori, we have to prepare the data in a sparse matrix format, with the products as columns and the id as the index. First we group the data to get the quantity purchased, and then we encode it with 0s and 1s.
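
One way to build such a matrix with pandas is sketched below. The purchases are invented, and using Member_number as the id is an assumption made for this illustration.

```python
import pandas as pd

# Invented purchases; column names follow the dataset description.
df = pd.DataFrame(
    {
        "Member_number": [1001, 1001, 1001, 1002, 1002, 1003],
        "itemDescription": [
            "whole milk", "whole milk", "rolls/buns",
            "soda", "whole milk", "rolls/buns",
        ],
    }
)

# Count how many times each customer bought each product,
# with one row per customer and one column per product.
counts = (
    df.groupby(["Member_number", "itemDescription"])
    .size()
    .unstack(fill_value=0)
)

# Encode: 1 if the product was bought at least once, else 0.
basket = (counts > 0).astype(int)
print(basket)
```

The encoding step discards quantities on purpose: association rules only ask whether an item appears in a basket, not how many units were bought.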

Applying Apriori

Here we apply the Apriori algorithm to get all the frequent itemsets (with 70% as the support threshold), and then apply the association rules function to derive the rules, using lift as the metric.
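
To show what the rule-derivation step does, here is a plain-Python sketch that turns itemset supports into rules and keeps those above a lift threshold. It is not the library function used in this notebook. The helper `derive_rules` and the support values are invented for illustration.

```python
from itertools import combinations


def derive_rules(supports, min_lift=1.0):
    """Build antecedent => consequent rules from itemset supports.

    `supports` maps frozenset itemsets to their support and must contain
    every subset of each itemset (which Apriori guarantees).
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent.
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                confidence = sup / supports[antecedent]
                lift = confidence / supports[consequent]
                if lift >= min_lift:
                    rules.append(
                        {
                            "antecedent": antecedent,
                            "consequent": consequent,
                            "support": sup,
                            "confidence": confidence,
                            "lift": lift,
                        }
                    )
    # Strongest associations first.
    return sorted(rules, key=lambda rule: rule["lift"], reverse=True)


# Hand-made supports for illustration (not taken from the groceries data).
supports = {
    frozenset(["bread"]): 0.8,
    frozenset(["butter"]): 0.6,
    frozenset(["milk"]): 0.6,
    frozenset(["bread", "butter"]): 0.6,
    frozenset(["bread", "milk"]): 0.4,
}
rules = derive_rules(supports, min_lift=1.0)
for rule in rules:
    print(
        sorted(rule["antecedent"]), "=>", sorted(rule["consequent"]),
        round(rule["confidence"], 2), round(rule["lift"], 2),
    )
```

With these numbers only the bread/butter rules survive, because the bread/milk rules have a lift below 1.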

Building dynamic function to customize rules

7. Visualizing the results

The results in tabular form do not convey much insight into the rules, so let's visualize them.

Relationship between the metrics

Network diagram of rules

Here we make a network diagram of a specified number of rules, in which the antecedents and consequents are connected to their rules.
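
The data behind such a diagram is just a list of directed edges. The sketch below builds that edge list in plain Python and leaves the drawing to whichever plotting library is used. The rules, the helper `rules_to_edges` and the node labels `R0`, `R1`, ... are all invented for illustration.

```python
# Hypothetical rules, already sorted by lift; values are invented.
rules = [
    {"antecedent": {"butter"}, "consequent": {"bread"}, "lift": 1.25},
    {"antecedent": {"yogurt", "milk"}, "consequent": {"fruit"}, "lift": 1.10},
    {"antecedent": {"soda"}, "consequent": {"chips"}, "lift": 1.05},
]


def rules_to_edges(rules, n_rules):
    """Return directed edges item -> rule node -> item for the first n_rules."""
    edges = []
    for i, rule in enumerate(rules[:n_rules]):
        rule_node = f"R{i}"
        # Antecedent items point into the rule node ...
        edges += [(item, rule_node) for item in sorted(rule["antecedent"])]
        # ... and the rule node points to its consequent items.
        edges += [(rule_node, item) for item in sorted(rule["consequent"])]
    return edges


edges = rules_to_edges(rules, n_rules=2)
for source, target in edges:
    print(source, "->", target)
```

Giving each rule its own node keeps multi-item antecedents readable: all items of one antecedent meet at the same rule node before the edge continues to the consequent.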

Strength of association using heatmap

I have discovered the association of items, but what good is that if we don't know the strength of their relationship?
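
A heatmap of this kind needs the rules reshaped into an antecedent-by-consequent grid. A minimal sketch with pandas follows. The rules table is invented and the choice of lift as the cell value is an assumption. The plotting call itself is left out.

```python
import pandas as pd

# Invented single-item rules with their lift.
rules = pd.DataFrame(
    {
        "antecedent": ["butter", "bread", "milk", "soda"],
        "consequent": ["bread", "butter", "yogurt", "chips"],
        "lift": [1.25, 1.25, 1.10, 1.05],
    }
)

# Rows are antecedents, columns are consequents, cells hold the lift.
# Pairs without a rule stay empty (NaN) and show up as blank cells.
matrix = rules.pivot(index="antecedent", columns="consequent", values="lift")
print(matrix.round(2))
```

The resulting grid can be handed directly to a heatmap function, where darker cells would mark the stronger associations.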

8. Conclusion