Building a Product Recommendation System with Collaborative Filtering

In this section, we go deeper into building a product recommendation system, which lets us target customers more effectively with recommendations tailored to each individual. Studies have shown that personalized product recommendations improve conversion rates and customer retention rates.

A product recommendation system aims to predict and compile a list of items that a customer is likely to buy. Recommender systems have gained much popularity in recent years and have been developed and deployed for a variety of commercial use cases.

For example,

Because they can be applied in so many areas, recommender systems play a critical role in many businesses, especially e-commerce and media businesses, where they directly affect sales revenue and user engagement.

Generally, there are two ways to develop a list of recommendations:

1. Collaborative filtering

The collaborative filtering method is based on users' previous behaviors, such as the pages they viewed, the products they bought, or the ratings they gave to different items. It uses this data to find similarities between users or between items, and recommends the most similar items or content to users.

The basic assumption behind the collaborative filtering method is that those who have viewed or bought similar content or products in the past are likely to view or buy similar types of content or products in the future.

Based on this assumption, if one person bought items A, B, and C and another person bought items A, B, and D in the past, it is likely that the first person will buy item D and the second person will buy item C, since their purchase histories largely overlap.

2. Content-based filtering

Content-based filtering, on the other hand, produces a list of recommendations based on the characteristics of an item or a user. It usually examines the keywords that describe the characteristics of an item. The basic assumption of the content-based filtering method is that users are likely to view or buy items similar to those they have bought or viewed in the past.

For example, if a user has listened to some songs in the past, the content-based filtering method will recommend songs that share characteristics with those the user has already heard.

Building a Product Recommendation System with Collaborative Filtering

As mentioned, a collaborative filtering algorithm recommends products based on users' behavior history and the similarities between users. The first step in implementing a collaborative filtering algorithm for a product recommendation system is to build a user-to-item matrix.

A user-to-item matrix has individual users in the rows and individual items in the columns. It is easier to explain with an example. Take a look at the following matrix:

[Figure: example user-to-item matrix, with users in the rows and items in the columns]

The rows of this matrix represent users and the columns represent items. The value in each cell indicates whether the given user bought the given item. For example, user 1 has purchased items B and D, and user 2 has purchased items A, B, C, and E.

To build a product recommendation system based on collaborative filtering, we first need to build this type of user-to-item matrix. The next step is to calculate the similarities between users.

To measure similarity, cosine similarity is often used. The equation for computing the cosine similarity between two users looks like this:

$$\cos(U_1, U_2) = \frac{\sum_{i} P_{1i} P_{2i}}{\sqrt{\sum_{i} P_{1i}^{2}}\,\sqrt{\sum_{i} P_{2i}^{2}}}$$

In this equation, U1 and U2 represent user 1 and user 2, and P1i and P2i indicate whether user 1 and user 2, respectively, have purchased product i. Using this equation, you get 0.353553 as the cosine similarity between users 1 and 2 in the example above, and 0.866025 as the cosine similarity between users 2 and 4.
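The first of these numbers can be checked by hand. The sketch below assumes the example matrix has exactly the five item columns A to E mentioned in the text; the function name cosine_sim is only illustrative.

```python
import numpy as np

# Purchase vectors over items A..E (1 = bought, 0 = not bought),
# taken from the purchases the text lists for users 1 and 2.
user_1 = np.array([0, 1, 0, 1, 0])  # bought B and D
user_2 = np.array([1, 1, 1, 0, 1])  # bought A, B, C and E


def cosine_sim(p1, p2):
    """Dot product of the two vectors divided by the product of their norms."""
    return float(np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2)))


sim_1_2 = cosine_sim(user_1, user_2)
print(round(sim_1_2, 6))  # 0.353553
```

The two users share only item B, so the numerator is 1, and the denominator is sqrt(2) * sqrt(4).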

As you can imagine, the greater the cosine similarity, the more similar the two users. Thus, in our example, users 2 and 4 are more similar to each other than users 1 and 2. Finally, when using a collaborative filtering algorithm for product recommendations, there are two approaches that can be taken: a user-based approach and an item-based approach.

As the names suggest, the user-based approach to collaborative filtering uses the similarities between users, whereas the item-based approach uses the similarities between items. This means that when we calculate the similarities between two users in the user-based approach, we need to build and use a user-to-item matrix, as discussed above.

For the item-based approach, however, we need to calculate the similarities between two items, which means that we need to build and use an item-to-user matrix. We can obtain it by simply transposing the user-to-item matrix.

Let's discuss how to build a product recommendation system using Python. We will begin this section by analyzing some e-commerce business data and then discuss the two approaches to building a product recommendation system with collaborative filtering.


Some records have negative values in the Quantity column; these represent cancelled orders. Let's exclude them. We can filter these records out of our DataFrame with the following code:
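A minimal sketch of this filter, assuming the data has been loaded into a pandas DataFrame named df. The variable name and the toy rows are assumptions made for illustration; only the Quantity column name comes from the text.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset.
df = pd.DataFrame({
    "StockCode": ["A", "B", "A", "C"],
    "Quantity": [6, -2, 3, -1],
})

# Keep only rows with a positive quantity; negative rows are cancellations.
df = df.loc[df["Quantity"] > 0]
print(df.shape)
```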

Data Preparation

Before we dive into building a product recommendation engine using a collaborative filtering algorithm, we need to do two things:

First, we need to handle the NaN values in our dataset, especially those in the CustomerID field. Without correct values in the CustomerID field, we cannot build a proper recommendation system, since the collaborative filtering algorithm depends on historical item purchase data for individual customers.

Secondly, we need to build a user-to-item matrix before we can implement the collaborative filtering algorithm for product recommendation. The user-to-item matrix is simply tabular data, where each column represents each product or item, each row represents a customer, and the value in each cell represents whether or not the given customer bought the given product.

Managing NaNs in the CustomerID field

If you look closely at the data, you will notice that some records have no customer ID. Since we have to build a customer-item matrix in which each row is specific to one customer, we cannot include records without a CustomerID in our data. Let's first see how many records do not have a customer ID.

Let's take a look at the following code:
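One way to count the missing values is sketched below. The DataFrame name df and the toy rows are assumptions; the CustomerID column name comes from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; two of the three have no customer ID.
df = pd.DataFrame({
    "StockCode": ["A", "B", "C"],
    "Quantity": [1, 2, 3],
    "CustomerID": [12347.0, np.nan, np.nan],
})

# isna() flags missing entries; summing the flags counts them.
missing_count = int(df["CustomerID"].isna().sum())
print(missing_count)  # 2 for this toy frame
```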

On the full dataset, this check reports 133,361 records without a customer ID.

Now that we know that there are records with missing customer IDs, we have to exclude them from further analysis. One way to remove them from our DataFrame is to use the dropna function, as in the following:
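A short sketch of the dropna call, again on hypothetical toy rows in a DataFrame assumed to be named df. The subset argument restricts the check to the CustomerID column, so rows with missing values elsewhere are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; the middle one has no customer ID.
df = pd.DataFrame({
    "StockCode": ["A", "B", "C"],
    "CustomerID": [12347.0, np.nan, 12348.0],
})

# Drop only the rows whose CustomerID is missing.
df = df.dropna(subset=["CustomerID"])
print(df.shape)
```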

Building a customer-item matrix

The data we now have represents individual items purchased by customers. However, to build a product recommendation system with a collaborative filtering algorithm, we need data where each record describes which items a given customer has purchased.

We are going to transform the data into a user-to-item matrix, where each row represents one customer and the columns correspond to different products.

Let's take a look at the following code:
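The transformation can be sketched with pandas pivot_table. The column names and the name customer_item_matrix appear in the text; the DataFrame name df, the toy rows, and the choice of pivot_table with a sum aggregation are assumptions consistent with the "total quantities" described next.

```python
import pandas as pd

# Hypothetical toy transactions: customer 1 bought item A twice.
df = pd.DataFrame({
    "CustomerID": [1.0, 1.0, 1.0, 2.0],
    "StockCode": ["A", "A", "B", "B"],
    "Quantity": [2, 3, 1, 4],
})

# One row per customer, one column per product, cells hold summed quantities.
# Customer/product pairs with no purchase come out as NaN.
customer_item_matrix = df.pivot_table(
    index="CustomerID", columns="StockCode", values="Quantity", aggfunc="sum"
)
print(customer_item_matrix)
```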

As you can see from this, we now have a matrix where each row holds, for one customer, the total quantity purchased of each product.

Now, let's 0-1 encode this data, so that a value of 1 means that the given product was bought by the given customer, and a value of 0 means that the given product was never bought by the given customer. Take a look at the following code:
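A sketch of the encoding step on a hypothetical toy matrix. The text uses applymap; newer pandas releases (2.1 and later) offer the same element-wise behaviour as DataFrame.map and deprecate applymap, so the sketch picks whichever is available. The helper name encode is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy matrix of summed quantities; NaN means "never bought".
customer_item_matrix = pd.DataFrame(
    {"A": [5.0, np.nan], "B": [1.0, 4.0]},
    index=pd.Index([1.0, 2.0], name="CustomerID"),
)


def encode(x):
    """Return 1 for a positive quantity, 0 otherwise (NaN > 0 is False)."""
    return 1 if x > 0 else 0


# Apply the function to every cell of the DataFrame.
if hasattr(pd.DataFrame, "map"):
    customer_item_matrix = customer_item_matrix.map(encode)
else:
    customer_item_matrix = customer_item_matrix.applymap(encode)
print(customer_item_matrix)
```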

As you can see in this code, we are using the applymap function, which applies a given function to each element of a DataFrame.

The lambda function we are using in this code simply encodes all the elements whose values are greater than 0 as 1, and the rest as 0.

Now we have a customer-item matrix that we can use for the collaborative filtering algorithm. Let's now move on to building product recommendation engines.

Collaborative filtering

We will explore two approaches to building our recommender: user-based and item-based. In the user-based approach, we calculate the similarities between users based on their item purchase history. In the item-based approach, on the other hand, we calculate the similarities between items based on which items are often purchased together with which other items.

To measure the similarity between users or between items, we will use the cosine_similarity function in the scikit-learn package. You can import this function using the following code:
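The import, followed by a small sanity check that reuses the purchase vectors of users 1 and 2 from the earlier example (assuming five item columns A to E):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows are users 1 and 2 from the earlier example; columns are items A..E.
purchases = np.array([
    [0, 1, 0, 1, 0],  # user 1: B and D
    [1, 1, 1, 0, 1],  # user 2: A, B, C and E
])

# Returns a 2 x 2 array; entry [i, j] is the similarity of row i and row j.
pairwise = cosine_similarity(purchases)
print(pairwise.round(6))
```

The off-diagonal entry should match the 0.353553 quoted earlier for users 1 and 2.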

The cosine_similarity function in the sklearn package calculates the pairwise cosine similarities between the rows of the given data.

User-based collaborative filtering and recommendations

To build a user-based collaborative filtering algorithm, we need to calculate the cosine similarities between users. Let's take a look at the following code:
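A minimal sketch of this step on a hypothetical 0/1 matrix. The customer IDs and purchases are invented; the names customer_item_matrix and user_user_sim_matrix come from the text.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy 0/1 customer-item matrix (IDs and purchases are invented).
customer_item_matrix = pd.DataFrame(
    [[1, 1, 0], [1, 0, 0], [0, 0, 1]],
    index=pd.Index([1, 2, 3], name="CustomerID"),
    columns=pd.Index(["A", "B", "C"], name="StockCode"),
)

# Each row is one customer, so the result is customer-by-customer.
user_user_sim_matrix = pd.DataFrame(cosine_similarity(customer_item_matrix))
print(user_user_sim_matrix)
```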

As you can see in this code, we are using the cosine_similarity function of the metrics.pairwise module from the sklearn package. This function calculates the pairwise cosine similarities between the samples and returns the results as an array.

Then, we create a pandas DataFrame from this output array and store it in a variable called user_user_sim_matrix, which stands for user-to-user similarity matrix.

At this point, the index and column names are just integer positions, which are not easy to interpret. Since each row and each column represents an individual customer, we are going to rename the index and columns using the following code:
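One way to relabel the matrix is to reuse the index of customer_item_matrix for both axes. This is a sketch on hypothetical toy data, not necessarily the exact code from the original notebook.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy 0/1 customer-item matrix (IDs and purchases are invented).
customer_item_matrix = pd.DataFrame(
    [[1, 1, 0], [1, 0, 0], [0, 0, 1]],
    index=pd.Index([1, 2, 3], name="CustomerID"),
    columns=pd.Index(["A", "B", "C"], name="StockCode"),
)
user_user_sim_matrix = pd.DataFrame(cosine_similarity(customer_item_matrix))

# Rows and columns of the similarity matrix follow the row order of
# customer_item_matrix, so its index labels both axes.
user_user_sim_matrix.index = customer_item_matrix.index
user_user_sim_matrix.columns = customer_item_matrix.index
print(user_user_sim_matrix.loc[1, 2])
```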

Let's take a closer look at this matrix of similarities between users. As you can imagine, the cosine similarity between a customer and itself is 1, and this is what we can observe from this similarity matrix. The diagonal elements of this user similarity matrix have values of 1.

The rest represent the cosine similarity between two customers. For example, the cosine similarity between customers 12347 and 12348 is 0.063022. On the other hand, the cosine similarity between customers 12347 and 12349 is 0.046130. This suggests that customer 12348 is more similar to customer 12347 than customer 12349 is, based on the products they purchased. In this way, we can easily tell which customers are similar to one another and which customers have purchased similar items.

These pairwise cosine similarity measures are what we will use for product recommendations. Let's pick one customer as an example. First, we will rank the customers most similar to the customer with ID 12350, using the following code:
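The ranking can be sketched by selecting one row of the similarity matrix and sorting it in descending order. The toy IDs and purchases below are invented; with the real data you would select the row for customer 12350.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy 0/1 customer-item matrix (IDs and purchases are invented).
customer_item_matrix = pd.DataFrame(
    [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 0]],
    index=pd.Index([1, 2, 3, 4], name="CustomerID"),
    columns=pd.Index(["A", "B", "C", "D"], name="StockCode"),
)
user_user_sim_matrix = pd.DataFrame(
    cosine_similarity(customer_item_matrix),
    index=customer_item_matrix.index,
    columns=customer_item_matrix.index,
)

# Rank every customer by similarity to customer 1, most similar first.
# The customer itself comes out on top with a similarity of 1.
most_similar = user_user_sim_matrix.loc[1].sort_values(ascending=False)
print(most_similar.head(10))
```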

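The result lists the 10 customers most similar to customer 12350. Let's choose customer 17935 and discuss how we can recommend products using these results.

The strategy is as follows: identify the items that each of the two customers has already bought, find the items that customer 12350 has bought but the target customer 17935 has not, and recommend those items to customer 17935.

Let's first see how we can retrieve the items that customer 12350 has purchased in the past. The code looks like this: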
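A sketch of the lookup on hypothetical toy data. The helper name items_bought and the variable names are assumptions. Current pandas releases no longer provide a nonzero() method on a Series, so the sketch calls NumPy's nonzero on the row's values instead.

```python
import pandas as pd

# Hypothetical toy 0/1 customer-item matrix (IDs and purchases are invented).
customer_item_matrix = pd.DataFrame(
    [[1, 1, 0, 1], [1, 0, 0, 0]],
    index=pd.Index([1, 2], name="CustomerID"),
    columns=pd.Index(["A", "B", "C", "D"], name="StockCode"),
)


def items_bought(matrix, customer_id):
    """Return the set of item codes with a non-zero entry in the customer's row."""
    row = matrix.loc[customer_id]
    positions = row.to_numpy().nonzero()[0]  # integer positions of non-zero cells
    return set(row.index[positions])


items_bought_by_a = items_bought(customer_item_matrix, 1)
items_bought_by_b = items_bought(customer_item_matrix, 2)
print(items_bought_by_a, items_bought_by_b)
```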

As you can see in this code, we are using the nonzero() function, which returns the integer indexes of the non-zero elements. Applying it to the row of customer_item_matrix for customer 12350 gives us the list of items that customer 12350 has purchased. We can apply the same code for the target customer 17935.

We now have two sets of items that customers 12350 and 17935 have purchased. Using a simple set operation, we can find the items that customer 12350 has purchased, but customer 17935 has not. The code is like the one below:
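The set operation is a plain set difference. In this sketch, customer A stands in for 12350 and customer B for the target 17935; the item codes and variable names are hypothetical.

```python
# Hypothetical item sets for the two customers.
items_bought_by_a = {"A", "B", "D"}
items_bought_by_b = {"A"}

# Items that A has bought but B has not: candidates to recommend to B.
items_to_recommend_to_b = items_bought_by_a - items_bought_by_b
print(sorted(items_to_recommend_to_b))  # ['B', 'D']
```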

To obtain the descriptions of these items, you can use the following code:
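A sketch of the description lookup, assuming the transaction DataFrame is named df and carries StockCode and Description columns (the Description column name and the toy rows are assumptions).

```python
import pandas as pd

# Hypothetical toy transactions; item B appears twice.
df = pd.DataFrame({
    "StockCode": ["A", "B", "B", "D"],
    "Description": ["item a", "item b", "item b", "item d"],
})
items_to_recommend_to_b = {"B", "D"}

# Keep the rows for the recommended items, then reduce to one row per item.
recommended = (
    df.loc[df["StockCode"].isin(items_to_recommend_to_b), ["StockCode", "Description"]]
    .drop_duplicates()
    .set_index("StockCode")
)
print(recommended)
```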

Using user-based collaborative filtering, we have discussed how to make specific product recommendations for individual customers. You can include these products, which each target customer is likely to buy, in customized marketing messages, which can generate more conversions from your customers.

As discussed so far, a user-based collaborative filtering algorithm makes it easy to produce product recommendations for target customers.

However, there is one major disadvantage to using user-based collaborative filtering. As we have seen in this exercise, recommendations are based on the individual customer's purchase history. For new customers, we will not have enough data to compare these new customers with others. To handle this problem, we can use item-based collaborative filtering, which we will discuss now.

Item-based collaborative filtering and recommendations

Item-based collaborative filtering is similar to the user-based approach, except that it uses measures of similarity between items, rather than between users or customers.

Before, we had to calculate cosine similarities between users, but now, we are going to calculate cosine similarities between items. Take a look at the following code:
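A minimal sketch of the item-to-item computation on a hypothetical toy matrix. The name item_item_sim_matrix is an assumption chosen to mirror user_user_sim_matrix.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy 0/1 customer-item matrix (IDs and purchases are invented).
customer_item_matrix = pd.DataFrame(
    [[1, 1, 0], [1, 0, 1], [1, 1, 0]],
    index=pd.Index([1, 2, 3], name="CustomerID"),
    columns=pd.Index(["A", "B", "C"], name="StockCode"),
)

# Transposing puts items in the rows, so similarities are item-by-item.
item_item_sim_matrix = pd.DataFrame(cosine_similarity(customer_item_matrix.T))
print(item_item_sim_matrix)
```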

If you compare this code with the previous one, in which we calculated a matrix of similarities between users, the only difference is that here we are transposing the customer_item_matrix, so that the rows represent individual items and the columns represent the customers.

We continue to use the cosine_similarity function from the metrics.pairwise module in the sklearn package. To correctly name the indexes and columns with the product codes, you can use the following code:
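The relabeling can be sketched by reusing the column labels of customer_item_matrix, which hold the product codes. The toy data and the name item_item_sim_matrix are assumptions.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy 0/1 customer-item matrix (IDs and purchases are invented).
customer_item_matrix = pd.DataFrame(
    [[1, 1, 0], [1, 0, 1], [1, 1, 0]],
    index=pd.Index([1, 2, 3], name="CustomerID"),
    columns=pd.Index(["A", "B", "C"], name="StockCode"),
)
item_item_sim_matrix = pd.DataFrame(cosine_similarity(customer_item_matrix.T))

# The rows of the transposed matrix follow the original column order,
# so the product codes label both axes of the similarity matrix.
item_item_sim_matrix.index = customer_item_matrix.columns
item_item_sim_matrix.columns = customer_item_matrix.columns
print(item_item_sim_matrix.loc["A", "C"])
```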

As before, the diagonal elements have values of 1. This is because the similarity between an item and itself is 1, which means that the two are identical. The remaining cells contain the cosine similarity between pairs of different items.

For example, in the item similarity matrix, the cosine similarity between the item with StockCode 10002 and the item with StockCode 10120 is 0.094868. On the other hand, the cosine similarity between item 10002 and item 10125 is 0.090351. This suggests that the item with StockCode 10120 is more similar to the item with StockCode 10002 than the item with StockCode 10125 is.

The strategy for making product recommendations with this item similarity matrix is similar to the one we used in the user-based approach.

Let's work with an example.

Let's suppose that a new customer has just bought a product with StockCode 23166, and we want to include in our marketing emails some products that this customer is most likely to buy. The first thing we have to do is to find the items most similar to the one with StockCode 23166. You can use the following code to get the 10 items most similar to the item with StockCode 23166:
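A sketch of the top-10 lookup on a hypothetical labeled item similarity matrix. Item "A" stands in for StockCode 23166, and the similarity values are invented.

```python
import pandas as pd

# Hypothetical item-to-item similarity matrix (values are invented).
codes = pd.Index(["A", "B", "C"], name="StockCode")
item_item_sim_matrix = pd.DataFrame(
    [[1.0, 0.8, 0.5], [0.8, 1.0, 0.0], [0.5, 0.0, 1.0]],
    index=codes,
    columns=codes,
)

# Sort the row of the purchased item by similarity and keep the first 10.
# The purchased item itself comes first, with a similarity of 1.
top_10_similar_items = list(
    item_item_sim_matrix.loc["A"].sort_values(ascending=False).iloc[:10].index
)
print(top_10_similar_items)  # ['A', 'B', 'C']
```

With only three items in the toy matrix, the slice returns all of them; on real data it would return ten.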

We can obtain descriptions of these similar items using the following code:
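A sketch of the lookup that also keeps the similarity ranking, assuming a transaction DataFrame named df with StockCode and Description columns and a list of item codes ordered from most to least similar (all names and rows here are hypothetical).

```python
import pandas as pd

# Hypothetical toy transactions and a ranked list of similar items.
df = pd.DataFrame({
    "StockCode": ["B", "A", "C", "A"],
    "Description": ["item b", "item a", "item c", "item a"],
})
top_similar_items = ["A", "C", "B"]

# One row per item, then reorder the rows to follow the ranked list.
described = (
    df.loc[df["StockCode"].isin(top_similar_items), ["StockCode", "Description"]]
    .drop_duplicates()
    .set_index("StockCode")
    .loc[top_similar_items]
)
print(described)
```

The final .loc call is what preserves the ranking; without it, rows appear in the order they occur in df.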

The first item here is the one that the target customer has just bought and the other nine are the items that are frequently bought by others who have bought the first item.

As you can see, those who have purchased ceramic storage jars often buy gelatin molds, spice cans, and cake tins.

With this data, you can include these items in your marketing messages for this target customer as additional product recommendations. Personalizing marketing messages with specific product recommendations often results in higher customer conversion rates. Using an item-based collaborative filtering algorithm, you can now easily make product recommendations for both new and existing customers.