# What Is a Recommendation System and How Does a Recommendation Engine Work?

Have you ever wondered how the shopping sites we use every day find exactly the products we need and recommend them to us? Have you noticed how closely the ads we constantly see match our needs? How do movie sites and music apps know what we will like? Is all this magic, or is it the power of data analysis? Let's explore this magical world together.

1. What is a Recommendation System?

2. What is a Content-Based Recommendation System?

3. What is Cosine Similarity?

4. Recommendation System in Python

5. What is Jaccard Similarity?

A recommendation system is a set of techniques that aims to put the most relevant product in front of the most suitable user, using the users' preferences and the products' features. In this article, we will examine content-based recommendation systems, one sub-area of recommendation systems.

So what is a content-based recommendation system? These systems examine the characteristics of products and recommend products whose characteristics are similar to those the user already likes. In a content-based system, we determine the characteristics of the products that the user has previously been interested in and how important each characteristic is. Other products are then compared against these, taking the characteristics and their importance coefficients into account. This comparison yields a similarity score for each product, and the products with the highest scores are recommended to the user. These systems, which are part of our lives and seem like magic, basically follow this simple principle.

This process is applied to all the content we interact with, from the music we listen to and the movies we watch to the ads we click and the products we review. For example, if the music we have listened to before is 90s pop, the system will probably recommend another pop song from the 90s. Of course, it would be misleading to make suggestions based on only two characteristics. When building recommendation systems, many features of the content are taken into account. Some of these features affect the result more and some less. For example, the name of the singer and the category of the music will affect the suggestions to different degrees.

So, how do we decide which feature is more important? We can try different methods at this stage. The simplest is to set the importance of each feature, in other words its coefficient, manually. For example, if we think that the singer's name is more important than the style of the song, we can give the singer-name feature a higher coefficient. In this way, we increase the effect of this feature on the result. Of course, this method does not always work; if the coefficients are not chosen carefully, the recommendation system will produce poor results. Another method is to determine the coefficients from the content that the user is interested in. For example, if a user has mostly listened to songs in a similar style and cares more about the style than about the performer, the coefficient of the song's style should be larger for that user. The recommendation system should work out what each user cares about and shape its recommendations accordingly. Although this method is more difficult and costly, it gives more satisfactory results.
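As a rough illustration of the coefficient idea, the sketch below combines per-feature similarity scores into one weighted score. The function name `weighted_similarity`, the feature names, and all the numbers are hypothetical and chosen only for this example; a real system would have to compute the per-feature scores and pick the coefficients itself.

```python
def weighted_similarity(feature_scores, weights):
    """Weighted average of per-feature similarity scores (each in [0, 1])."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * feature_scores[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical scores for two songs: different singer, same style.
scores = {"singer": 0.0, "style": 1.0}

# Manually chosen coefficients: the singer matters more than the style.
singer_first = {"singer": 3, "style": 1}
# Coefficients for a listener who cares more about style than performer.
style_first = {"singer": 1, "style": 4}

print(weighted_similarity(scores, singer_first))  # 0.25
print(weighted_similarity(scores, style_first))   # 0.8
```

The same pair of songs gets a low score under the first set of coefficients and a high one under the second, which is why the choice of coefficients matters so much.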

Another important point in recommendation systems is how to calculate the similarity of the contents. Two of the most popular methods for calculating the similarity between two items are cosine similarity and Jaccard similarity. Let's examine the characteristics of these methods together.

The cosine similarity method is a well-known way to calculate the similarity between two vectors. It can be calculated easily with the following innocent-looking formula.
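Written out for two vectors $A$ and $B$ with $n$ components each (the symbol names are chosen here for illustration), the standard definition of cosine similarity is:

$$
\text{cosine similarity}(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
$$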

Let's study the formula together. We start by taking the dot product of the two vectors whose similarity we want to calculate. Then we multiply the lengths (norms) of the two vectors. Finally, we divide the first value by the second to obtain the cosine similarity.

If these statements sound unfamiliar, do not be afraid. Instead of doing the calculations by hand, we can handle this process with a few lines of code. The code below does the job nicely.

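```
import numpy as np


def cosine_similarity(vector1, vector2):
    # Dot product of the two vectors (numerator of the formula).
    dot_product = np.dot(vector1, vector2)
    # Euclidean lengths of the vectors (denominator of the formula).
    norm_a = np.linalg.norm(vector1)
    norm_b = np.linalg.norm(vector2)
    # Not defined if either vector consists only of zeros (division by zero).
    cosine_similarity_value = dot_product / (norm_a * norm_b)
    return cosine_similarity_value
```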

Let's do an experiment and examine the cosine similarity of the vectors (3,1) and (4,3) in the graph below, which lie close to each other.

Using the cosine_similarity function defined above, we can find the similarity of these two vectors.
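A minimal sketch of that call is shown here. The function is repeated in compact form so that the snippet runs on its own, and the variable name `similarity` is just an illustrative choice.

```python
import numpy as np


def cosine_similarity(vector1, vector2):
    # Dot product divided by the product of the vector lengths.
    return np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))


similarity = cosine_similarity([3, 1], [4, 3])
print(f"{similarity:.2%}")  # 94.87%
```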

We thought that these two vectors were similar, and our function confirms it: there is a 94.87% similarity between them. Our function is working properly. If you are starting to wonder why we are studying vectors when we set out to study recommendation systems, don't worry, the mathematical part is almost over. So how are we going to use these vector operations in recommendation systems? This is where content-based recommendation systems start. Let's imagine that we want to build a movie recommendation system that makes suggestions based on the similarity of categories. Let's try these operations on categories using cosine similarity.

Categories of movie-1 = Family, Comedy, Action

Categories of movie-2 = Action, Horror, Thriller

When we examine the categories, these two films do not look very similar to each other. Let's see if our function agrees with us.

First of all, we start the process by bringing together all the categories in the two films.

All categories in two movies = Family, Comedy, Action, Horror, Thriller

In the next stage, we create a vector for each movie.

We have brought together all the categories of the two films. We go through these categories in order and write 1 into the vector for each category that belongs to the movie and 0 for each category that does not. The result is the vector of the movie.

Vector of movie-1 = [1, 1, 1, 0, 0]

Vector of movie-2 = [0, 0, 1, 1, 1]

The next step after creating the vectors is to calculate the similarity between these vectors. We can calculate the similarity using the above function.
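The steps above can be sketched in a few lines: combine the categories, turn each movie into a 0/1 vector, and compare the vectors. The helper name `to_vector` and the variable names are assumptions made for this example, and the cosine function is repeated so the snippet is self-contained.

```python
import numpy as np


def cosine_similarity(vector1, vector2):
    return np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))


def to_vector(categories, all_categories):
    """Return a 0/1 vector with one position per entry of all_categories."""
    return [1 if category in categories else 0 for category in all_categories]


movie_1 = ["Family", "Comedy", "Action"]
movie_2 = ["Action", "Horror", "Thriller"]

# Combined category list, in first-seen order and without duplicates.
all_categories = list(dict.fromkeys(movie_1 + movie_2))

vector_1 = to_vector(movie_1, all_categories)  # [1, 1, 1, 0, 0]
vector_2 = to_vector(movie_2, all_categories)  # [0, 0, 1, 1, 1]

print(f"{cosine_similarity(vector_1, vector_2):.1%}")  # 33.3%
```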

We calculated the cosine similarity of these two films as 33.3%. Since this is a low similarity, we should not recommend the second movie to someone who watched the first one. Let's look at another example.

Categories of movie-1 = Family, Comedy, Action

Categories of movie-2 = Comedy, Animation, Fantastic, Family, Action

The categories of movie-1 are the same as in the previous example. We will calculate its similarity to a new movie. We begin by combining the categories.

All categories in two movies = Family, Comedy, Action, Animation, Fantastic

In the next step, we create the vectors of the movies.

Vector of movie-1 = [1, 1, 1, 0, 0]

Vector of movie-2 = [1, 1, 1, 1, 1]

We have created the vectors of the movies. Let's calculate the similarity between the movies using our function.
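To connect this back to recommending the highest-scoring items, here is a small sketch that scores both example movies against movie-1 and sorts them. The names `watched`, `candidates`, and `ranking` are hypothetical, and the shared category order is an assumption of this example; positions that are zero in both vectors do not change the cosine similarity.

```python
import numpy as np


def cosine_similarity(vector1, vector2):
    return np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))


# One shared category order for every vector in this example.
all_categories = ["Family", "Comedy", "Action", "Horror", "Thriller", "Animation", "Fantastic"]


def to_vector(categories):
    return [1 if category in categories else 0 for category in all_categories]


watched = to_vector(["Family", "Comedy", "Action"])
candidates = {
    "movie-2 (first example)": to_vector(["Action", "Horror", "Thriller"]),
    "movie-2 (second example)": to_vector(["Comedy", "Animation", "Fantastic", "Family", "Action"]),
}

# Score every candidate against the watched movie, best match first.
scores = {title: cosine_similarity(watched, vector) for title, vector in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for title in ranking:
    print(f"{title}: {scores[title]:.1%}")
# movie-2 (second example): 77.5%
# movie-2 (first example): 33.3%
```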

We found that there is a very good similarity between these two films: the cosine similarity is 77.5%. Now we can recommend the new movie to the user, or to a friend, with peace of mind. We achieved good results even though we calculated similarity over the category alone. For better results, multiple features should be considered when building the recommendation system.

Another method used to calculate similarity is Jaccard similarity. This method has a simpler structure: the Jaccard similarity of two lists is the number of elements they have in common divided by the total number of distinct elements across both lists.
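In set notation, with $A$ and $B$ standing for the two lists treated as sets (the symbol names are chosen here for illustration), this definition reads:

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
$$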

Let's use the movie category example we examined earlier to look at the Jaccard similarity formula in detail.

Categories of movie-1 = Family, Comedy, Action

Categories of movie-2 = Action, Horror, Thriller

We calculated the cosine similarity of these two films as 33% in the previous example. Now let's examine the Jaccard similarity of the films.

All categories in two movies = Family, Comedy, Action, Horror, Thriller

Common categories = Action

Count of common categories = 1

Count of all categories = 5

Jaccard similarity = Count of common categories / Count of all categories = 1/5 = 20%
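The same calculation can be sketched with Python sets. The function name `jaccard_similarity` is an assumption for this example, and the sketch does not handle the case where both lists are empty.

```python
def jaccard_similarity(list1, list2):
    set1, set2 = set(list1), set(list2)
    common = set1 & set2          # categories present in both movies
    all_categories = set1 | set2  # distinct categories across both movies
    return len(common) / len(all_categories)


movie_1 = ["Family", "Comedy", "Action"]
movie_2 = ["Action", "Horror", "Thriller"]

print(jaccard_similarity(movie_1, movie_2))  # 0.2
```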

Both methods gave a very low similarity for these two films, which matches our own impression that the films are not similar. The calculation methods are working well. Let's examine another example.

Categories of movie-1 = Family, Comedy, Action

Categories of movie-2 = Comedy, Animation, Fantastic, Family, Action

These two films have similar categories to each other. We calculated the cosine similarity for these films as 77.5%. Let's calculate the Jaccard similarity as well.

All categories = Family, Comedy, Action, Animation, Fantastic

Common categories = Family, Comedy, Action

Count of all categories = 5

Count of common categories = 3

Jaccard similarity = Count of common categories / Count of all categories = 3/5 = 60%
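Jaccard similarity can also be computed directly from the 0/1 vectors built for the cosine examples, which is handy when the movies are already stored as vectors. This is a sketch under that assumption; the function name `jaccard_from_vectors` is hypothetical, and it expects at least one nonzero position across the two vectors.

```python
import numpy as np


def jaccard_from_vectors(vector1, vector2):
    a = np.asarray(vector1, dtype=bool)
    b = np.asarray(vector2, dtype=bool)
    common = np.logical_and(a, b).sum()  # positions that are 1 in both vectors
    total = np.logical_or(a, b).sum()    # positions that are 1 in at least one
    return common / total


print(jaccard_from_vectors([1, 1, 1, 0, 0], [1, 1, 1, 1, 1]))  # 0.6
print(jaccard_from_vectors([1, 1, 1, 0, 0], [0, 0, 1, 1, 1]))  # 0.2
```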

When we calculate the Jaccard similarity of these two films from their categories, we find a 60 percent similarity. Since this is a reasonably high ratio, we can recommend the second movie with peace of mind to a friend who watched the first one.

Many websites and applications that we use every day use content-based recommendation systems. They analyze us and the content we prefer and make suggestions to us. Content-based recommendation systems are mainly based on the principles above. You can also build your own recommendation system that provides recommendations that are both fun and useful.

## Can Benli June 20, 2023, 9:05 a.m.

Great explanation.