Collaborative Filtering-Based Recommender


Machine Learning Based Recommendation Systems

SVD Matrix Factorization

import numpy as np
import pandas as pd

import sklearn
from sklearn.decomposition import TruncatedSVD

The MovieLens dataset was collected by the GroupLens Research Project at the University of Minnesota. You can download the dataset for this demonstration at the following URL: https://grouplens.org/datasets/movielens/100k/

Preparing the data

columns = ['user_id', 'item_id', 'rating', 'timestamp']
frame = pd.read_csv('ml-100k/u.data', sep='\t', names=columns)
frame.head()

	user_id	item_id	rating	timestamp
0	196	242	3	881250949
1	186	302	3	891717742
2	22	377	1	878887116
3	244	51	2	880606923
4	166	346	1	886397596

columns = ['item_id', 'movie title', 'release date', 'video release date', 'IMDb URL', 'unknown', 'Action', 'Adventure',
          'Animation', 'Childrens', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
          'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

movies = pd.read_csv('ml-100k/u.item', sep='|', names=columns, encoding='latin-1')
movie_names = movies[['item_id', 'movie title']]
movie_names.head()

	item_id	movie title
0	1	Toy Story (1995)
1	2	GoldenEye (1995)
2	3	Four Rooms (1995)
3	4	Get Shorty (1995)
4	5	Copycat (1995)

combined_movies_data = pd.merge(frame, movie_names, on='item_id')
combined_movies_data.head()

	user_id	item_id	rating	timestamp	movie title
0	196	242	3	881250949	Kolya (1996)
1	63	242	3	875747190	Kolya (1996)
2	226	242	5	883888671	Kolya (1996)
3	154	242	3	879138235	Kolya (1996)
4	306	242	5	876503793	Kolya (1996)
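pd.merge performs an inner join by default, so a rating whose item_id has no match in the titles table would be dropped without warning. The snippet below is a minimal sketch on made-up data (toy_ratings and toy_titles are hypothetical stand-ins, not the MovieLens frames) showing how the validate argument and a row-count comparison can guard against that.

```python
import pandas as pd

# Hypothetical toy frames standing in for `frame` and `movie_names`.
toy_ratings = pd.DataFrame({
    'user_id': [1, 2, 3],
    'item_id': [10, 10, 20],
    'rating': [4, 5, 3],
})
toy_titles = pd.DataFrame({
    'item_id': [10, 20],
    'movie title': ['Movie A', 'Movie B'],
})

# validate='many_to_one' raises a MergeError if item_id is duplicated in toy_titles.
merged = pd.merge(toy_ratings, toy_titles, on='item_id',
                  how='inner', validate='many_to_one')

# If every rating found a title, the row count is unchanged.
rows_preserved = len(merged) == len(toy_ratings)
print(merged)
print(rows_preserved)
```

The same two checks could be applied to the real merge above if you want to be sure that no ratings were lost.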

combined_movies_data.groupby('item_id')['rating'].count().sort_values(ascending=False).head()

item_id
50     583
258    509
100    508
181    507
294    485
Name: rating, dtype: int64

# Look up the title for item_id 50 (named so as not to shadow the built-in `filter`)
is_item_50 = combined_movies_data['item_id'] == 50
combined_movies_data[is_item_50]['movie title'].unique()

array(['Star Wars (1977)'], dtype=object)
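Counting by item_id alone means a second lookup is needed to see which movie each count belongs to. As a small sketch on made-up rows (the toy frame below is hypothetical, not the real data), grouping on both the id and the title keeps the name next to the count.

```python
import pandas as pd

# Hypothetical toy rows in the same shape as combined_movies_data.
toy = pd.DataFrame({
    'item_id': [50, 50, 50, 258, 258, 100],
    'movie title': ['Star Wars (1977)'] * 3
                   + ['Contact (1997)'] * 2
                   + ['Fargo (1996)'],
    'rating': [5, 4, 5, 3, 4, 5],
})

# Group on id and title together so the result is labelled with both.
counts = (toy.groupby(['item_id', 'movie title'])['rating']
             .count()
             .sort_values(ascending=False))
print(counts.head())
```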

Building a Utility Matrix

# Users as rows, movie titles as columns; cells with no rating are filled with 0
rating_crosstab = combined_movies_data.pivot_table(values='rating', index='user_id', columns='movie title', fill_value=0)
rating_crosstab.head()

movie title	'Til There Was You (1997)	1-900 (1994)	101 Dalmatians (1996)	12 Angry Men (1957)	187 (1997)	2 Days in the Valley (1996)	20,000 Leagues Under the Sea (1954)	2001: A Space Odyssey (1968)	3 Ninjas: High Noon At Mega Mountain (1998)	39 Steps, The (1935)	...	Yankee Zulu (1994)	Year of the Horse (1997)	You So Crazy (1994)	Young Frankenstein (1974)	Young Guns (1988)	Young Guns II (1990)	Young Poisoner's Handbook, The (1995)	Zeus and Roxanne (1997)	unknown	Á köldum klaka (Cold Fever) (1994)
user_id
1	0	0	2	5	0	0	3	4	0	0	...	0	0	0	5	3	0	0	0	4	0
2	0	0	0	0	0	0	0	0	1	0	...	0	0	0	0	0	0	0	0	0	0
3	0	0	0	0	2	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
4	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0
5	0	0	2	0	0	0	0	4	0	0	...	0	0	0	4	0	0	0	0	4	0

5 rows × 1664 columns
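Two details of pivot_table are worth keeping in mind here: it averages duplicate user/title pairs by default, and fill_value=0 stores "not rated" as a rating of 0. The sketch below uses a made-up frame (toy is hypothetical) to show the same call and to measure how much of the resulting matrix is empty.

```python
import pandas as pd

# Hypothetical toy ratings, standing in for the MovieLens data.
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 3],
    'movie title': ['A', 'B', 'A', 'B', 'C'],
    'rating': [5, 3, 4, 2, 1],
})

# Same call shape as above: users as rows, titles as columns, 0 for "no rating".
toy_crosstab = toy.pivot_table(values='rating', index='user_id',
                               columns='movie title', fill_value=0)

# Fraction of user/movie cells that hold no rating (stored as 0).
sparsity = (toy_crosstab == 0).to_numpy().mean()
print(toy_crosstab)
print(round(sparsity, 3))
```

Running the same sparsity line on rating_crosstab would show how much of the full utility matrix is filled-in zeros rather than real ratings.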

Transposing the Matrix

rating_crosstab.shape

(943, 1664)

# Transpose so that each row is a movie and each column is a user
X = rating_crosstab.T
X.shape

(1664, 943)

Decomposing the Matrix

# Compress each movie's 943 user ratings into 12 latent components
SVD = TruncatedSVD(n_components=12, random_state=17)

resultant_matrix = SVD.fit_transform(X)

resultant_matrix.shape

(1664, 12)
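The choice of 12 components is not justified above, and one way to sanity-check such a choice is to look at explained_variance_ratio_ after fitting. The snippet below is only a sketch on a random matrix (toy_X and the 5-component setting are assumptions for illustration, not the MovieLens numbers).

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical toy "items x users" matrix with ratings 0..5.
rng = np.random.default_rng(17)
toy_X = rng.integers(0, 6, size=(30, 20)).astype(float)

# Reduce 20 user columns to 5 latent components.
svd = TruncatedSVD(n_components=5, random_state=17)
reduced = svd.fit_transform(toy_X)

# Share of the column variance captured by the kept components.
retained = svd.explained_variance_ratio_.sum()
print(reduced.shape)
print(round(retained, 3))
```

On the real data, the same attribute on the fitted SVD object would show how much variance the 12 components keep, which can guide whether to raise or lower n_components.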

Generating a Correlation Matrix

# Pearson correlation between every pair of movies, computed over their 12 latent components
corr_mat = np.corrcoef(resultant_matrix)
corr_mat.shape

(1664, 1664)
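np.corrcoef treats each row as one variable, which is why a (1664, 12) input gives a (1664, 1664) movie-by-movie result. A tiny sketch with made-up factor rows (toy_factors is hypothetical) shows the behaviour.

```python
import numpy as np

# Hypothetical latent factors: one row per "movie", three components each.
toy_factors = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],   # same trend as row 0
    [3.0, 2.0, 1.0],   # opposite trend to row 0
])

# Rows are treated as variables, so the result is 3 x 3.
toy_corr = np.corrcoef(toy_factors)
print(toy_corr.shape)
print(np.round(toy_corr, 2))
```

Rows that rise and fall together correlate at 1, and rows that move in opposite directions correlate at -1, regardless of their scale.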

Isolating Star Wars From the Correlation Matrix

movie_names = rating_crosstab.columns
movies_list = list(movie_names)

star_wars = movies_list.index('Star Wars (1977)')
star_wars

1398

# Use the looked-up position rather than a hard-coded index
corr_star_wars = corr_mat[star_wars]
corr_star_wars.shape

(1664,)

Recommending a Highly Correlated Movie

list(movie_names[(corr_star_wars<1.0) & (corr_star_wars > 0.9)])

['Die Hard (1988)',
 'Empire Strikes Back, The (1980)',
 'Fugitive, The (1993)',
 'Raiders of the Lost Ark (1981)',
 'Return of the Jedi (1983)',
 'Terminator 2: Judgment Day (1991)',
 'Terminator, The (1984)',
 'Toy Story (1995)']

list(movie_names[(corr_star_wars<1.0) & (corr_star_wars > 0.95)])

['Return of the Jedi (1983)']
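A fixed threshold such as 0.9 or 0.95 returns a different number of titles for every movie. As an alternative sketch, the helper below (top_n_similar is a hypothetical name, and the 4 x 4 matrix is made-up data) ranks titles by correlation and returns the top n, skipping the movie itself.

```python
import numpy as np

def top_n_similar(corr_mat, titles, title, n=5):
    """Return the n (title, correlation) pairs most correlated with `title`."""
    titles = list(titles)
    idx = titles.index(title)
    scores = np.asarray(corr_mat[idx], dtype=float).copy()
    scores[idx] = -np.inf                     # never recommend the movie itself
    order = np.argsort(scores)[::-1][:n]      # highest correlation first
    return [(titles[i], float(scores[i])) for i in order]

# Hypothetical toy correlation matrix for four titles.
toy_titles = ['A', 'B', 'C', 'D']
toy_corr = np.array([
    [1.0, 0.9, 0.2, 0.95],
    [0.9, 1.0, 0.1, 0.5],
    [0.2, 0.1, 1.0, 0.3],
    [0.95, 0.5, 0.3, 1.0],
])

recommendations = top_n_similar(toy_corr, toy_titles, 'A', n=2)
print(recommendations)
```

With the objects built in this post, the equivalent call would be top_n_similar(corr_mat, movie_names, 'Star Wars (1977)').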


Nishkarsh Jain
