Recommendation system


Why Netflix Never Implemented The Algorithm That Won The Netflix $1 Million Challenge

You probably recall all the excitement when a group finally won the big Netflix $1 million prize in 2009, improving Netflix's recommendation algorithm by 10%.

But what you might not know is that Netflix never implemented that solution itself. Netflix recently put up a blog post discussing some of the details of its recommendation system, which (as an aside) explains why the winning entry was never used. First, they note that they did make use of an earlier bit of code that came out of the contest: "A year into the competition, the Korbell team won the first Progress Prize with an 8.43% improvement. They reported more than 2000 hours of work in order to come up with the final combination of 107 algorithms that gave them this prize." Neat. On the winning entry, though: "We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment."

Netflix Recommendations: Beyond the 5 stars (Part 1)

By Xavier Amatriain and Justin Basilico (Personalization Science and Engineering)

In this two-part blog post, we will open the doors of one of the most valued Netflix assets: our recommendation system.

In Part 1, we will relate the Netflix Prize to the broader recommendation challenge, outline the external components of our personalized service, and highlight how our task has evolved with the business. In Part 2, we will describe some of the data and models that we use and discuss our approach to algorithmic innovation that combines offline machine learning experimentation with online A/B testing. Enjoy… and remember that we are always looking for more star talent to add to our great team, so please take a look at our jobs page.

In 2006 we announced the Netflix Prize, a machine learning and data mining competition for movie rating prediction. A year into the competition, the Korbell team won the first Progress Prize with an 8.43% improvement.

Using your laptop to compute PageRank for millions of webpages

The PageRank algorithm is a great way of using collective intelligence to determine the importance of a webpage.
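On a small link graph, PageRank can be computed by repeatedly passing rank along links until the values stop changing. The following is a minimal Python sketch of that power iteration, not the post's own code. The damping factor of 0.85, the even redistribution of rank from pages without outgoing links, and the names `pagerank` and `toy_web` are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; the ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page splits its rank equally among the pages it links to.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new_rank[q] += damping * rank[p] / len(outs)

        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page web: a links to b and c, b to c, c to a, d to c.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

This version keeps the whole graph in Python dictionaries, which is comfortable for thousands of pages but is exactly what stops scaling as the page count grows.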

There's a big problem, though: PageRank is difficult to apply to the web as a whole, simply because the web contains so many webpages. While just a few lines of code can be used to implement PageRank on collections of a few thousand webpages, it's trickier to compute PageRank for larger sets of pages.

A Graph-Based Movie Recommender Engine « Marko A. Rodriguez

The MovieRatings Dataset

The GroupLens research group has made available a corpus of movie ratings.

There are 3 versions of this dataset: 100 thousand, 1 million, and 10 million ratings. This post makes use of the 1 million ratings version of the dataset. The dataset can be downloaded from the MovieRatings website (about 6 MB in size). The raw dataset is composed of three files: users.dat, movies.dat, and ratings.dat.

Getting Started with Gremlin

All of the code examples can be cut and pasted into the Gremlin console or into a Groovy/Java class within a larger application.
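Before any graph tooling is involved, the ratings file can be inspected with plain Python. The sketch below is not the post's Gremlin code. It assumes that each line of ratings.dat has the form `UserID::MovieID::Rating::Timestamp`, which this excerpt does not state; the names `Rating`, `parse_ratings`, and `ratings_by_user`, and the sample records, are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    stars: int
    timestamp: int


def parse_ratings(lines: Iterable[str]) -> Iterator[Rating]:
    """Yield one Rating per non-empty line in the assumed '::'-delimited format."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, movie_id, stars, timestamp = line.split("::")
        yield Rating(int(user_id), int(movie_id), int(stars), int(timestamp))


def ratings_by_user(ratings: Iterable[Rating]) -> Dict[int, Dict[int, int]]:
    """Group ratings as user -> {movie -> stars}, i.e. weighted user-to-movie edges."""
    edges: Dict[int, Dict[int, int]] = defaultdict(dict)
    for r in ratings:
        edges[r.user_id][r.movie_id] = r.stars
    return dict(edges)


# Illustrative records in the assumed format (not read from the real file).
sample_lines = [
    "1::1193::5::978300760",
    "1::661::3::978302109",
    "2::1193::4::978298413",
    "",
]
edges = ratings_by_user(parse_ratings(sample_lines))
print(edges)
```

With the real file, the same functions could be fed an open file handle instead of `sample_lines`, since a file object iterates line by line.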

Generating a MovieRatings Graph

Before getting recommendations of which movies to watch, it is important to first parse the raw MovieLens data according to the graph schema defined above.

What is a Good Recommendation Algorithm?

By Greg Linden, March 24, 2009

Someone may win the one million dollar Netflix Prize soon.

Will the winning algorithm produce movie recommendations that people like? Netflix is offering one million dollars for a better recommendation engine. Better recommendations clearly are worth a lot. But what are better recommendations? In the Netflix Prize, the meaning of better is quite specific. Let's say we build a recommender that wins the contest. Depending on what we want, it might be very good. However, this might not be what we want. Moreover, what we often want is not to make a prediction for any movie, but find the best movies. A recommender that does a good job predicting across all movies might not do the best job predicting the TopN movies.
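The gap between predicting well across all movies and picking the best few can be made concrete with a toy comparison. The sketch below assumes the contest's measure was root mean squared error (RMSE) over predicted ratings, which this excerpt does not name. The ratings, the two models, the "liked" threshold of 4 stars, and all function names are made up for illustration.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error over every movie in `actual`."""
    total = sum((predicted[m] - actual[m]) ** 2 for m in actual)
    return math.sqrt(total / len(actual))


def top_n(predicted, n):
    """The n movies with the highest predicted rating."""
    return sorted(predicted, key=predicted.get, reverse=True)[:n]


def precision_at_n(predicted, actual, n, liked=4.0):
    """Fraction of the top-n picks that the user actually rated >= `liked`."""
    picks = top_n(predicted, n)
    return sum(1 for m in picks if actual[m] >= liked) / n


# Made-up true ratings for one user.
actual = {"A": 5.0, "B": 4.0, "C": 2.0, "D": 1.0}

# Model X hedges toward the middle of the scale: modest errors, wrong ordering.
model_x = {"A": 3.1, "B": 2.9, "C": 3.3, "D": 3.2}

# Model Y overrates everything: large errors, but the ordering is right.
model_y = {"A": 5.0, "B": 4.9, "C": 4.8, "D": 4.7}

for name, model in [("X", model_x), ("Y", model_y)]:
    print(name, round(rmse(model, actual), 3), precision_at_n(model, actual, 2))
```

In this toy case model X has the lower RMSE, yet neither of its top two picks is a movie the user liked, while model Y has the higher RMSE and both of its top two picks are liked.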

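There are parallels here with web search. Aggravating matters further, in both recommender systems and web search, people's perception of quality is easily influenced by factors other than the items shown.

Exploring the Secrets Inside Recommendation Engines

The "Exploring the Secrets Inside Recommendation Engines" series takes readers step by step, from the basics to more advanced material, through how recommendation engines work and how they are implemented. It also touches on some basic optimization techniques, such as the use of clustering and classification.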

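Building on the theory, the series also uses Apache Mahout to show how to implement various recommendation strategies on large-scale data, tune those strategies, and build an efficient recommendation engine. As the first article in the series, this one explains in depth how recommendation engines work, the recommendation mechanisms they involve, and the strengths, weaknesses, and suitable scenarios of each, to help readers understand them clearly and quickly build an engine that fits their own needs.

Information Discovery

We have entered an era of data explosion. With the development of Web 2.0, the Web has become a platform for sharing data, and it is becoming harder and harder for people to find the information they need in this mass of data. In this situation, search engines (Google, Bing, Baidu, and so on) became the best way to find target information quickly. With the arrival of recommendation engines, the way users obtain information has shifted from simple, goal-directed data search to information discovery, which is more advanced and better matches how people actually behave. Today, as recommendation technology keeps developing, recommendation engines have been very successful in e-commerce (for example Amazon and Dangdang) and on social sites (including music, movie, and book sharing sites such as Douban and Mtime).

Recommendation Engines

The previous section described how important recommendation engines are to today's Web 2.0 sites. This section explains how a recommendation engine actually works.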

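Figure 1 shows how a recommendation engine works. For now, treat the engine as a black box. Its input is the data the recommendations are based on, which generally includes: (1) metadata about the items or content to be recommended, such as keywords and gene descriptions; (2) basic information about the system's users, such as gender and age; and (3) users' preferences for items or information, which, depending on the application, may include users' ratings of items, records of which items they viewed, their purchase records, and so on.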
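The three kinds of input can be written down as simple record types. The Python sketch below is only one way to model them: the class names, field names, and the `recommend` function are assumptions, and the scoring inside `recommend` is a stand-in for the black box (it sums other users' preference values for items the target user has not touched), not a method described in the article.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """Metadata about something that can be recommended."""
    item_id: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class User:
    """Basic information about a user of the system."""
    user_id: str
    gender: Optional[str] = None
    age: Optional[int] = None


@dataclass
class Preference:
    """One observed signal of a user's interest in an item."""
    user_id: str
    item_id: str
    kind: str           # e.g. "rating", "view", or "purchase"
    value: float = 1.0  # e.g. the star rating, or 1.0 for a view or purchase


def recommend(user_id: str, items: List[Item],
              preferences: List[Preference], n: int = 2) -> List[str]:
    """Placeholder black box: rank items the user has not interacted with."""
    seen = {p.item_id for p in preferences if p.user_id == user_id}
    known = {i.item_id for i in items}
    score: Counter = Counter()
    for p in preferences:
        if p.user_id != user_id and p.item_id in known and p.item_id not in seen:
            score[p.item_id] += p.value
    return [item_id for item_id, _ in score.most_common(n)]


items = [Item("i1", ["drama"]), Item("i2", ["comedy"]), Item("i3", ["action"])]
users = [User("u1", "F", 30), User("u2", "M", 25), User("u3")]
preferences = [
    Preference("u1", "i1", "rating", 5.0),
    Preference("u2", "i2", "rating", 4.0),
    Preference("u2", "i3", "purchase"),
    Preference("u3", "i2", "view"),
]
print(recommend("u1", items, preferences))
```

Keeping ratings, views, and purchases in one `Preference` type with a `kind` field is a design choice made here for brevity; separate types per signal would work equally well.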