
How We Made GitHub Fast - GitHub. Ryan Cox - AsciiArmor - Lessons Learned from the GitHub Recommen. I spent a few evenings last week working on a contest that GitHub is running to create a recommender engine for their site.

Ryan Cox - AsciiArmor - Lessons Learned from the GitHub Recommen

Think Netflix Prize but much smaller scale. Their description:

The 2009 GitHub contest is a simple recommendation engine contest. We have released a dataset of our watched repositories data and want to provide a list of recommended repositories for each of our users. Removed from the sample dataset are 4,788 watches - write an open source program to guess the highest percentage of those removed watches and you win our prize.

My objective wasn’t to place first, but to understand the problem more deeply. A few people asked for my notes, so here they are…

Lesson 1: Simple is Best

I started off my preparation for the contest by reading a few papers on recommender systems that seemed relevant.

It turns out this method wasn’t very effective.

R Code for above chart

My entry, including code, can be found here.

Lesson 2: Collaborate

Lesson 3: Have a method.

Probability distribution relationships. Probability distributions have a surprising number of inter-connections.

probability distribution relationships

A dashed line in the chart below indicates an approximate (limit) relationship between two distribution families. A solid line indicates an exact relationship: special case, sum, or transformation.

Click on a distribution for the parameterization of that distribution. Click on an arrow for details on the relationship represented by the arrow.

Follow @ProbFact on Twitter to get one probability fact per day, such as the relationships on this diagram.

More mathematical diagrams

The chart above is adapted from the chart originally published by Lawrence Leemis in 1986 (Relationships Among Common Univariate Distributions, American Statistician 40:143-146.)

Parameterizations

The precise relationships between distributions depend on parameterization.
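As a rough illustration of the two kinds of relationship described here, the sketch below numerically checks one well-known limit relationship: Binomial(n, λ/n) approaches Poisson(λ) as n grows. It also checks that the geometric form p (1-p)^x on non-negative integers sums to 1. The function names, the choice of λ = 3, and the truncation points are assumptions made for this example only; they do not come from the chart.

```python
import math

def binomial_pmf(k, n, p):
    # C(n, k) p^k (1-p)^(n-k), with C(n, k) the binomial coefficient
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(x, lam):
    # exp(-lam) lam^x / x! for non-negative integers x
    return math.exp(-lam) * lam**x / math.factorial(x)

def geometric_pmf(x, p):
    # p (1-p)^x for non-negative integers x
    return p * (1 - p) ** x

def max_gap(n, lam, upto=20):
    # Largest pointwise difference between Binomial(n, lam/n) and
    # Poisson(lam) over x = 0..upto (illustrative truncation, requires n >= upto)
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(upto + 1)
    )

# The gap shrinks as n grows, which is what an approximate (limit)
# relationship between the two families means in practice.
gaps = [max_gap(n, 3.0) for n in (30, 300, 3000)]

# Sanity check on the geometric parameterization: the mass sums to ~1.
geometric_total = sum(geometric_pmf(x, 0.3) for x in range(201))

if __name__ == "__main__":
    for n, gap in zip((30, 300, 3000), gaps):
        print(f"n={n}: max gap {gap:.6f}")
    print(f"geometric total: {geometric_total:.12f}")
```

Swapping the parameterization (for example, counting trials instead of failures in the geometric case) would shift the support and change these checks, which is why the relationships depend on the parameterization chosen.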

Let C(n, k) denote the binomial coefficient (n choose k) and B(a, b) = Γ(a) Γ(b) / Γ(a + b).

Geometric: f(x) = p (1-p)^x for non-negative integers x.

Poisson: f(x) = exp(-λ) λ^x / x!

Jdc. The Fork Queue — GitHub. Chris Wanstrath on GitHub. 3.

Chris Wanstrath on GitHub

You can tell that by what's public and what's private, right? Yes. There is a very strong line between the two. So you can host public open source code or private commercial code on the same account if you want, and we try to make it so there's never any confusion about which is which. So it's nice, because you can have the same account that you are using during the week to do your work on, and if you ever want to experiment with something or play with some open source that you might even pull into your company's code, you can do that from the same place.

4. It used to be the playground for Rubyists, and Ruby is still definitely the strongest community on the site.

5. It's an interesting analogy.

6. Forking used to be a dirty word. Even with older tools like Subversion you do that: you get an open source piece of code, you modify it, but you never share that with the outside world, and maybe you even forget what it was you did.

7. Absolutely.

8.

9. That is mostly Tom Preston-Werner's innovation.