If I recall, BellKor made many, many models based on Gradient Boosted Decision Trees,
Restricted Boltzmann Machines, and kNN. They tried many different feature subsets, added temporal weighting, and tried many reduced-dimensionality representations (SVD, NMF). They then stacked them all together into one final ensemble whose RMSE beat everyone else's on a hidden validation set.
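The stacking idea itself is simple even if BellKor's version wasn't. Here is a minimal sketch of linear blending with NumPy, using two toy bias-based base predictors and a least-squares blend on holdout entries -- these base models and the synthetic ratings are illustrative assumptions, not BellKor's actual components:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings: user bias + item bias + noise (illustrative, not Netflix data).
n_users, n_items = 50, 40
user_bias = rng.normal(0, 1, n_users)
item_bias = rng.normal(0, 1, n_items)
ratings = 3.0 + user_bias[:, None] + item_bias[None, :] \
    + rng.normal(0, 0.5, (n_users, n_items))

# Split entries into train (observed) and holdout (for blending/eval).
mask = rng.random((n_users, n_items)) < 0.8
train = np.where(mask, ratings, np.nan)
holdout = ~mask

def rmse(pred, truth):
    return np.sqrt(np.mean((pred - truth) ** 2))

# Base model 1: each user's mean training rating.
pred_user = np.broadcast_to(np.nanmean(train, axis=1, keepdims=True),
                            ratings.shape)
# Base model 2: each item's mean training rating.
pred_item = np.broadcast_to(np.nanmean(train, axis=0, keepdims=True),
                            ratings.shape)

# "Stack": fit linear blend weights (plus intercept) on holdout entries.
X = np.column_stack([pred_user[holdout], pred_item[holdout],
                     np.ones(holdout.sum())])
y = ratings[holdout]
w, *_ = np.linalg.lstsq(X, y, rcond=None)

rmse_user = rmse(pred_user[holdout], y)
rmse_item = rmse(pred_item[holdout], y)
rmse_blend = rmse(X @ w, y)
print(rmse_user, rmse_item, rmse_blend)
```

Because the least-squares fit can always recover either base predictor as a special case, the blend's RMSE on the fitting set never exceeds the best single model's -- the same logic, scaled up to hundreds of models and a proper validation split, is what the Prize-winning ensembles exploited.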
In a production environment, this is probably an insane amount of transformation, feature extraction, and classification for marginal gains in precision (as defined here). But I'm only a year or two into building production-environment classifiers, and nothing at Netflix's scale (though not tiny either-- it is a problem if I can't do feature extraction and high-precision/-recall* classification within a few milliseconds).
It's funny to me that people are so quick to pooh-pooh the complicated modeling done for the Netflix Prize. When did production-worthiness become the only thing that matters? It's like saying Watson was useless because it can't play concurrent games of Jeopardy with thousands of people on the web.
Just like in research, it turns out that relaxing real-world constraints on a problem is often a great way to make progress. I would not have to search long or far to find much worse uses of million-dollar grants/projects/big-data software.
Don't mean to pooh-pooh. I love scrambling for models and methods. On the latest problem I worked on, I did the same thing. Implementing papers, trying to avail myself of semi-supervised or reduced-dimension representations, tweaking models and features every which way... it's illuminating work, and good things come out of it.
But, in the end, companies like Netflix or where I work are immediately looking for ways to make X happen easier, better, and cheaper. Hopefully the smart papers then go on a shelf or are easily Googleable, and the rest of us get to learn from their efforts.
I can fit long-term and short-term goals in my brain, too, mister.
* - mid 90s, for a hard NLP/social graph problem.