
Here's one more reason:

Word2Vec is based on an approach from Lawrence Berkeley National Lab, as argued in a post in Kaggle's "Bag of Words Meets Bags of Popcorn" discussion (linked at the bottom of this comment). From the coverage of Google's release: "Google silently did something revolutionary on Thursday. It open sourced a tool called word2vec, prepackaged deep-learning software designed to understand the relationships between words with no human guidance. Just input a textual data set and let underlying predictive models get to work learning."

“This is a really, really, really big deal,” said Jeremy Howard, president and chief scientist of data-science competition platform Kaggle. “… It’s going to enable whole new classes of products that have never existed before.” https://gigaom.com/2013/08/16/were-on-the-cusp-of-deep-learn...

Spotify seems to be using it now: http://www.slideshare.net/AndySloane/machine-learning-spotif... pg 34
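If you haven't tried it, the "just input a textual data set" workflow really is that short. A rough sketch using the gensim library (gensim and the toy sentences are my own choices, not anything from the links above):

    # Minimal word2vec training sketch using gensim (assumed library choice).
    # Any iterable of tokenized sentences works as the "textual data set".
    from gensim.models import Word2Vec

    sentences = [
        ["user", "played", "song", "by", "artist"],
        ["user", "skipped", "song", "by", "artist"],
        ["listener", "played", "track", "by", "band"],
    ]

    # Train skip-gram vectors; vector_size/window/min_count are illustrative values.
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

    # Query the learned relationships between words.
    print(model.wv.most_similar("song", topn=3))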

But here's the interesting part:

Judging by the bottom of their patent (http://www.google.com/patents/US7987191), Lawrence Berkeley National Lab had been working since 2005 on an approach more detailed than word2vec in terms of how the vectors are structured. The Berkeley Lab method also seems much more exhaustive: it uses a Fibonacci-based distance decay for proximity between words, so that vectors contain up to thousands of scored and ranked feature attributes beyond the bag-of-words approach, and it uses filters to control the context of the output. The method was also part of the search/knowledge-discovery technology that won a 2008 R&D 100 award: http://newscenter.lbl.gov/news-releases/2008/07/09/berkeley-... & http://www2.lbl.gov/Science-Articles/Archive/sabl/2005/March...
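To make the "Fibonacci-based distance decay" idea concrete, here is a rough sketch of how one might weight word co-occurrences by proximity so that nearer neighbors contribute more. This is only my reading of the idea, not the actual method in the patent:

    # Illustrative sketch of proximity-weighted co-occurrence vectors.
    # Weighting by reciprocal Fibonacci numbers per distance is my assumption
    # about what a "Fibonacci based distance decay" could look like; it is
    # not taken from the Berkeley Lab patent itself.
    from collections import defaultdict

    def fibonacci(n):
        """Return the first n Fibonacci numbers: 1, 1, 2, 3, 5, ..."""
        fibs = [1, 1]
        while len(fibs) < n:
            fibs.append(fibs[-1] + fibs[-2])
        return fibs[:n]

    def cooccurrence_vectors(tokens, window=5):
        """Build per-word feature vectors where closer neighbors get larger weights."""
        decay = [1.0 / f for f in fibonacci(window)]  # distance 1 -> 1.0, 3 -> 0.5, ...
        vectors = defaultdict(lambda: defaultdict(float))
        for i, word in enumerate(tokens):
            for d in range(1, window + 1):
                for j in (i - d, i + d):
                    if 0 <= j < len(tokens):
                        vectors[word][tokens[j]] += decay[d - 1]
        return vectors

    tokens = "the lab built a search engine on top of these weighted vectors".split()
    vecs = cooccurrence_vectors(tokens)
    # Rank the scored feature attributes for one word, highest weight first.
    print(sorted(vecs["search"].items(), key=lambda kv: -kv[1]))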

A search company called "SeeqPod" that competed with Google was spun out of Berkeley Lab using this tech, but was then sued for billions by Steve Jobs (https://medium.com/startup-study-group/steve-jobs-made-warne...) and a few media companies (http://goo.gl/dzwpFq).

We might combine these approaches, as something fairly important seems to be happening in this area. Recommendations and sentiment analysis seem to be driving the bottom lines of companies today, including Amazon, Google, Netflix, Apple, et al.

https://www.kaggle.com/c/word2vec-nlp-tutorial/discussion/12...



We don't need w2v precursors from 2005; we have more embeddings than we care to use, and we can use randomly initialized embeddings trained on the project's own data for even better results.
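Concretely, by randomly initialized embeddings I mean something like this (PyTorch and the toy classification task are placeholders, just to show embeddings being learned from scratch as part of the downstream task):

    # Sketch of task-trained embeddings from random initialization (PyTorch assumed).
    # nn.Embedding starts with random vectors; backprop from the task loss tunes them.
    import torch
    import torch.nn as nn

    VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 1000, 64, 2

    class BagClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)  # randomly initialized
            self.out = nn.Linear(EMBED_DIM, NUM_CLASSES)

        def forward(self, token_ids):
            # Average the token embeddings, then classify; gradients flow into embed.
            return self.out(self.embed(token_ids).mean(dim=1))

    model = BagClassifier()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Dummy batch: 8 "documents" of 20 token ids each, with binary labels.
    tokens = torch.randint(0, VOCAB_SIZE, (8, 20))
    labels = torch.randint(0, NUM_CLASSES, (8,))

    loss = loss_fn(model(tokens), labels)
    loss.backward()
    optimizer.step()  # the embedding vectors themselves are updated by the task loss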



