Hacker News
Top 100 words in News.YC titles (ycombinator.com)
29 points by pg on Jan 9, 2008 | 20 comments


I presume you're implementing search?


Good guess.


Take a look at Solr (http://lucene.apache.org/solr/); it makes indexing very easy.


Dr. Seuss wrote 'Green Eggs and Ham' with only 50 words. Can these 100 be strung together (allowing repetition) into something remotely meaningful and grammatical?


It's hard without connecting words. I was able to come up with a few headlines though:

- Google launches first big company platform using python where people make money over hacker life.

- Lisp application time better vs. javascript, python, ruby.

- Startup entrepreneur launches better open source ruby tech blog.

*edit, I just now saw the other headlines by readers, but I don't feel like submitting these as actual stories.


Interesting. I wonder what the results would look like if the popularity (in points) of the submissions were incorporated.


So now that we have the most popular keywords, who can make a title with the most keywords in it?

http://news.ycombinator.com/item?id=96660



nice contest, here is mine :P

http://news.ycombinator.com/item?id=96663


I like the end: "problem platform. next website need computer better."


Note to self: like best hacker make way good (10) ruby application time future


Is that listed by frequency? If news.yc existed in 2000 I wonder what the list would look like.


Yes; I changed the page to clarify. Though in fact it would be hard to generate such a list without it being in order of frequency.


Heh, this is algorithm bait.

The obvious algorithm is to sort all the words in order of decreasing frequency, then print the first 100. This algorithm is O(n log n) because of the sorting. It generates the list in order of frequency.
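As a rough illustration of the sort-based approach, here is a minimal Python sketch. The function name, the tokenizing regex, and the alphabetical tie-break are my own assumptions for the example, not how the actual list was produced.

```python
import re
from collections import Counter

def top_words_sorted(titles, n=100):
    """Count words across titles and return the n most frequent,
    ordered by decreasing frequency (hypothetical helper)."""
    counts = Counter()
    for title in titles:
        # Assumed tokenization: lowercase runs of letters, digits, apostrophes.
        counts.update(re.findall(r"[a-z0-9']+", title.lower()))
    # The sort is the O(n log n) step; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]
```

Because the whole list is sorted, the output comes out in frequency order for free.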

There is an O(n) algorithm that first picks some random item as a pivot, then does a partitioning like quicksort, where items with higher frequency than the pivot are moved to the front of the array and those with lower frequency to the back.

If the first partition has more than 100 items, the algorithm only has to recurse into that part. If it has fewer (k), it prints everything in the first partition and recurses into the second to generate the remaining 100 - k items.

This is expected O(n) and will not generate the list in order of frequency.
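A minimal Python sketch of that selection idea follows. It is a variant, not a literal transcription: it builds three lists (higher, equal, lower) instead of partitioning the array in place, so that ties with the pivot are handled cleanly, and it loops rather than recursing. The name top_k_unordered and the (word, count) pair format are assumptions for the example.

```python
import random

def top_k_unordered(items, k):
    """Return k of the highest-count (word, count) pairs, in no particular order.
    Quickselect-style: expected O(n) time (hypothetical helper)."""
    items = list(items)
    result = []
    while items and k > 0:
        if len(items) <= k:
            # Everything that is left belongs in the answer.
            result.extend(items)
            break
        pivot = random.choice(items)[1]
        higher = [it for it in items if it[1] > pivot]
        equal = [it for it in items if it[1] == pivot]
        lower = [it for it in items if it[1] < pivot]
        if len(higher) >= k:
            # The answer lies entirely within the higher-frequency partition.
            items = higher
        elif len(higher) + len(equal) >= k:
            # Take all of the higher partition and fill up with pivot ties.
            result.extend(higher)
            result.extend(equal[:k - len(higher)])
            break
        else:
            # Keep higher and equal, then look for the rest among the lower ones.
            result.extend(higher)
            result.extend(equal)
            k -= len(higher) + len(equal)
            items = lower
    return result
```

Getting the final 100 into frequency order afterwards only needs a sort of those 100 items, which is constant work relative to n.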


A fun, informative feature could be that list for each user in their profile. Even better if it updates based on use - how the hivemind splinters into the specific bees. There's so much that could be done there - similar users, related posts, discovery. I'm surprised we have yet to see that basic functionality in the more nebulous news sites.


This is the only crowd where "money" comes almost last.


Granted it's ambiguous, but "million" is closer to the top than "money". It's also easier to say :-).


Check back in a few years ;)


There is YC, thought I'd find PG also.


(Apparently someone forgot to add 'remove #\|').



