Twitter sentiment analysis using Python and NLTK (laurentluce.com)
79 points by ananthrk on March 15, 2012 | 8 comments


The Pattern library has sentiment analysis built in. It's a pretty fun toolkit to play around with.

http://www.clips.ua.ac.be/pages/pattern-en#sentiment


In Pattern, sentiment analysis is a one-liner:

    >>> from pattern.en import sentiment
    >>> print(sentiment(
    ...     "The movie attempts to be surreal by incorporating various time paradoxes, "
    ...     "but it's presented in such a ridiculous way it's seriously boring."))

    (-0.34, 1.0)


Great write-up. My company (Tawlk) actually open sourced a library to automate this very thing. We typically get around 80% accuracy with about 2 million samples.

You can grab our sample set here: https://github.com/downloads/Tawlk/synt/sample_data.bz2

And check out the project here: http://github.com/Tawlk/synt

It also ships with a full CLI if you just want to play with it without getting knee-deep in the code.

Also, if you want to see a stripped-down, stand-alone code sample that steps you through the process, I made this gist:

https://gist.github.com/1266556

Enjoy :)


A better example is shown by Jacob Perkins on his blog - http://streamhacker.com/2010/05/10/text-classification-senti...


Sounds like what Tawlk does. I wonder whether their training data or method is better, though.


The method is mostly the same one used in our synt library (http://github.com/Tawlk/synt). We built quite a bit on top of it, however. That said, the author did a great job of explaining the process.

Good encouragement for me to better document synt.


What are neutral tweets classified as?


It is a binary classifier, so everything is at least slightly negative or slightly positive, on a scale from -1 to 1.

Think of it like a level used in construction. Nothing is ever _perfectly_ level. It is always tilting one way or the other, but there is an acceptable range that people will generally call 'level'. Neutral is the same.

If the classifier rates something as 0.001, it is probably safe to call it 'neutral'. It is up to the application to decide on a 'neutral range'. You could, for instance, flag anything between -0.2 and 0.2 as 'neutral'. It is good to define functions like these last so you can adjust the range manually until you have reduced false positives to a minimum with your particular data set.
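
For what it's worth, here is a rough sketch of that 'neutral range' idea in plain Python. This is not synt's actual API: the function name `label_sentiment`, the `neutral_band` parameter, and the choice to treat the band edges as non-neutral are all assumptions made for illustration.

```python
def label_sentiment(score, neutral_band=0.2):
    """Map a classifier score in [-1, 1] to a coarse label.

    `neutral_band` is a hypothetical, application-chosen threshold:
    scores strictly inside (-neutral_band, neutral_band) count as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be within [-1, 1]")
    if abs(score) < neutral_band:
        return "neutral"
    return "positive" if score > 0 else "negative"


# Try a few scores; widen or narrow the band to suit your own data set.
for s in (0.001, -0.15, 0.5, -0.8):
    print(s, label_sentiment(s))
```

Keeping the band as a parameter means you can tune it against a labeled sample of your own data without touching the classifier itself.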



