Twitter sentiment analysis using Python and NLTK

detour · on March 15, 2012

The Pattern library has sentiment analysis built in. It's a pretty fun toolkit to play around with.

http://www.clips.ua.ac.be/pages/pattern-en#sentiment

fdb · on March 15, 2012

In Pattern, sentiment analysis is a one-liner:

    >>> from pattern.en import sentiment
    >>> # sentiment() returns a (polarity, subjectivity) tuple
    >>> print(sentiment(
    ...     "The movie attempts to be surreal by incorporating various time paradoxes,"
    ...     "but it's presented in such a ridiculous way it's seriously boring."))
    (-0.34, 1.0)

lrvick · on March 16, 2012

Great write-up. My company (Tawlk) actually open sourced a library to automate this very thing. We typically get around 80% accuracy with about 2 million samples.

You can grab our sample set here: https://github.com/downloads/Tawlk/synt/sample_data.bz2

And check out the project here: http://github.com/Tawlk/synt

It also ships with a full CLI if you just want to play with it without getting knee-deep in the code.

Also, if you want to see a stripped-down, stand-alone code sample that steps you through the process, I made this gist:

https://gist.github.com/1266556

Enjoy :)

tchalla · on March 15, 2012

A better example is shown by Jacob Perkins on his blog - http://streamhacker.com/2010/05/10/text-classification-senti...

abyssknight · on March 15, 2012

Sounds like what tawlk does. Wonder if their training data/method is better, though.

lrvick · on March 16, 2012

The method is mostly the same one used in our synt library (http://github.com/Tawlk/synt). We built quite a bit on top of it, however. That said, the author did a great job of explaining the process.

Good encouragement for me to better document synt.

jasonkolb · on March 15, 2012

What are neutral tweets classified as?

lrvick · on March 16, 2012

It is a binary classifier, so everything is at least slightly negative or slightly positive, on a scale from -1 to 1.

Think of it like a level used in construction. Nothing is ever _perfectly_ level. It is either tilting one way or the other, but there is an acceptable range people will generally call 'level'. Neutral is the same.

If the classifier rates something as 0.001, it is probably safe to call it 'neutral'. It would be up to the application to decide on a 'neutral range'. You could, for instance, just flag anything between -0.2 and 0.2 as 'neutral'. It is good to define functions like these last so you can adjust the range manually until you have reduced false positives to a minimum with your particular data set.
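
A rough sketch of what such a function could look like, assuming the classifier hands back a single float score between -1 and 1. The name `label_sentiment` and the `neutral_range` parameter are made up for illustration and are not part of synt:

```python
def label_sentiment(score, neutral_range=0.2):
    """Map a polarity score in [-1, 1] to 'negative', 'neutral' or 'positive'.

    neutral_range is a hypothetical, application-defined threshold:
    any score whose absolute value is at or below it counts as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    if abs(score) <= neutral_range:
        return "neutral"
    return "positive" if score > 0 else "negative"


if __name__ == "__main__":
    # Try a few scores against the default range and a tighter one.
    for s in (0.001, -0.15, 0.6, -0.9):
        print(s, label_sentiment(s), label_sentiment(s, neutral_range=0.1))
```

Keeping the threshold as a parameter means you can tune it against your own data without touching the classifier itself.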