
Maybe apart from stemming, it's not hard to implement this in ~100 lines without NLTK:

- In naive Bayes classification, model parameters can usually be estimated using relative frequencies in the training data.
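
As a sketch of what "relative frequencies" means here, a minimal naive Bayes trainer and classifier (the function names and add-one smoothing are my choices, not anything from the thread):

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    # labeled_docs: iterable of (tokens, label) pairs.
    # Priors and per-class word counts are just frequency tallies.
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(tokens, label_counts, word_counts, vocab):
    # Pick the label maximizing log P(label) + sum of log P(word | label),
    # estimating each probability as a relative frequency with
    # add-one smoothing so unseen words don't zero out a class.
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        n_words = sum(word_counts[label].values())
        for w in tokens:
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```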

- WordPunctTokenizer is a very simple tokenizer that makes anything matching \w+ and [^\w\s]+ a separate token.
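
The same behavior can be had with a one-liner over Python's re module (a sketch, not NLTK's actual implementation):

```python
import re

def tokenize(text):
    # Like WordPunctTokenizer: each run of word characters (\w+) and each
    # run of non-word, non-space characters ([^\w\s]+) is its own token.
    return re.findall(r"\w+|[^\w\s]+", text)

# tokenize("Can't stop!") -> ['Can', "'", 't', 'stop', '!']
```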

- Extracting bigrams from a list of tokens is trivial.
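
For instance, pairing each token with its successor via zip:

```python
def bigrams(tokens):
    # Zip the token list against itself shifted by one position.
    return list(zip(tokens, tokens[1:]))

# bigrams(["a", "b", "c"]) -> [("a", "b"), ("b", "c")]
```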

Of course, using NLTK will be very helpful in many situations, but this is hardly a testament to NLTK.
