Ben Chuanlong Du's Blog

It is never too late to learn.

Natural Language Processing Using NLTK

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

nltk.util.ngrams, nltk.bigrams, nltk.PorterStemmer

from nltk.util import ngrams
sentence = 'this is a foo bar sentences and i want to ngramize it'
n = 6
sixgrams = ngrams(sentence.split …
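The snippet above is cut off in this excerpt. As a rough illustration of what n-gram extraction does, here is a minimal pure-Python sketch. The helper name `simple_ngrams` is hypothetical and is not part of NLTK; it only assumes that an n-gram is a tuple of n consecutive tokens.

```python
def simple_ngrams(tokens, n):
    """Return all n-grams (tuples of n consecutive tokens) as a list.

    Hypothetical helper for illustration only; not an NLTK function.
    """
    tokens = list(tokens)
    # Build n views of the token list, each shifted by one more position,
    # and zip them together; zip stops at the shortest view.
    return list(zip(*(tokens[i:] for i in range(n))))


sentence = 'this is a foo bar sentences and i want to ngramize it'
sixgrams = simple_ngrams(sentence.split(), 6)
for gram in sixgrams:
    print(gram)
```

The sentence has 12 tokens, so this yields 12 - 6 + 1 = 7 six-grams.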

Keyword Extraction from Text

Word Stemming

  1. Use an existing stemming method such as nltk.PorterStemmer.

  2. Expand contractions and abbreviations: didn't -> did not, there's -> there is, Mr. -> Mister, Mrs. -> ..., Ms. -> ..., etc.
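The expansion step in the second item can be sketched with a small lookup table. This is only one possible approach, not something the note prescribes; the names `EXPANSIONS` and `expand` are hypothetical, and the table holds only the mappings spelled out above.

```python
import re

# Mappings taken from the examples in the note; extend as needed.
EXPANSIONS = {
    "didn't": "did not",
    "there's": "there is",
    "Mr.": "Mister",
}

# Try longer keys first so a longer form wins over a shorter overlapping one.
# The lookbehind keeps a key from matching in the middle of a longer word.
_PATTERN = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(k) for k in sorted(EXPANSIONS, key=len, reverse=True))
    + r")"
)


def expand(text):
    """Replace every known contraction or abbreviation with its long form."""
    return _PATTERN.sub(lambda m: EXPANSIONS[m.group(0)], text)


print(expand("Mr. Smith didn't know there's a problem."))
```

Matching is case-sensitive here, so a sentence-initial "Didn't" would need its own entry or a case-insensitive variant.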

Other things

  1. it seems that it is hard to get …

Clustering Algorithms in Machine Learning

Centroid-based Clustering

  • K-means Clustering

  • K-medians Clustering

  • K-medoids Clustering
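The three centroid-based methods above differ mainly in how a cluster's center is chosen. A minimal sketch on one-dimensional data, assuming absolute difference as the distance (the `medoid` helper is hypothetical):

```python
from statistics import mean, median


def medoid(points):
    """Return the cluster member with the smallest total distance to the others.

    Ties are resolved in favor of the earliest point in the list.
    """
    return min(points, key=lambda p: sum(abs(p - q) for q in points))


cluster = [1.0, 2.0, 3.0, 10.0]

center_kmeans = mean(cluster)      # k-means: arithmetic mean of the members
center_kmedians = median(cluster)  # k-medians: median of the members
center_kmedoids = medoid(cluster)  # k-medoids: an actual member of the cluster

print(center_kmeans, center_kmedians, center_kmedoids)
```

The outlier 10.0 pulls the mean further than the median, and only the medoid is guaranteed to be one of the data points.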

Hierarchical Clustering

  • Agglomerative Hierarchical Clustering

  • Divisive Hierarchical Clustering

Partitional Clustering

Regression Classification ANOVA

Regression refers to problems where the response (output) variable is continuous, while classification refers to problems where the response (output) variable is discrete.

Generally speaking, fitting regression to classification problems is …