Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Word2Vec
https://code.google.com/archive/p/word2vec/
Hierarchical Softmax
Negative Sampling
The Google word2vec page claims that hierarchical softmax works better for infrequent words, while negative sampling works better for frequent words and with low-dimensional vectors (see the sketch below for how to toggle between the two).
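A minimal sketch of that choice, assuming gensim's Word2Vec implementation (gensim is not mentioned in these notes; parameter names follow gensim 4.x): `hs=1` with `negative=0` trains with hierarchical softmax, and `hs=0` with `negative > 0` trains with negative sampling.

```python
from gensim.models import Word2Vec

# Tiny toy corpus, just to make the snippet self-contained.
sentences = [["the", "quick", "brown", "fox"],
             ["jumps", "over", "the", "lazy", "dog"]]

# Hierarchical softmax (per the note above, reportedly better for infrequent words).
model_hs = Word2Vec(sentences, vector_size=50, sg=1, hs=1, negative=0, min_count=1)

# Negative sampling with 5 noise words (reportedly better for frequent words
# and for low-dimensional vectors).
model_ns = Word2Vec(sentences, vector_size=50, sg=1, hs=0, negative=5, min_count=1)
```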
http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/
https://stackoverflow.com/questions/27860652/word2vec-negative-sampling-in-layman-term
https://towardsdatascience.com/hierarchical-softmax-and-negative-sampling-short-notes-worth-telling-2672010dbe08
https://www.quora.com/What-is-negative-sampling
https://stats.stackexchange.com/questions/180076/why-is-hierarchical-softmax-better-for-infrequent-words-while-negative-sampling
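To make the negative-sampling idea concrete, here is a toy NumPy sketch of one skip-gram negative-sampling update (hypothetical sizes and names; not the original word2vec code). Instead of normalizing over the whole vocabulary, the true context word is contrasted against a few words drawn from a noise distribution (unigram counts raised to the 3/4 power, as in word2vec).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical sizes): vocabulary of 10 words, 8-dim embeddings.
vocab_size, dim = 10, 8
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # center-word ("input") vectors
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word ("output") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, noise_dist, k=5, lr=0.025):
    """One skip-gram negative-sampling update for a (center, context) pair."""
    negatives = rng.choice(vocab_size, size=k, p=noise_dist)  # k noise words
    v_c = W_in[center]                        # center-word vector
    targets = np.concatenate(([context], negatives))
    labels = np.zeros(k + 1)
    labels[0] = 1.0                           # 1 for the true context, 0 for noise
    u = W_out[targets]                        # (k+1, dim) output vectors
    scores = sigmoid(u @ v_c)                 # predicted "is this a real pair?"
    grad = scores - labels                    # gradient of the logistic loss w.r.t. scores
    W_in[center] -= lr * (grad @ u)           # update center vector
    W_out[targets] -= lr * np.outer(grad, v_c)  # update context/noise vectors
    # loss = -log sigma(u_o . v_c) - sum_k log sigma(-u_k . v_c)
    return -np.log(scores[0] + 1e-10) - np.sum(np.log(1.0 - scores[1:] + 1e-10))

# Noise distribution: unigram counts raised to the 3/4 power.
counts = rng.integers(1, 100, size=vocab_size).astype(float)
noise_dist = counts ** 0.75
noise_dist /= noise_dist.sum()

print(sgns_step(center=3, context=7, noise_dist=noise_dist))
```

Each update touches only k + 1 output vectors rather than the full vocabulary, which is what makes negative sampling cheap compared to a full softmax.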
Examples
https://blog.floydhub.com/automate-customer-support-part-one/