Ben Chuanlong Du's Blog

It is never too late to learn.

Activation Functions in Neural Networks

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

GELU

GELU is currently among the best-performing activation functions, at least in NLP; it is the activation used in models such as BERT and GPT-2.

$$ \operatorname{GELU}(x) = x \Phi(x), $$

where \(\Phi(x)\) is the cumulative distribution function of the standard normal distribution.
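
Below is a minimal Python sketch of the formula above, computing the exact GELU via the standard normal CDF (using `math.erf`), together with the tanh approximation from the original GELU paper; the function names are illustrative.

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (the form commonly used in BERT/GPT implementations)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

print(gelu(1.0), gelu_tanh(1.0))  # both are approximately 0.841
```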

ReLU …

Tips on Transformer in NLP

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

http://nlp.seas.harvard.edu/2018/04/03/attention.html

https://blog.floydhub.com/the-transformer-in-pytorch/

http://jalammar.github.io/illustrated-transformer/

https://towardsdatascience.com/transformers-141e32e69591

Understand Attention in NLP

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/

https://medium.com/@joealato/attention-in-nlp-734c6fa9d983

Tips on Word2Vec

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Word2Vec

https://code.google.com/archive/p/word2vec/

Hierarchical Softmax

Negative Sampling

The Google Word2Vec documentation notes that hierarchical softmax works better for infrequent words, while negative sampling works better for frequent words …
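
As a minimal sketch (assuming the gensim library and its 4.x parameter names, neither of which is mentioned in the post), the two training objectives can be switched via the `hs` and `negative` parameters of `gensim.models.Word2Vec`:

```python
from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox"], ["jumps", "over", "the", "lazy", "dog"]]

# Hierarchical softmax (hs=1, negative=0): reported to work better for infrequent words.
model_hs = Word2Vec(sentences, vector_size=50, min_count=1, hs=1, negative=0)

# Negative sampling (hs=0, negative>0): reported to work better for frequent words.
model_ns = Word2Vec(sentences, vector_size=50, min_count=1, hs=0, negative=5)

print(model_hs.wv["fox"].shape, model_ns.wv["fox"].shape)  # (50,) (50,)
```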

Compression of Deep Learning Models

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

MobileNet

1. Network Pruning

Network pruning is applied when the network weights are …
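
The pruning note above is truncated, but as a minimal sketch of magnitude-based weight pruning (the first stage of Deep Compression), the idea is to zero out weights whose absolute value falls below a threshold; the NumPy implementation and the function name `magnitude_prune` are illustrative assumptions, not the paper's code.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that roughly `sparsity` of them are zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

w = np.random.randn(256, 256)
w_pruned = magnitude_prune(w, sparsity=0.9)
print(f"nonzero fraction: {np.count_nonzero(w_pruned) / w_pruned.size:.3f}")  # ~0.10
```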