Ben Chuanlong Du's Blog

It is never too late to learn.

Use XGBoost With Spark

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Leaf-wise (split-by-leaf) tree growth (grow_policy="lossguide") is not supported in distributed training, which makes XGBoost4J on Spark much slower than LightGBM on Spark.
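For reference, here is a minimal single-machine sketch of what grow_policy="lossguide" looks like in the XGBoost Python package. It is not distributed training and says nothing about XGBoost4J. The helper name `leafwise_params`, the default values, and the synthetic data are assumptions made up for illustration; the parameter names themselves (`tree_method`, `grow_policy`, `max_leaves`, `max_depth`, `eta`) are standard XGBoost parameters.

```python
def leafwise_params(max_leaves=31, learning_rate=0.1):
    """Build XGBoost parameters for leaf-wise (loss-guided) tree growth.

    Hypothetical helper for illustration; the values are not tuned.
    """
    return {
        "objective": "binary:logistic",
        "tree_method": "hist",       # grow_policy is supported with the hist method
        "grow_policy": "lossguide",  # split the leaf with the largest loss reduction
        "max_leaves": max_leaves,    # cap on the number of leaves per tree
        "max_depth": 0,              # 0 means no depth limit
        "eta": learning_rate,
    }


if __name__ == "__main__":
    try:
        import numpy as np
        import xgboost as xgb
    except ImportError:
        # Without xgboost installed, just show the parameter set.
        print(leafwise_params())
    else:
        # Synthetic binary classification data (made up for this sketch).
        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 10))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        booster = xgb.train(
            leafwise_params(),
            xgb.DMatrix(X, label=y),
            num_boost_round=20,
        )
        print(booster.num_boosted_rounds())
```

With `max_depth` set to 0, tree size is controlled by `max_leaves` instead, which is roughly how LightGBM's `num_leaves` works.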

XGBoost with Spark

https://towardsdatascience.com/build-xgboost-lightgbm-models-on-large-datasets-what-are-the-possible-solutions-bf882da2c27d

https://xgboost …

Subword Algorithms for NLP


Classic word representations cannot handle unseen or rare words well. Character embeddings are one solution to the out-of-vocabulary (OOV) problem. However, they may be too fine-grained and miss some …
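To make the middle ground between words and characters concrete, here is a toy sketch of byte pair encoding (BPE), one common subword algorithm. It is a minimal illustration under stated assumptions, not a production tokenizer: the corpus counts, the `</w>` end-of-word marker, and all function names are made up for this example.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` in the vocabulary with one merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn an ordered list of merges from a {word: count} dictionary."""
    # Start from characters, plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges


def segment(word, merges):
    """Split a word (possibly unseen) into subwords by replaying the merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        i = 0
        while i + 1 < len(symbols):
            if (symbols[i], symbols[i + 1]) == pair:
                symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
            else:
                i += 1
    return symbols


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # made-up counts
    merges = learn_bpe(corpus, num_merges=10)
    # "lowest" never appears in the corpus, yet it can still be segmented.
    print(segment("lowest", merges))
```

An unseen word is never mapped to a single unknown token here: in the worst case it falls back to individual characters, and in the better case it is covered by a few frequent subword units.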

Terminologies and Concepts in NLP


Word Embedding, Character Embedding, Subword Embedding, Tokenization

General Language Understanding Evaluation (GLUE)

Natural Language Generation (NLG)

Natural Language Generation, as defined by Artificial Intelligence: Natural Language Processing Fundamentals, is the “process …