Regularization in Machine Learning Models

Dec 30, 2019

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Regularization adds a penalty term to the loss function of a machine learning model. The type of regularization depends on the type of penalty used (not the type of the objective function …
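Here is a minimal sketch that makes the idea concrete. It assumes a linear model with a mean-squared-error loss. The data-fit term stays the same and only the penalty changes: an L2 penalty gives ridge regression and an L1 penalty gives the lasso. The helper names penalized_loss and ridge_closed_form, the penalty weight lam, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def penalized_loss(w, X, y, lam, penalty="l2"):
    """Mean squared error plus a regularization penalty (hypothetical helper)."""
    data_fit = np.mean((X @ w - y) ** 2)       # the unregularized objective
    if penalty == "l2":
        reg = lam * np.sum(w ** 2)             # ridge: squared L2 norm
    elif penalty == "l1":
        reg = lam * np.sum(np.abs(w))          # lasso: L1 norm
    else:
        raise ValueError(f"unknown penalty: {penalty}")
    return data_fit + reg


def ridge_closed_form(X, y, lam):
    """Minimizer of mean((X w - y)^2) + lam * ||w||^2 (hypothetical helper)."""
    n, p = X.shape
    # Setting the gradient (2/n) X^T (X w - y) + 2 lam w to zero gives
    # (X^T X + n lam I) w = X^T y.
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

w_ols = ridge_closed_form(X, y, lam=0.0)    # no penalty: ordinary least squares
w_ridge = ridge_closed_form(X, y, lam=1.0)  # the penalty shrinks the coefficients
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

Increasing lam trades a worse fit on the training data for smaller coefficients. The L1 penalty has no closed form like this and is usually minimized with coordinate descent or proximal methods.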

Optimization Method in Machine Learning

Dec 30, 2019

On small datasets, L-BFGS converges faster and to better solutions. Adam, however, is very robust on relatively large datasets: it usually converges quickly and gives pretty good performance. SGD with momentum …
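The two update styles can be compared on a toy problem. The sketch below assumes a synthetic least-squares objective. It runs SciPy's L-BFGS-B implementation next to a hand-written, full-batch Adam loop that uses the common default betas and an arbitrarily chosen learning rate of 0.05. It only illustrates the mechanics. It does not reproduce the small-versus-large dataset comparison described above, which would need mini-batches and real data.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic least-squares problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)


def loss_and_grad(w):
    """Mean squared error and its gradient with respect to w."""
    r = X @ w - y
    return np.mean(r ** 2), (2 / len(y)) * X.T @ r


def adam(grad_fn, w0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=2000):
    """Full-batch Adam: per-coordinate steps scaled by running moment estimates."""
    w = np.array(w0, dtype=float)
    m = np.zeros_like(w)  # running estimate of the gradient mean
    v = np.zeros_like(w)  # running estimate of the uncentered gradient variance
    for t in range(1, n_steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


w_exact = np.linalg.lstsq(X, y, rcond=None)[0]
# jac=True tells SciPy that loss_and_grad returns (loss, gradient).
res = minimize(loss_and_grad, np.zeros(3), jac=True, method="L-BFGS-B")
w_adam = adam(lambda w: loss_and_grad(w)[1], np.zeros(3))

print("L-BFGS iterations:", res.nit)
print("L-BFGS max error:", np.abs(res.x - w_exact).max())
print("Adam max error after 2000 steps:", np.abs(w_adam - w_exact).max())
```

On a small, smooth, deterministic problem like this one, the quasi-Newton method needs only a handful of iterations. Adam needs many cheap steps, but it does not rely on exact full-batch gradients, and that is why it scales to large, noisy settings.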

Tips on XGBoost

Dec 28, 2019

It is suggested that you use the sklearn wrapper classes XGBClassifier and XGBRegressor so that you can fully leverage the other tools in the sklearn package.
There are 2 types of boosters …

Libraries for Gradient Boosting

Dec 24, 2019

XGBoost

https://xgboost.ai/

XGBoost Documentation

Speedup XGBoost

https://machinelearningmastery.com/best-tune-multithreading-support-xgboost-python/

https://medium.com/data-design/xgboost-gpu-performance-on-low-end-gpu-vs-high-end-cpu-a7bc5fcd425b

XGBoost on GPU is fast. Very fast. As long as it fits in RAM and …

Ensemble Machine Learning Models

Mar 24, 2013

The prediction error is a trade-off between bias and variance. In statistics, we often talk about unbiased estimators (especially in linear regression). In this case, we restrict the estimators/predictors to a (small) class and find the optimal solution within this class (called the BLUE or BLUP, i.e., the best linear unbiased estimator or predictor).

Generally speaking …
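The trade-off can be checked numerically, since mean squared error decomposes as squared bias plus variance. The Monte Carlo sketch below uses arbitrarily chosen values mu = 0.5, sigma = 1 and n = 10. It compares the sample mean, which is the best linear unbiased estimator of the mean, with a deliberately shrunk and therefore biased version, 0.6 times the sample mean. In theory their MSEs are 0.1 and 0.6^2 * 0.1 + 0.4^2 * 0.25 = 0.076, so in this setting stepping outside the unbiased class lowers the error.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, n_sims = 0.5, 1.0, 10, 100_000

# Each row is one simulated dataset of n observations.
samples = rng.normal(mu, sigma, size=(n_sims, n))

unbiased = samples.mean(axis=1)  # sample mean: unbiased for mu
shrunk = 0.6 * unbiased          # biased toward 0, but with lower variance


def decompose(estimates, truth):
    """Return (MSE, bias, variance) of an estimator across simulations."""
    bias = estimates.mean() - truth
    variance = estimates.var()   # ddof=0, so MSE = bias^2 + variance exactly
    mse = np.mean((estimates - truth) ** 2)
    return mse, bias, variance


for name, est in [("sample mean", unbiased), ("shrunk mean", shrunk)]:
    mse, bias, var = decompose(est, mu)
    print(f"{name}: MSE={mse:.4f} bias^2={bias ** 2:.4f} variance={var:.4f}")
```

The shrinkage helps here only because mu is close to 0. For a large mu, the squared-bias term (0.4 mu)^2 would dominate and the unbiased estimator would win.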

Tips on Spark MLlib

May 16, 2019

Spark MLlib's RDD-based API supports stratified sampling, but the DataFrame-based API had not implemented it as of Spark 2.4.3.

sample keys (not rows) with equal probability
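The idea of sampling keys rather than rows can be sketched in plain Python. This is not a Spark API. The approach is to draw a uniform random subset of the distinct keys and then keep every row whose key was drawn, so a key with many rows is no more likely to be selected than a key with one. The helper sample_by_keys and the toy rows are hypothetical. In Spark, the same idea could presumably be expressed by sampling the distinct keys of a DataFrame and joining them back to the rows.

```python
import random


def sample_by_keys(rows, key, fraction, seed=None):
    """Keep all rows whose key falls in a uniformly drawn subset of distinct keys.

    Hypothetical helper: each distinct key has the same chance of being chosen,
    regardless of how many rows it has.
    """
    keys = sorted({key(r) for r in rows})  # sorted so a fixed seed is reproducible
    k = round(fraction * len(keys))
    chosen = set(random.Random(seed).sample(keys, k))
    return [r for r in rows if key(r) in chosen]


rows = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5), ("c", 6), ("d", 7)]
picked = sample_by_keys(rows, key=lambda r: r[0], fraction=0.5, seed=1)
print(sorted({r[0] for r in picked}))
```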

References

https://spark …

Ben Chuanlong Du's Blog

It is never too late to learn.