Using Temporary Columns in Spark

May 21, 2020

Use Spark With Apache Toree Kernel in JupyterLab

Mar 23, 2020

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

The Docker image dclong/jupyterhub-toree comes with Spark and Apache Toree installed and configured. Since Spark is already included in the image, you don't need to download and install it yourself. By …

Use Spark with the Almond Scala Kernel in JupyterLab

Mar 22, 2020

This notebook presents a minimal example of how to use Spark with the Almond Scala kernel in JupyterLab. Note that Spark 2.4.2 is used because, as of this writing, it is the only stable Spark version that supports Scala 2.12. Please refer to almond-sh/examples for more examples.

Use XGBoost With Spark

Dec 17, 2019

The split-by-leaf mode (grow_policy="lossguide") is not supported in distributed training, which makes XGBoost4J on Spark much slower than LightGBM on Spark.

XGBoost with Spark

https://towardsdatascience.com/build-xgboost-lightgbm-models-on-large-datasets-what-are-the-possible-solutions-bf882da2c27d

https://xgboost …

Use LightGBM With Spark

Dec 05, 2019

https://github.com/Azure/mmlspark/blob/master/docs/lightgbm.md

MMLSpark seems to be the best option for training LightGBM models on a Spark cluster. Note that MMLSpark requires …

Build Spark from Source

Feb 20, 2020

You can download a prebuilt Spark binary at https://spark.apache.org/downloads.html. This is where you should start, and it will likely meet your needs most of the time …

Ben Chuanlong Du's Blog

It is never too late to learn.
