Use Spark With Apache Toree Kernel in JupyterLab
Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
The Docker image dclong/jupyterhub-toree has Spark and Apache Toree installed and configured. Since Spark is already installed in the image, you don't need to download and install Spark yourself. By …
Use Spark with the Almond Scala Kernel in JupyterLab
This notebook presents a minimal example of how to use Spark with the Almond Scala kernel in JupyterLab. Note that Spark 2.4.2 is used because, at the time of writing, it is the only stable Spark version that supports Scala 2.12. Please refer to almond-sh/examples for more examples.
Use XGBoost With Spark
The split-by-leaf mode (grow_policy="lossguide") is not supported in distributed training,
which makes XGBoost4J on Spark much slower than LightGBM on Spark.
XGBoost with Spark
https://towardsdatascience.com/build-xgboost-lightgbm-models-on-large-datasets-what-are-the-possible-solutions-bf882da2c27d
https://xgboost …
Use LightGBM With Spark
https://github.com/Azure/mmlspark/blob/master/docs/lightgbm.md
MMLSpark seems to be the best option for training LightGBM models on a Spark cluster. Note that MMLSpark requires …
Build Spark from Source
You can download a prebuilt Spark binary at https://spark.apache.org/downloads.html. This is where you should start, and it will likely meet your needs most of the time …