Ben Chuanlong Du's Blog

It is never too late to learn.

Use LightGBM With Spark

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

https://github.com/Azure/mmlspark/blob/master/docs/lightgbm.md

MMLSpark seems to be the best option to use train models using LightGBM on a Spark cluster. Note that MMLSpark requires …

Collections in Kotlin

Fold vs Reduce

fold takes an initial value, and the first invocation of the lambda you pass to it will receive that initial value and the first element of the collection as parameters. reduce doesn't take an initial value, but instead starts with the first element of the collection as the accumulator (called sum in the following example).

Aggregate DataFrames in Spark

Aggregation Without Grouping

  1. You can aggregate all values in Columns of a DataFrame. Just use aggregation functions in select without groupBy, which is very similar to SQL syntax.

  2. The aggregation functions all and any are available since Spark 3.0. However, they can be achieved using other aggregation functions such as sum