Ben Chuanlong Du's Blog

It is never too late to learn.

Build Spark from Source

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

You can download a prebuilt Spark binary at https://spark.apache.org/downloads.html. This is where you should get started, and it will likely meet your needs most of the time …

Subtle Differences Between Spark DataFrame and PySpark DataFrame

  1. Besides using the col function to reference a column, Spark/Scala DataFrame supports using $"col_name" (this relies on an implicit conversion and requires import spark.implicits._), while PySpark DataFrame supports using …

Sort top by CPU or Memory Usage

By default, the output of the top command is sorted by CPU usage on Linux. The table below lists options for sorting the output of the top command by different criteria …
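
As a rough illustration of sorting top output from the command line, here is a minimal sketch. It assumes the procps-ng version of top found on most Linux distributions, where -o selects the sort field, -b runs in batch mode, and -n sets the number of iterations. The variable names are only illustrative, and other top implementations (for example on macOS) use different field names.

```shell
# Assumption: procps-ng top on Linux; field names such as %MEM and %CPU
# may differ on other implementations.

# One batch-mode snapshot, sorted by memory usage (highest first).
mem_snapshot=$(top -b -n 1 -o %MEM | head -n 15)
echo "$mem_snapshot"

# The same snapshot, sorted explicitly by CPU usage.
cpu_snapshot=$(top -b -n 1 -o %CPU | head -n 15)
echo "$cpu_snapshot"
```

In an interactive top session, pressing Shift+M sorts by memory usage and Shift+P switches back to CPU usage.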