Ben Chuanlong Du's Blog

It is never too late to learn.

Build Spark from Source

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

You can download a prebuilt Spark binary at https://spark.apache.org/downloads.html. This is where you should get started, and it will likely meet your needs most of the time …

Subtle Differences Between Spark DataFrame and PySpark DataFrame

  1. Besides using the col function to reference a column, Spark/Scala DataFrame supports using $"col_name" (based on implicit conversion; requires import spark.implicits._) while PySpark DataFrame supports using …

Tips on VisualVM

VisualVM is a great tool for performance profiling of JVM applications.

  1. The application must be long-running for VisualVM to profile it, since VisualVM needs time to attach to the running JVM process.
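One way to make a short job profilable is to repeat it for a fixed duration so that there is time to attach. The following is only a minimal sketch under assumptions: the class name `KeepAliveForProfiling`, the `workload` method, and the duration argument are hypothetical placeholders, not part of VisualVM or of the original notes.

```java
import java.util.concurrent.TimeUnit;

public class KeepAliveForProfiling {
    // Written to a volatile field so the JIT cannot discard the workload result.
    static volatile long sink;

    // Hypothetical stand-in for the code you actually want to profile.
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += ((long) i * i) % 7;
        }
        return sum;
    }

    // Runs the workload at least once, then repeats it until `millis`
    // milliseconds have elapsed. Returns the number of iterations.
    static int runFor(long millis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        int iterations = 0;
        do {
            sink = workload();
            iterations++;
        } while (System.nanoTime() < deadline);
        return iterations;
    }

    public static void main(String[] args) {
        // Duration in milliseconds; with no argument the workload runs once.
        long millis = args.length > 0 ? Long.parseLong(args[0]) : 0L;
        int iterations = runFor(millis);
        System.out.println("Ran workload " + iterations + " time(s)");
    }
}
```

Running it with an argument such as `60000` should keep the JVM busy for about a minute, which leaves time to select the process in VisualVM and start a profiling session.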

IntelliJ IDEA

VisualVM Launcher

VisualVM Executable

/usr …