Ben Chuanlong Du's Blog

It is never too late to learn.

Subtle Differences Between Spark DataFrame and PySpark DataFrame

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

  1. Besides the col function, a Spark/Scala DataFrame lets you reference a column with $"col_name" (this relies on an implicit conversion and requires import spark.implicits._), while a PySpark DataFrame supports using …

Tips on VisualVM


VisualVM is a great tool for performance profiling of JVM applications.

  1. The application must be long-running for VisualVM to profile it (a process that exits within a few seconds leaves no time to attach).

IntelliJ IDEA

VisualVM Launcher

VisualVM Executable

/usr …

Tips on Kaggle


General Tips

  1. By default, internet access from a Kaggle notebook/kernel is turned off. You have to turn it on manually from the right-side panel to access the internet …
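
A quick way to confirm from inside a notebook whether internet access is currently on is to attempt a short outbound connection. The sketch below is a minimal illustration and nothing Kaggle-specific: the helper name has_internet and the probe target (a public DNS server at 8.8.8.8, port 53) are assumptions chosen for the example, and any reachable host would do.

```python
import socket


def has_internet(host: str = "8.8.8.8", port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    The default target is an assumption for illustration; using an IP address
    avoids depending on DNS resolution, which also fails when access is off.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if has_internet():
        print("Internet access appears to be enabled.")
    else:
        print("No outbound connection; check the notebook's internet setting.")
```

Running this in a cell before a step that downloads data or installs packages gives a clearer signal than waiting for that step to time out.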