Ben Chuanlong Du's Blog

It is never too late to learn.

Tips on JSON

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Shortcomings of JSON

It is suggested that you avoid using the JSON format! TOML and YAML are better text-based alternatives. If readability is not a concern, a binary serialization format is …

Improve the Performance of Spark

Plan Your Work

  1. Having a clear idea of what you want to do is very important, especially when you are working on an exploratory project. It often saves you time to …

Data Quality

  • Upper and lower bound tests, interquartile range (IQR) checks, and standard deviation checks

  • Aggregate level checks (after manipulating data, there should still be the ability to explain how the data …
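
The bound, IQR, and standard deviation checks listed above could be sketched roughly as follows. This is only a minimal illustration using Python's standard library; the function names, the multiplier `k=1.5`, and the z-score threshold are assumptions chosen for the example, not something the notes prescribe.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) bounds as Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is a common convention, assumed here for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR-based bounds."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


def zscore_outliers(values, threshold=3.0):
    """Return the values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs(v - mean) / sd > threshold]


if __name__ == "__main__":
    sample = [10, 12, 11, 13, 12, 11, 10, 12, 100]  # hypothetical data
    print(iqr_bounds(sample))
    print(iqr_outliers(sample))
    print(zscore_outliers(sample, threshold=2.5))
```

Note that a single extreme value inflates the standard deviation, so the z-score check can be less sensitive than the IQR check on small samples; running both is a reasonable default.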