Ben Chuanlong Du's Blog

It is never too late to learn.

Data Profiling Tools

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

  1. ydata-profiling

    ydata-profiling (the successor to pandas-profiling) is a tool for profiling pandas and Spark DataFrames. One possible way to work with large data is to do simple profiling on the large DataFrame and …
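A full ydata-profiling report can be expensive on a large DataFrame. The snippet below is a hedged sketch of what "simple profiling" might look like: a few cheap per-column statistics computed with plain pandas. `simple_profile` is a hypothetical helper written for illustration. It is not part of ydata-profiling's API, and the choice of statistics is an assumption.

```python
import pandas as pd


def simple_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row of cheap summary statistics per column of ``df``.

    Hypothetical helper: it only uses single per-column passes
    (no correlations or pairwise interactions), which keeps it
    affordable on large DataFrames.
    """
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),           # column data type
            "n_missing": df.isna().sum(),             # count of nulls
            "pct_missing": df.isna().mean() * 100,    # share of nulls, in %
            "n_unique": df.nunique(dropna=True),      # distinct non-null values
        }
    )


if __name__ == "__main__":
    df = pd.DataFrame(
        {
            "id": [1, 2, 3, 4],
            "city": ["NYC", "SF", None, "NYC"],
            "amount": [10.0, None, 7.5, 3.0],
        }
    )
    print(simple_profile(df))
```

For the library itself, `ydata_profiling.ProfileReport(df, minimal=True)` turns off the most expensive computations (such as correlations), and `.to_file("report.html")` writes the HTML report.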

Data Quality


  • Upper- and lower-bound tests, interquartile range (IQR) checks, and standard-deviation checks

  • Aggregate-level checks (after manipulating data, there should still be the ability to explain how the data …
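The checks listed above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. All function names are hypothetical. The IQR check uses Tukey's fences with the conventional k = 1.5, and the 3-standard-deviation cutoff is a common but arbitrary default. The aggregate-level check is one assumed interpretation: a column total should be preserved (reconcilable) after a transformation such as a group-by.

```python
import pandas as pd


def bounds_check(s: pd.Series, lower: float, upper: float) -> pd.Series:
    """Flag values outside fixed bounds [lower, upper] (e.g. business rules)."""
    return (s < lower) | (s > upper)


def iqr_check(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def std_check(s: pd.Series, n_std: float = 3.0) -> pd.Series:
    """Flag values more than n_std sample standard deviations from the mean."""
    return (s - s.mean()).abs() > n_std * s.std()


def aggregate_check(
    before: pd.DataFrame, after: pd.DataFrame, col: str, rel_tol: float = 1e-9
) -> bool:
    """Check that the total of ``col`` is unchanged by a transformation."""
    total_before, total_after = before[col].sum(), after[col].sum()
    return abs(total_before - total_after) <= rel_tol * max(abs(total_before), 1.0)


if __name__ == "__main__":
    amounts = pd.Series([10, 12, 11, 13, 12, 100])
    print(bounds_check(amounts, 0, 50).tolist())  # only 100 exceeds the upper bound
    print(iqr_check(amounts).tolist())            # 100 lies outside Tukey's fences
    print(std_check(amounts).tolist())            # 100 inflates the std, so nothing is flagged

    raw = pd.DataFrame({"region": ["a", "a", "b"], "amount": [1.0, 2.0, 3.0]})
    agg = raw.groupby("region", as_index=False)["amount"].sum()
    print(aggregate_check(raw, agg, "amount"))          # totals reconcile
    print(aggregate_check(raw, agg.iloc[:1], "amount"))  # a dropped group is caught
```

The demo also shows why the IQR check is often preferred for skewed data: a single large outlier inflates the mean and standard deviation enough to hide itself from the 3-sigma rule, while the quartiles barely move.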