The notes on this page are fragmentary, immature thoughts of the author. Please read with your own judgement!
-
ydata-profiling (the successor to pandas-profiling) is a tool for profiling pandas and Spark DataFrames. One possible way to work with large data is to do simple profiling on the full DataFrame, then draw a relatively small sample and profile it with ydata-profiling.
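A minimal sketch of that profile-then-sample workflow, using plain pandas for the cheap whole-table pass; the DataFrame here is synthetic, and the `ProfileReport` call is shown commented out on the assumption that ydata-profiling is installed:

```python
import numpy as np
import pandas as pd

# Hypothetical large DataFrame standing in for the real data.
rng = np.random.default_rng(0)
big = pd.DataFrame({
    "id": np.arange(1_000_000),
    "value": rng.normal(size=1_000_000),
})

# Step 1: cheap, simple profiling over the whole table with plain pandas.
summary = big.describe()

# Step 2: draw a relatively small random sample for the expensive report.
sample = big.sample(n=10_000, random_state=42)

# Step 3: run the full ydata-profiling report on the sample only
# (assumes the ydata-profiling package is installed; not executed here):
# from ydata_profiling import ProfileReport
# ProfileReport(sample, title="Sample profile").to_file("report.html")
```

The sampling step keeps the heavyweight report bounded regardless of the size of the original table; `random_state` is set only to make the sketch reproducible.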
-
great_expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling.
-
Optimus is the closest so far to what I want to achieve. Looks promising.
-
Apache Griffin supports data profiling but seems heavyweight and limited.
Other Ad Hoc Examples
https://towardsdatascience.com/profiling-big-data-in-distributed-environment-using-spark-a-pyspark-data-primer-for-machine-78c52d0ce45
http://www.bigdatareflections.net/blog/?p=111