Read/Write TSV in Spark
Read/Write CSV in Spark
Unit Testing for Spark
Static Analyzer
If we get the execution plan, then it is quite easy to analyze ...
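For example, here is a minimal PySpark sketch (the DataFrame and the filter are made up purely for illustration) that prints the logical and physical plans of a query:

```python
from pyspark.sql import SparkSession

# Assumption: a local SparkSession; any existing session works the same way.
spark = SparkSession.builder.master("local[2]").appName("plan-demo").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"])
query = df.filter(df.id > 1).select("name")

# Print only the physical plan.
query.explain()

# Print the parsed, analyzed, and optimized logical plans plus the physical plan.
query.explain(extended=True)
```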
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-lineage.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-dependencies.html
http://hydronitrogen.com/in-the-code-spark-sql-query-planning-and-execution.html
Spark Testing Frameworks/Tools
You can use the Scala testing frameworks ScalaTest (recommended) and Specs directly, or you can use frameworks/tools built on top of them specifically for Spark. Various discussions suggest that Spark Testing Base is a good one.
https://www.slideshare.net/SparkSummit/beyond-parallelize-and-collect-by-holden-karau
Spark Unit Testing
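As a rough illustration (a minimal sketch using PySpark with plain pytest rather than the Scala frameworks above; the transformation under test is hypothetical), a Spark unit test usually builds a small local session, feeds it tiny in-memory data, and asserts on the collected result:

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


@pytest.fixture(scope="session")
def spark():
    # Local-mode session shared by all tests; stopped when the test session ends.
    session = (
        SparkSession.builder.master("local[2]")
        .appName("unit-tests")
        .getOrCreate()
    )
    yield session
    session.stop()


def add_full_name(df):
    # Hypothetical transformation under test: concatenate two columns.
    return df.withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))


def test_add_full_name(spark):
    df = spark.createDataFrame(
        [("Ada", "Lovelace"), ("Alan", "Turing")], ["first_name", "last_name"]
    )
    result = {row["full_name"] for row in add_full_name(df).collect()}
    assert result == {"Ada Lovelace", "Alan Turing"}
```

Run it with pytest from the project root; scoping the session fixture to the whole test run keeps the JVM start-up cost to a single occurrence.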
Hands on the Python module dask
Installation
- You have to install the complete version of Dask (using the command pip3 install dask[complete]) if you need support for extended memory (for handling big data) and schedulers (for performance). The default installation (pip3 install dask) does not include those features out of the box; see the sketch after this note for a quick way to check that they are available.
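A minimal sketch (assuming dask[complete] has been installed; the CSV path and column name are hypothetical) that exercises pieces shipped by the extras:

```python
import dask.dataframe as dd
from dask.distributed import Client  # shipped by the "complete" (or "distributed") extra

# Start a local scheduler with a couple of worker processes;
# the bare `pip3 install dask` does not include dask.distributed.
client = Client(n_workers=2, threads_per_worker=1)

# dask.dataframe's optional dependencies are also pulled in by the extras.
df = dd.read_csv("data/*.csv")              # hypothetical path
print(df.groupby("key").size().compute())   # hypothetical column name

client.close()
```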