Ben Chuanlong Du's Blog

It is never too late to learn.

Unit Testing for Spark

Spark Testing Frameworks/Tools

You can use the Scala testing frameworks ScalaTest (recommended) and Specs, or frameworks/tools built on top of them specifically for Spark. Various discussions suggest that Spark Testing Base is a good one.

https://www.slideshare.net/SparkSummit/beyond-parallelize-and-collect-by-holden-karau

Spark Unit Testing

Hands on the Python module dask

Installation

  1. If you need support for extended memory (for handling big data) and schedulers (for performance), install the complete version of Dask with the command pip3 install "dask[complete]". The quotes keep shells such as zsh from interpreting the square brackets. The default installation (pip3 install dask) does not include those features out of the box.
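To see which kind of installation you ended up with, a quick check like the one below can help. It is only a minimal sketch: the helper name available_packages and the list of package names are illustrative assumptions, not part of Dask, and the list is not an exhaustive account of what the complete installation pulls in. It uses only the standard library, so it runs even when Dask itself is missing.

```python
from importlib.util import find_spec

# Top-level packages assumed (for illustration) to accompany a complete
# Dask installation; this list is not exhaustive.
OPTIONAL_PACKAGES = ["dask", "distributed", "numpy", "pandas"]


def available_packages(names):
    """Map each top-level package name to whether it can be imported."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one line per package so missing pieces are easy to spot.
    for name, ok in available_packages(OPTIONAL_PACKAGES).items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

If distributed shows up as missing, you most likely have the default installation rather than the complete one.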