Comments¶
Even though the Spark DataFrame/SQL APIs do not distinguish the case of column names by default (`spark.sql.caseSensitive` is `false`), the column names saved into HDFS keep their original case and are case-sensitive!
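A minimal sketch of the point above, assuming a local SparkSession and a local temporary directory standing in for an HDFS path. The object name `ColumnCaseDemo`, the helper `roundTripFieldNames`, and the column names are made up for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object ColumnCaseDemo {
  // Writes a small DataFrame to Parquet and returns the column names read back.
  def roundTripFieldNames(spark: SparkSession): Array[String] = {
    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("UserId", "Label")

    // With the default spark.sql.caseSensitive=false, this lookup ignores case.
    df.select("userid", "LABEL").show()

    // Assumption: a local temporary path is used here instead of an HDFS location.
    val path = Files.createTempDirectory("column-case-demo").resolve("out").toString
    df.write.parquet(path)

    // The stored schema keeps the original spelling of each column name.
    spark.read.parquet(path).schema.fieldNames
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("column-case-demo")
      .getOrCreate()
    try println(roundTripFieldNames(spark).mkString(", "))
    finally spark.stop()
  }
}
```

Tools that read the files without going through Spark's case-insensitive resolution will see the names exactly as stored, so it is safer to settle on one spelling before writing.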
Read/Write Parquet Files in Spark
Read/Write TSV in Spark
Read/Write CSV in Spark
Unit Testing for Spark
Static Analyzer¶
If we get the execution plan, then it is quite easy to analyze ...
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-lineage.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-dependencies.html
http://hydronitrogen.com/in-the-code-spark-sql-query-planning-and-execution.html
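As a rough sketch of how the execution plan and RDD lineage can be obtained for analysis, assuming a local SparkSession; the object name `PlanInspection`, the helper `optimizedNodeNames`, and the sample data are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PlanInspection {
  // Collects the operator names of the optimized logical plan, top-down.
  def optimizedNodeNames(df: DataFrame): Seq[String] =
    df.queryExecution.optimizedPlan.collect { case node => node.nodeName }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("plan-inspection")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b"), (3, "a")).toDF("id", "label")
      .filter($"id" > 1)
      .groupBy("label")
      .count()

    // Prints the parsed, analyzed, and optimized logical plans plus the physical plan.
    df.explain(true)

    // The plans are also available as tree objects for programmatic inspection.
    println(optimizedNodeNames(df).mkString(" -> "))

    // RDD lineage (dependencies) of the underlying RDD.
    println(df.rdd.toDebugString)

    spark.stop()
  }
}
```

Because the plan is an ordinary tree, a static check could walk it with `collect` or `foreach` and flag particular operators, though what to flag depends on the analysis in mind.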
Spark Testing Frameworks/Tools¶
You can use the Scala testing frameworks ScalaTest (recommended) and Specs, or frameworks/tools built on top of them specifically for Spark. Various discussions suggest that Spark Testing Base is a good one.
https://www.slideshare.net/SparkSummit/beyond-parallelize-and-collect-by-holden-karau
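For illustration, a plain ScalaTest suite that manages its own local SparkSession might look like the sketch below. It assumes ScalaTest 3.1 or later (for `AnyFunSuite`) and does not use Spark Testing Base; the suite name and the word-count logic are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite

class WordCountSuite extends AnyFunSuite with BeforeAndAfterAll {
  private var spark: SparkSession = _

  // Start one local session for the whole suite.
  override def beforeAll(): Unit = {
    super.beforeAll()
    spark = SparkSession.builder()
      .master("local[2]")
      .appName("word-count-suite")
      .getOrCreate()
  }

  // Stop the session even if a test failed.
  override def afterAll(): Unit = {
    try if (spark != null) spark.stop()
    finally super.afterAll()
  }

  test("counts words across lines") {
    val session = spark // stable identifier needed for the implicits import
    import session.implicits._

    val counts = Seq("a b", "b c b").toDS()
      .flatMap(_.split(" "))
      .groupByKey(identity)
      .count()
      .collect()
      .toMap

    assert(counts == Map("a" -> 1L, "b" -> 3L, "c" -> 1L))
  }
}
```

Sharing one session per suite keeps startup cost down; Spark-specific testing libraries mainly package this kind of setup along with helpers for comparing DataFrames and RDDs.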
Spark Unit Testing¶
Spark vs Redshift
Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
https://www.quora.com/Spark-vs-Redshift-Should-I-be-using-both-for-big-data-Which-is-better
Performance
https://dbseer.com/benchmark-comparison-spark-sql-redshift-cluster/