Ben Chuanlong Du's Blog

It is never too late to learn.

Read/Write Files/Tables in Spark

References

DataFrameReader APIs

DataFrameWriter APIs

https://spark.apache.org/docs/latest/sql-programming-guide.html#data-sources

Comments

  1. Specify a schema when reading text files. If you do not specify one, it is good practice to check the column types after reading, because Spark infers them.

  2. Do NOT read data from and write data to the same path in Spark! Because Spark evaluates lazily, the path will likely be cleared before the data is actually read, which throws IO exceptions. Worse, the data at that path on HDFS is removed, though it may still be recoverable.
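The two comments above can be sketched in PySpark roughly as follows. This is a minimal sketch, not a definitive implementation: the column names in `SCHEMA_DDL`, the helper names `read_csv_with_schema`, `ensure_distinct_paths` and `rewrite_csv_as_parquet`, and the CSV-to-Parquet scenario are all assumptions made for illustration. The helpers take an existing `SparkSession` as the `spark` argument, and the path check only compares the path strings (ignoring trailing slashes), so it will not catch two different spellings of the same location.

```python
from typing import Any

# Hypothetical schema for illustration. DataFrameReader.schema() accepts a
# DDL-formatted string like this one, as well as a StructType.
SCHEMA_DDL = "id INT, name STRING, price DOUBLE"


def read_csv_with_schema(spark: Any, path: str, schema: str = SCHEMA_DDL) -> Any:
    """Read a CSV file with an explicit schema instead of inferred types."""
    return spark.read.schema(schema).option("header", "true").csv(path)


def ensure_distinct_paths(input_path: str, output_path: str) -> None:
    """Raise ValueError if the two path strings name the same location.

    Only trailing slashes are ignored; this is a simple string comparison.
    """
    if input_path.rstrip("/") == output_path.rstrip("/"):
        raise ValueError(
            f"Refusing to read from and write to the same path: {input_path}"
        )


def rewrite_csv_as_parquet(spark: Any, input_path: str, output_path: str) -> Any:
    """Read a CSV with a fixed schema and write it to a different path."""
    # Check before any write is triggered, so the input is never cleared.
    ensure_distinct_paths(input_path, output_path)
    df = read_csv_with_schema(spark, input_path)
    df.write.mode("overwrite").parquet(output_path)
    return df
```

With a real session (for example one created by `SparkSession.builder.getOrCreate()`), calling `rewrite_csv_as_parquet(spark, input_path, output_path)` and then `printSchema()` on the returned DataFrame is one way to confirm the column types. If the result really must end up at the original path, a safer pattern is to write to a temporary path first and swap the directories only after the write has succeeded.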

Tips on Nox

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

It is suggested that you use professional CI/CD tools instead of nox for testing.

https://github.com/theacodes/nox

https://nox.thea.codes/en/stable/index.html

https://cjolowicz.github.io …

Package Management in Linux

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

There are many different ways to install packages in Linux.

  1. Build From Source

  2. Pre-built Binary

  3. Use distribution-specific tools. For example, you can use apt-get or wajig for Debian-based Linux distributions …