Tips and Traps
Watch out for NaNs: they might not behave the way you expect.
None can be passed to otherwise, which yields null in the DataFrame.
Column aliases and positional columns can be used in GROUP BY in Spark SQL!
Note that the when function behaves like if-else.
Case of Column Names in Spark DataFrames
Comments
Even though the Spark DataFrame/SQL APIs do not distinguish the case of column names, the column names saved to HDFS are case-sensitive!