Row-based Mapping and Filtering on DataFrames in Spark
Spark DataFrame is an alias for Dataset[Row].
Even though a Spark DataFrame is stored as Rows in a Dataset,
the built-in operations/functions (in org.apache.spark.sql.functions) for Spark DataFrames are Column-based.
Sometimes there are transformations on a DataFrame that are hard to express as Column expressions
but very convenient to express as Row expressions.
The traditional way to resolve this issue is to wrap the row-based function in a UDF.
It is worth knowing that Spark DataFrame also supports map/flatMap APIs
which work on Rows.
However, they are still experimental as of Spark 2.4.3,
so it is suggested that you stick to Column-based operations/functions until the Row-based methods mature.
Both approaches are illustrated in the sketch below.
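Below is a minimal Scala sketch of both approaches, assuming a toy DataFrame with integer columns x and y; the names addUdf and z are illustrative, not part of any Spark API.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().appName("row-based-demo").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, 2), (3, 4)).toDF("x", "y")

// Approach 1: wrap the row-wise logic in a UDF and use it as a Column expression.
val addUdf = udf((x: Int, y: Int) => x + y)
df.withColumn("z", addUdf($"x", $"y")).show()

// Approach 2: use the (experimental) Row-based map API.
// Mapping over Rows discards the Row schema, so an explicit result type
// (here a tuple, whose Encoder comes from spark.implicits._) is required.
val mapped = df
  .map(row => (row.getInt(0), row.getInt(1), row.getInt(0) + row.getInt(1)))
  .toDF("x", "y", "z")
mapped.show()
```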
Row Object in Spark
Case of Column Names in Spark DataFrames
Even though the Spark DataFrame/SQL APIs resolve column names case-insensitively (by default), the column names saved into HDFS preserve their original case and are case-sensitive!
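Here is a minimal sketch of this behavior, assuming the default setting spark.sql.caseSensitive=false and an illustrative output path /tmp/case_demo:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("case-demo").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "a")).toDF("userId", "name")

// The DataFrame/SQL APIs resolve column names case-insensitively by default,
// so all of these select the same column:
df.select("userId").show()
df.select("userid").show()
df.select("USERID").show()

// However, the Parquet files written out preserve the original casing
// ("userId"), so downstream tools with case-sensitive semantics
// must match it exactly.
df.write.mode("overwrite").parquet("/tmp/case_demo")
```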
Use Kotlin in a Scala Project
Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
- Methods of a Kotlin object can be called in a Scala project via KotlinObject.INSTANCE.methodToCall(), as shown in the sketch after this list.
- You might need to provide the Kotlin standard library kotlin-stdlib.jar in order to run …
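A minimal sketch of the interop, assuming a hypothetical Kotlin object StringUtils compiled into the same project (the object and method names are made up for illustration):

```scala
// Hypothetical Kotlin side, compiled together with the Scala sources:
//
//   object StringUtils {
//       fun shout(s: String): String = s.uppercase() + "!"
//   }
//
// A Kotlin `object` compiles to a JVM class exposing a static INSTANCE
// field holding the singleton, so Scala reaches its methods through it:
object KotlinInteropDemo {
  def main(args: Array[String]): Unit = {
    println(StringUtils.INSTANCE.shout("hello")) // prints "HELLO!"
  }
}
```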