Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
- Besides using the `col` function to reference a column, a Spark/Scala DataFrame supports `$"col_name"` (based on implicit conversion; it requires `import spark.implicits._`), while a PySpark DataFrame supports `df.col_name` (similar to what you can do with a pandas DataFrame). The table below summarizes the styles (an X marks a style that is not supported); a short sketch follows this list.

  |                     | Spark/Scala       | PySpark           |
  |---------------------|-------------------|-------------------|
  | `col` function      | `col("col_name")` | `col("col_name")` |
  | Implicit conversion | `$"col_name"`     | X                 |
  | Dot reference       | X                 | `df.col_name`     |
- The `===` operator (column equality test) is supported in Spark/Scala but not available in PySpark, which overloads `==` instead. For null-safe equality comparison, Spark/Scala provides the `<=>` operator while PySpark provides `Column.eqNullSafe` (see the second sketch below).
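A minimal sketch of the column-reference styles above, assuming a local SparkSession named `spark` and a toy DataFrame with hypothetical columns `name` and `age`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._ // enables toDF and the $"col_name" syntax

val df = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")

df.select(col("age")).show() // the col function, also available in PySpark
df.select($"age").show()     // implicit conversion, Spark/Scala only
df.select(df("age")).show()  // Dataset.apply; the PySpark analogue is df.age
```

On the Python side, the dot reference `df.age` (or the equivalent `df["age"]`) plays roughly the role that `$"age"` plays in Scala.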
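And a sketch contrasting plain equality with null-safe equality, reusing the `spark` session and implicits from the previous block; the `None` value is an assumption added to make the null behavior visible:

```scala
import org.apache.spark.sql.functions.lit

val df2 = Seq(("a", Some(1)), ("b", None)).toDF("key", "value")

// Plain equality: comparing against null yields null, so filter drops every row.
df2.filter($"value" === lit(null)).show()

// Null-safe equality: <=> treats two nulls as equal, so the "b" row matches.
df2.filter($"value" <=> lit(null)).show()
```

The PySpark counterpart would be something like `df2.value.eqNullSafe(None)`; there is no `<=>` operator on the Python side.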
References

- https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/Dataset.html
- https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/functions.html
- https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Row.html
- https://spark.apache.org/docs/latest/api/python/pyspark.sql.html
- https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame
- https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column
- https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.functions