Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Spark Issue: IllegalArgumentException: Wrong FS
Symptoms
java.lang.IllegalArgumentException: Wrong FS: hdfs://..., expected: viewfs://...
Possible Causes
The Spark cluster has migrated to Router-based Federation (RBF) namenodes, and viewfs:// (instead of hdfs://) is required to access HDFS …
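As a rough illustration of the scheme change, the sketch below rewrites an hdfs:// URI into a viewfs:// one and leaves every other path untouched. The helper name `to_viewfs` and the `viewfs_authority` argument are hypothetical; the real mount table name, and whether the path part carries over unchanged, depend on how your cluster's mount table is set up.

```python
from urllib.parse import urlsplit, urlunsplit


def to_viewfs(path: str, viewfs_authority: str) -> str:
    """Rewrite an hdfs:// URI to viewfs://, leaving other paths untouched.

    viewfs_authority is an assumed mount table name; use the one your
    cluster actually defines.
    """
    parts = urlsplit(path)
    if parts.scheme != "hdfs":
        # Relative paths and other schemes (s3a://, file://, ...) pass through.
        return path
    # Swap the scheme and the authority, keep the path as it is.
    return urlunsplit(
        ("viewfs", viewfs_authority, parts.path, parts.query, parts.fragment)
    )
```

A call such as `spark.read.parquet(to_viewfs(path, "cluster-name"))` would then hand Spark a viewfs:// path, with `cluster-name` standing in for your own mount table name.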
Spark Issue: ViewFs: Cannot Initialize: Empty Mount Table in Config
Symptoms
java.io.IOException: ViewFs: Cannot initialize: Empty Mount table in config for viewfs://cluster-name-ns02/
Possible Causes
As the error message says, viewfs://cluster-name-ns02 is not configured.
- It is possible that …
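One way to confirm that a mount table is really empty is to look for its link entries in the Hadoop configuration. The sketch below works on a plain dict of configuration properties (an assumption made to keep it self-contained) and relies on the standard ViewFs property pattern `fs.viewfs.mounttable.<name>.link.<path>`. The function name `viewfs_mount_links` is hypothetical.

```python
def viewfs_mount_links(hadoop_conf: dict, mount_table: str) -> dict:
    """Return {mount point: target} for the given ViewFs mount table.

    hadoop_conf is a plain dict of Hadoop properties (an assumption for
    this sketch). An empty result means no link entries were found for
    that mount table name.
    """
    prefix = f"fs.viewfs.mounttable.{mount_table}.link."
    return {
        key[len(prefix):]: target
        for key, target in hadoop_conf.items()
        if key.startswith(prefix)
    }
```

If `viewfs_mount_links(conf, "cluster-name-ns02")` comes back empty while another mount table name returns entries, the path is probably using the wrong authority rather than the configuration being absent altogether.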
Spark Issue: ArrowTypeError: Expected a Type but Got a Different Type
Symptom
pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
Possible Causes
A pandas_udf decorator declares a return type of String, but the corresponding pandas UDF returns a different …
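A minimal sketch of this mismatch and one possible fix follows. The function `length_as_string` is a made-up example, not taken from the original note: it computes integer lengths, which would trigger the error if returned as-is under a declared string return type, and casts them to strings before returning. PySpark is imported lazily inside `as_pandas_udf` so that the pandas logic can be tried without a Spark installation.

```python
import pandas as pd


def length_as_string(values: pd.Series) -> pd.Series:
    """Return the length of each value as a string."""
    lengths = values.str.len()   # integer lengths: mismatches a 'string' return type
    return lengths.astype(str)   # cast so the data matches the declared type


def as_pandas_udf():
    """Wrap the function as a pandas UDF with a declared string return type."""
    from pyspark.sql.functions import pandas_udf  # imported lazily

    return pandas_udf(length_as_string, returnType="string")
```

The alternative fix is to keep the integer result and change the declared return type to an integer type instead; which one is right depends on what downstream code expects.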
Spark Issue: RuntimeError: Arrow Legacy IPC Format Is Not Supported
Symptoms
RuntimeError: Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT
Possible Causes
You are using PySpark 3.0+ with one (or both) of the following options.
--conf …
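To locate where the flag named in the error message is being set, a small check like the one below may help. It is only a sketch: `find_legacy_arrow_settings` is a hypothetical helper that scans a dict of Spark conf entries and the process environment for the variable name `ARROW_PRE_0_15_IPC_FORMAT`, without assuming which specific options carry it.

```python
import os

LEGACY_FLAG = "ARROW_PRE_0_15_IPC_FORMAT"


def find_legacy_arrow_settings(spark_conf: dict, env=None) -> list:
    """List Spark conf keys and environment variables that mention the flag.

    spark_conf is a plain dict of conf entries; env defaults to os.environ.
    """
    env = os.environ if env is None else env
    hits = [f"conf:{key}" for key in spark_conf if LEGACY_FLAG in key]
    if LEGACY_FLAG in env:
        hits.append(f"env:{LEGACY_FLAG}")
    return hits
```

In a live session the conf dict can be built with `dict(spark.sparkContext.getConf().getAll())`. Anything the helper reports is a candidate to remove from the submit command or the environment.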
Spark Issue: AnalysisException: Found Duplicated Columns
Symptoms
pyspark.sql.utils.AnalysisException: Found duplicate column(s) when inserting into ...
Possible Causes
As the error message says, there are duplicated columns in your Spark SQL code.
Possible Solutions
Fix …
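Before fixing the query, it can help to see exactly which names collide. The sketch below is a hypothetical helper that reports duplicated names in a list of column names. It compares names case-insensitively by default, on the assumption that Spark is running with its default `spark.sql.caseSensitive=false`.

```python
from collections import Counter


def duplicated_columns(columns, case_sensitive=False):
    """Return the sorted column names that occur more than once.

    With case_sensitive=False (matching Spark's default behavior),
    'ID' and 'id' count as the same column.
    """
    names = list(columns) if case_sensitive else [c.lower() for c in columns]
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

Calling `duplicated_columns(df.columns)` on the DataFrame about to be written lists the offending names, which can then be renamed or dropped before the insert.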