Ben Chuanlong Du's Blog

It is never too late to learn.


Spark Issue: IllegalArgumentException: Wrong FS

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptoms

java.lang.IllegalArgumentException: Wrong FS: hdfs://..., expected: viewfs://...

Possible Causes

The Spark cluster has migrated to Router-based Federation (RBF) namenodes, and viewfs:// (instead of hdfs://) is required to access HDFS …
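On such clusters, jobs that hard-code hdfs:// paths fail with the above error. A minimal sketch (cluster and host names below are hypothetical) of rewriting the scheme before handing the path to Spark; note the actual mapping must exist in the cluster's ViewFs mount table:

```python
from urllib.parse import urlparse, urlunparse

def to_viewfs(path: str, mount_authority: str) -> str:
    """Rewrite an hdfs:// URI to use the viewfs:// mount point.

    mount_authority is the ViewFs mount table name (a hypothetical
    placeholder here); the path must be covered by the cluster's
    mount table configuration for the rewritten URI to resolve.
    """
    parts = urlparse(path)
    if parts.scheme == "hdfs":
        return urlunparse(parts._replace(scheme="viewfs", netloc=mount_authority))
    return path

# hypothetical names for illustration
print(to_viewfs("hdfs://nn-host:8020/user/me/data", "cluster-name-ns02"))
# viewfs://cluster-name-ns02/user/me/data
```

Paths that already use another scheme are passed through unchanged.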

Spark Issue: ViewFs: Cannot Initialize: Empty Mount Table in Config



Symptoms

java.io.IOException: ViewFs: Cannot initialize: Empty Mount table in config for viewfs://cluster-name-ns02/

Possible Causes

As the error message says, viewfs://cluster-name-ns02 is not configured.

  1. It is possible that …
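For reference, a ViewFs mount table is declared in the cluster's Hadoop configuration (core-site.xml). A minimal sketch, where the mount table name matches the authority in the viewfs:// URI and the target namenode URI is a placeholder assumption:

```xml
<!-- core-site.xml: hypothetical mount table entry for viewfs://cluster-name-ns02 -->
<property>
  <name>fs.viewfs.mounttable.cluster-name-ns02.link./user</name>
  <value>hdfs://real-namenode-host:8020/user</value>
</property>
```

If no fs.viewfs.mounttable.cluster-name-ns02.* properties are present in the configuration the job picks up, ViewFs fails with exactly the "Empty Mount table" error above.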

Spark Issue: ArrowTypeError: Expect a Type but Got a Different Type


Symptom

pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64

Possible Causes

A pandas_udf decorator declares a return type of string, but the corresponding pandas UDF returns a different …

Spark Issue: RuntimeError: Arrow Legacy IPC Format Is Not Supported


Symptoms

RuntimeError: Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT

Possible Causes

You are using PySpark 3.0+ with one (or both) of the following options.

--conf …
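The fix is to stop forcing the legacy format, which PySpark 3.0+ no longer supports. A minimal sketch, assuming the variable is being set in the driver's Python environment:

```python
import os

# PySpark 3.0+ dropped support for the pre-0.15 Arrow IPC format,
# so remove the flag before the SparkSession is created.
os.environ.pop("ARROW_PRE_0_15_IPC_FORMAT", None)
```

If the variable is instead injected into executors via spark-submit --conf options, remove those options from the job submission as well.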

Spark Issue: AnalysisException: Found Duplicated Columns


Symptoms

pyspark.sql.utils.AnalysisException: Found duplicate column(s) when inserting into ...

Possible Causes

As the error message says, there are duplicated columns in your Spark SQL code.

Possible Solutions

Fix …
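Before inserting, it can help to check the DataFrame's column list for duplicates (e.g., after a join that kept both copies of the key). A minimal pure-Python sketch over df.columns, with hypothetical column names:

```python
from collections import Counter

# hypothetical df.columns after a join that duplicated the join key
columns = ["id", "name", "id", "amount"]

# any column name appearing more than once triggers the AnalysisException
duplicated = [col for col, n in Counter(columns).items() if n > 1]
print(duplicated)  # ['id']
```

Renaming or dropping one copy of each duplicated column before the insert resolves the error.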