Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: RuntimeError: Arrow Legacy IPC Format Is Not Supported

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptoms

RuntimeError: Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT

Possible Causes

You are using PySpark 3.0+ with one (or both) of the following options.

--conf …
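Since the error message asks you to unset ARROW_PRE_0_15_IPC_FORMAT, a first step is to find out where it is being set. The following is only a minimal sketch: the helper name find_legacy_arrow_settings and the sample spark-submit arguments are made up for illustration (they are not the options elided above), and it simply searches the local environment and an argument list for the variable name.

```python
import os

LEGACY_VAR = "ARROW_PRE_0_15_IPC_FORMAT"


def find_legacy_arrow_settings(submit_args, env=None):
    """Return descriptions of every place where LEGACY_VAR is mentioned.

    submit_args: list of command-line arguments passed to spark-submit.
    env: mapping of environment variables (defaults to os.environ).
    """
    env = os.environ if env is None else env
    hits = []
    # The variable may be exported in the shell that launches the job.
    if LEGACY_VAR in env:
        hits.append(f"environment variable: {LEGACY_VAR}={env[LEGACY_VAR]}")
    # It may also be forwarded through a --conf value.
    for arg in submit_args:
        if LEGACY_VAR in arg:
            hits.append(f"spark-submit argument: {arg}")
    return hits


# Hypothetical arguments, for illustration only.
sample_args = [
    "--master", "yarn",
    "--conf", "spark.executorEnv.ARROW_PRE_0_15_IPC_FORMAT=1",
    "app.py",
]
for hit in find_legacy_arrow_settings(sample_args, env={}):
    print(hit)
```

Anything the helper reports is a candidate to remove before resubmitting the job.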

Spark Issue: AnalysisException: Found Duplicated Columns

Symptoms

pyspark.sql.utils.AnalysisException: Found duplicate column(s) when inserting into ...

Possible Causes

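As the error message says, the result of your Spark SQL query contains duplicate column names, for example because the same column is selected twice or a join keeps same-named columns from both sides.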

Possible Solutions

Fix …
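To locate the offending names before inserting, you can inspect the list of column names. The sketch below is an illustration, not the author's method: find_duplicate_columns is a hypothetical helper that works on a plain list of names (for a DataFrame you would pass df.columns). It compares names case-insensitively by default, on the assumption that Spark runs with its default spark.sql.caseSensitive=false.

```python
from collections import Counter


def find_duplicate_columns(columns, case_sensitive=False):
    """Return the sorted column names that occur more than once.

    With case_sensitive=False, names are lowercased before counting,
    so "id" and "ID" count as the same column.
    """
    keys = [c if case_sensitive else c.lower() for c in columns]
    counts = Counter(keys)
    return sorted(name for name, n in counts.items() if n > 1)


# Example column list, e.g. what df.columns might return after a join.
print(find_duplicate_columns(["id", "name", "ID", "score"]))  # ['id']
```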

Spark Issue: GetQuotaUsage

Symptom I

py4j.protocol.Py4JJavaError: An error occurred while calling o156.getQuotaUsage.

Symptom II

org.apache.hadoop.ipc.RemoteException(java.io.IOException): The quota system is disabled in Router.

Possible Causes …

Spark Issue: Pure Python Code Errors

This post collects some typical pure Python errors in PySpark applications.

Symptom 1

object has no attribute

Solution 1

Fix the attribute name.
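When the correct name is not obvious, the standard library can suggest likely candidates. This is a small sketch under the assumption that the error comes from a misspelled attribute; suggest_attribute is a hypothetical helper built on difflib.get_close_matches and dir().

```python
import difflib


def suggest_attribute(obj, name, n=3):
    """Return up to n public attribute names of obj that resemble name."""
    candidates = [attr for attr in dir(obj) if not attr.startswith("_")]
    return difflib.get_close_matches(name, candidates, n=n)


# "uper" is a typo for the str method "upper".
print(suggest_attribute("some text", "uper"))
```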

Symptom 2

No such file or directory

Solution …

Fix the CrashLoopBackOff Issue of Pod in Kubernetes

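Define the container command as ["/busybox/sh", "-c", "tail -f /dev/null"] instead of ["/busybox/sh", "-c", "tail", "-f", "/dev/null"]. With sh -c, only the first argument after -c is run as the command; the remaining arguments become positional parameters ($0, $1, ...). In the second form the shell therefore runs a bare tail, which typically exits as soon as it reaches the end of standard input, so the container exits and Kubernetes keeps restarting it.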
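The difference between the two forms can be reproduced locally. The sketch below assumes a POSIX sh on the PATH (standing in for /busybox/sh) and uses echo in place of tail so that every command terminates; run_sh is a hypothetical helper.

```python
import subprocess


def run_sh(argv):
    """Run argv with no stdin attached and return the completed process."""
    return subprocess.run(
        argv,
        capture_output=True,
        text=True,
        stdin=subprocess.DEVNULL,
        timeout=10,
    )


# One-string form: the whole command line is the script given to -c.
good = run_sh(["sh", "-c", "echo hello world"])

# Split form: only "echo" is the script; "hello" becomes $0 and "world" $1,
# so echo runs without arguments and prints an empty line.
bad = run_sh(["sh", "-c", "echo", "hello", "world"])

# The extra arguments are still reachable as positional parameters.
params = run_sh(["sh", "-c", 'echo "$0 $1"', "hello", "world"])

print(repr(good.stdout))    # 'hello world\n'
print(repr(bad.stdout))     # '\n'
print(repr(params.stdout))  # 'hello world\n'
```

In the same way, the split form in the pod spec runs tail without -f /dev/null, which is why the container does not stay alive.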