Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: High Disk and Memory Spill When Doing Shuffle

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptom

High disk and memory spill when doing shuffle.

Cause

Insufficient executor memory (you can monitor the spill metrics in the Spark UI).

Solution

  1. Increase executor memory.

    --executor-memory=4G
    
  2. For jobs that …
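The executor-memory flag from step 1 can also be assembled programmatically when jobs are launched from a script. The following is only a minimal sketch: the function name `spark_submit_command` and the application file `my_job.py` are hypothetical, and the right memory size depends on the job.

```python
import shlex


def spark_submit_command(app, executor_memory="4G", extra_args=()):
    """Build a spark-submit argument list with an explicit executor memory.

    `app` is the application file to submit (hypothetical here);
    `extra_args` are any additional spark-submit options.
    """
    return ["spark-submit", "--executor-memory", executor_memory, *extra_args, app]


# Example: raise executor memory to 8G for a job that spills heavily.
cmd = spark_submit_command("my_job.py", executor_memory="8G")
print(shlex.join(cmd))
```

Building the command as a list (rather than one string) avoids quoting problems if it is later passed to `subprocess.run`.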

Spark Issue: Table Not Found

Symptom 1

org.apache.spark.sql.AnalysisException: Table not found

Symptom 2

java.lang.RuntimeException: Table Not Found: my_rdd

Cause 1

A misspelled table name.

Solution 1

Correct the misspelling.
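To spot a misspelled table name quickly, one option is to compare it against the table names that actually exist and list the closest matches. The sketch below is an assumption-laden illustration, not part of Spark: `suggest_table` and the table names are hypothetical, and in a live session the known names could be collected from `spark.catalog.listTables()`.

```python
import difflib


def suggest_table(name, known_tables, cutoff=0.6):
    """Return up to three known table names that closely resemble `name`."""
    return difflib.get_close_matches(name, known_tables, n=3, cutoff=cutoff)


# Hypothetical table names; in practice these might come from
# [t.name for t in spark.catalog.listTables()].
known = ["customers", "orders", "order_items"]
print(suggest_table("cusotmers", known))
```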

Cause 2 …

Spark Issue: AnalysisException: Cannot Resolve

Symptom

org.apache.spark.sql.AnalysisException: cannot resolve ...

Cause

A column name is misspelled, or the query refers to a column that does not exist in the DataFrame.
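A quick way to find the offending name is to compare the columns a query needs against the columns the DataFrame actually has. The following is a minimal sketch with hypothetical names: `missing_columns` is not a Spark function, and the `available` list stands in for `df.columns`.

```python
def missing_columns(requested, available):
    """Return requested column names absent from `available`, in request order."""
    present = set(available)
    return [c for c in requested if c not in present]


# `available` stands in for df.columns; all names here are hypothetical.
available = ["user_id", "event_time", "amount"]
requested = ["user_id", "event_tme", "amount"]
print(missing_columns(requested, available))
```

Note that this sketch compares names exactly, so it also flags differences in letter case.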

Solution

Correct the column name or …

Spark Issue: AnalysisException: Path Does Not Exist

Symptom

org.apache.spark.sql.AnalysisException: Path does not exist ...

Cause

A specified HDFS path does not exist.

Solution

Use the correct HDFS path.
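Before submitting a job, the path can be checked with the `hdfs dfs -test -e` command, which exits with status 0 when the path exists. The sketch below wraps that call; it assumes the `hdfs` command is on the `PATH`, and both the function name `hdfs_path_exists` and the example path are hypothetical.

```python
import subprocess


def hdfs_path_exists(path, run=subprocess.run):
    """Return True if `hdfs dfs -test -e <path>` exits with status 0.

    `run` is injectable so the function can be exercised without a cluster.
    """
    result = run(["hdfs", "dfs", "-test", "-e", path])
    return result.returncode == 0


# Example usage on a machine with an HDFS client (hypothetical path):
# if not hdfs_path_exists("/data/events/2020-01-01"):
#     raise SystemExit("Input path is missing; check for typos in the path.")
```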