Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: High Disk and Memory Spill When Doing Shuffle

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptom

High disk and memory spill when doing shuffle.

Cause

Insufficient executor memory (you can monitor the spill metrics in the Spark UI).

Solution

  1. Increase executor memory.

    --executor-memory=4G
    
  2. For jobs that …

Spark Issue: Table Not Found

Symptom 1

org.apache.spark.sql.AnalysisException: Table not found

Symptom 2

java.lang.RuntimeException: Table Not Found: my_rdd

Cause 1

The table name is misspelled.

Solution 1

Correct the spelling of the table name.
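
When the list of existing tables is long, a fuzzy match can help spot the intended name. The following is only a minimal sketch using Python's standard `difflib`; the helper `suggest_table` and the table names are hypothetical, and in a live session the list of known names might instead be collected from something like `spark.catalog.listTables()`.

```python
from difflib import get_close_matches


def suggest_table(name, known_tables, cutoff=0.6):
    """Return up to three known table names that closely resemble `name`.

    The comparison ignores case; results are ordered best match first.
    """
    # Map lower-cased names back to their original spelling.
    lowered = {t.lower(): t for t in known_tables}
    matches = get_close_matches(name.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[m] for m in matches]


# Hypothetical table names, for illustration only.
known = ["customers", "orders", "order_items"]
print(suggest_table("custmers", known))
```

An empty result suggests that the problem is probably not a simple typo, so other causes are worth checking.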

Cause 2 …

Spark Issue: Too Large Table for Auto BroadcastHashJoin

Symptoms

Symptom 1

16/04/17 11:17:36 ERROR scheduler.TaskSetManager: Total size of serialized results of 126 tasks (1137.3 MB) is bigger than spark.driver.maxResultSize (1024.0 …

Spark Issue: java.io.FileNotFoundException

Symptoms

Symptom 1

15/12/10 07:44:21 ERROR shuffle.OneForOneBlockFetcher: Failed while starting block fetches

java.lang.RuntimeException: java.io.FileNotFoundException: /hadoop/1/scratch/local/usercache/dclong/appcache/application_1447357188616_340392 …

Spark Issue: Data Skew on Shuffle Phase

Symptom

    org.apache.spark.shuffle.FetchFailedException: Too large frame: 2200180718
    Caused by: java.lang.IllegalArgumentException: Too large frame: 2200289525
        at org.spark_project.guava.base.Preconditions.checkArgument(Preconditions.java:119)

Reason

There …