Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: High Disk and Memory Spill When Doing Shuffle

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptom

High disk and memory spill during a shuffle.

Cause

Insufficient executor memory. You can monitor the spill metrics (Spill (Memory) and Spill (Disk)) on the stage page of the Spark UI.

Solution

  1. Increase executor memory.

    --executor-memory=4G
    
  2. For jobs that …
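Why more executor memory reduces spill can be sketched with a back-of-the-envelope calculation. The snippet below assumes Spark's unified memory manager with on-heap memory only: the unified region is (heap - 300 MB) * spark.memory.fraction (default 0.6), execution may borrow all of it when nothing is cached, and each of the N tasks running concurrently in an executor is guaranteed 1/(2N) and capped at 1/N of the execution pool. The helper names are hypothetical, and real numbers differ somewhat (the JVM reports a usable heap slightly smaller than --executor-memory), so treat the output as a rough estimate rather than what the Spark UI will show.

```python
from typing import Tuple

# Reserved system memory in Spark's unified memory model (300 MB).
RESERVED_MB = 300


def usable_unified_mb(executor_memory_mb: int, memory_fraction: float = 0.6) -> float:
    """Approximate size in MB of the unified (execution + storage) region."""
    return max(executor_memory_mb - RESERVED_MB, 0) * memory_fraction


def per_task_execution_mb(
    executor_memory_mb: int,
    executor_cores: int,
    task_cpus: int = 1,
    memory_fraction: float = 0.6,
) -> Tuple[float, float]:
    """Return (guaranteed, maximum) execution memory in MB for one task.

    Assumes no cached data, so execution can borrow the whole unified region,
    and that every task slot in the executor is busy (N = cores // task_cpus).
    """
    n_tasks = executor_cores // task_cpus
    if n_tasks < 1:
        raise ValueError("executor_cores must be at least task_cpus")
    pool = usable_unified_mb(executor_memory_mb, memory_fraction)
    # With N active tasks, each task gets at least 1/(2N) and at most 1/N.
    return pool / (2 * n_tasks), pool / n_tasks


if __name__ == "__main__":
    for memory_mb in (4096, 8192):  # e.g. --executor-memory=4G vs 8G
        low, high = per_task_execution_mb(memory_mb, executor_cores=4)
        print(f"{memory_mb} MB heap, 4 cores: {low:.0f} to {high:.0f} MB per task")
```

Under this model, doubling the executor memory roughly doubles each task's share before it has to spill, while running more concurrent tasks in the same executor shrinks it.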

Spark Issue: Container Killed by Yarn for Exceeding Memory Limits


Symptoms

Symptom 1

    Container killed by YARN for exceeding memory limits.
    22.0 GB of 19 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled …
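The limit in this message is the size of the YARN container, which Spark requests as the executor heap plus a memory overhead for off-heap usage (JVM internals, native buffers, Python workers, and so on). Below is a minimal sketch of that arithmetic, assuming the default overhead of max(384 MB, 10% of executor memory) and ignoring other settings that some Spark versions add to the request (such as spark.executor.pyspark.memory); the function names are hypothetical. Since Spark 2.3 the setting is named spark.executor.memoryOverhead, and spark.yarn.executor.memoryOverhead is its deprecated older name.

```python
from typing import Optional

MIN_OVERHEAD_MB = 384           # floor for the default overhead
DEFAULT_OVERHEAD_FACTOR = 0.10  # default fraction of executor memory


def default_overhead_mb(executor_memory_mb: int, factor: float = DEFAULT_OVERHEAD_FACTOR) -> int:
    """Default memoryOverhead: max(384 MB, factor * executor memory), truncated to whole MB."""
    return max(MIN_OVERHEAD_MB, int(executor_memory_mb * factor))


def container_request_mb(executor_memory_mb: int, overhead_mb: Optional[int] = None) -> int:
    """Approximate YARN container size requested per executor (heap + overhead)."""
    if overhead_mb is None:
        overhead_mb = default_overhead_mb(executor_memory_mb)
    return executor_memory_mb + overhead_mb


if __name__ == "__main__":
    for heap_mb in (2048, 4096, 16384):
        print(heap_mb, "MB heap ->", container_request_mb(heap_mb), "MB container")
    # Explicitly boosting the overhead, e.g. spark.executor.memoryOverhead=2048
    print(container_request_mb(16384, overhead_mb=2048), "MB container")
```

Boosting the overhead, as the message suggests, raises the container limit without enlarging the heap, so the extra room goes to the memory the heap setting does not cover. YARN may also round the request up to a multiple of yarn.scheduler.minimum-allocation-mb.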

Hands on the Resource Module in Python

Tips and Traps

This module provides basic mechanisms for measuring and controlling system resources utilized by a process and its subprocesses. It cannot be used to check resource usage of other processes.
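A minimal sketch of the measuring side, assuming a Unix-like system (the resource module is not available on Windows). It reads the peak resident set size of the current process and of its children. Two traps show up here: ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, and RUSAGE_CHILDREN only covers children that have terminated and been waited for, since there is no way to pass an arbitrary PID. The helper names and the 50 MB child allocation are illustrative.

```python
import resource
import subprocess
import sys


def max_rss_mb(who: int) -> float:
    """Peak resident set size for `who` (RUSAGE_SELF or RUSAGE_CHILDREN), in MB."""
    rss = resource.getrusage(who).ru_maxrss
    # ru_maxrss is in bytes on macOS but in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return rss / divisor


def child_peak_rss_mb(alloc_mb: int) -> float:
    """Run a child Python process that allocates about alloc_mb MB, then
    report the peak RSS over all terminated, waited-for children."""
    code = f"data = b'x' * ({alloc_mb} * 1024 * 1024)"
    subprocess.run([sys.executable, "-c", code], check=True)  # run() waits for the child
    return max_rss_mb(resource.RUSAGE_CHILDREN)


if __name__ == "__main__":
    print(f"self peak RSS: {max_rss_mb(resource.RUSAGE_SELF):.1f} MB")
    print(f"children peak RSS: {child_peak_rss_mb(50):.1f} MB")
    # The controlling side: read the current open-file limits.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open-file limit: soft={soft}, hard={hard}")
```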

Sort top by CPU or Memory Usage


By default, the top command on Linux sorts its output by CPU usage. The table below lists options for sorting the output of the top command by different criteria …
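For scripting, the same sort can be requested non-interactively. The sketch below assumes the procps-ng top found on most Linux distributions, where -b runs in batch mode, -n 1 stops after one iteration, and -o FIELD overrides the sort field (for example %MEM or %CPU). BusyBox and macOS top use different options, so the hypothetical helper returns None when top is missing, rejects the flags, or does not finish in time.

```python
import shutil
import subprocess
from typing import Optional


def top_snapshot(sort_field: str = "%MEM", lines: int = 15) -> Optional[str]:
    """Return the first `lines` lines of one batch-mode top run sorted by `sort_field`.

    Assumes procps-ng top; returns None if top is unavailable or fails.
    """
    if shutil.which("top") is None:
        return None
    try:
        result = subprocess.run(
            ["top", "-b", "-n", "1", "-o", sort_field],
            capture_output=True,
            text=True,
            check=False,
            timeout=10,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return "\n".join(result.stdout.splitlines()[:lines])


if __name__ == "__main__":
    snapshot = top_snapshot("%MEM")
    print(snapshot if snapshot is not None else "procps-ng top not available")
```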