Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: Could Not Execute Broadcast in 300S

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptoms

Caused by: org.apache.spark.SparkException: Could not execute broadcast in 600 secs. You can increase the timeout for broadcasts via spark.sql.broadcastTimeout or disable broadcast join by setting …
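The error message names two workarounds: raise `spark.sql.broadcastTimeout` (in seconds; Spark's default is 300) or turn off automatic broadcast joins, which is done by setting `spark.sql.autoBroadcastJoinThreshold` to `-1`. The sketch below builds those settings as a plain dict. The helper name `broadcast_workaround_conf` and the 1200-second value are illustrative assumptions, not recommendations.

```python
def broadcast_workaround_conf(timeout_secs=None, disable_broadcast=False):
    """Hypothetical helper: build Spark SQL settings for the broadcast timeout.

    Returns a dict mapping config keys to string values.
    """
    conf = {}
    if timeout_secs is not None:
        if timeout_secs <= 0:
            raise ValueError("timeout_secs must be positive")
        # Seconds to wait for a broadcast to finish (Spark default: 300).
        conf["spark.sql.broadcastTimeout"] = str(int(timeout_secs))
    if disable_broadcast:
        # -1 disables automatic, size-based broadcast joins.
        conf["spark.sql.autoBroadcastJoinThreshold"] = "-1"
    return conf


# Example: allow 20 minutes per broadcast instead of disabling broadcast joins.
settings = broadcast_workaround_conf(timeout_secs=1200)
print(settings)  # {'spark.sql.broadcastTimeout': '1200'}
```

You can apply each pair with `spark.conf.set(key, value)` on an existing `SparkSession`, or with `SparkSession.builder.config(key, value)` before the session is created.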

Spark Issue: Pure Python Code Errors

This post collects some typical pure Python errors in PySpark applications.

Symptom 1

object has no attribute

Solution 1

Fix the attribute name.
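Here is a small illustration of how this error arises in plain Python. The same error shows up in PySpark driver code or inside UDFs. The cause is a misspelled method name on a built-in type. The function names `has_prefix_buggy` and `has_prefix_fixed` are hypothetical.

```python
def has_prefix_buggy(s):
    # Typo: str has no method named "startwith".
    return s.startwith("spark")


def has_prefix_fixed(s):
    # Fixed: the correct method name is "startswith".
    return s.startswith("spark")


error_message = None
try:
    has_prefix_buggy("spark-sql")
except AttributeError as e:
    # Capture and show the AttributeError message.
    error_message = str(e)
    print(error_message)

print(has_prefix_fixed("spark-sql"))  # True
```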

Symptom 2

No such file or directory

Solution …

Spark SQL

Spark SQL Guide

  1. Since a Spark DataFrame is immutable, you cannot update or delete records from a physical table (e.g., a Hive table) directly using the Spark DataFrame/SQL API. However …

Spark Issue: TypeError WithReplacement

Symptoms

TypeError: withReplacement (optional), fraction (required) and seed (optional) should be a bool, float and number; however, got [].

Causes

An integer number (e.g., 1) is passed to the fraction parameter …
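PySpark's `DataFrame.sample` checks that `fraction` is a Python `float`. A literal such as `1` is rejected, while `1.0` is accepted. The sketch below shows a hypothetical guard, `to_sample_fraction`, that coerces and validates the value before it reaches `sample`. The upper bound of 1 applies only when sampling without replacement.

```python
def to_sample_fraction(value, with_replacement=False):
    """Hypothetical helper: coerce a numeric fraction to float for DataFrame.sample."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"fraction must be a number, got {type(value).__name__}")
    fraction = float(value)
    if fraction < 0:
        raise ValueError("fraction must be non-negative")
    if not with_replacement and fraction > 1:
        raise ValueError("fraction must be at most 1 when sampling without replacement")
    return fraction


print(to_sample_fraction(1))     # 1.0
print(to_sample_fraction(0.25))  # 0.25
```

With a DataFrame `df`, you would call `df.sample(fraction=to_sample_fraction(1), seed=42)`, or simply write the literal as `1.0`.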

Process Big Data Using Spark

General Tips

  1. Please refer to Spark SQL for tips specific to Spark SQL.

  2. It is almost always a good idea to filter out null values in the join columns before joining …
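Under SQL semantics, a null key never equals another key, not even another null. Rows with a null join key therefore cannot find a match in an equi-join. The plain-Python sketch below models this with a hash join on lists of dicts that drops null-keyed rows before building the lookup table. The function `inner_join` and the sample rows are hypothetical. In PySpark, the filtering step corresponds to `df.filter(df["key"].isNotNull())`.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Hypothetical inner hash join on lists of dicts, skipping null (None) keys."""
    # Drop rows whose join key is null up front: they can never match.
    left = [row for row in left if row.get(key) is not None]
    right = [row for row in right if row.get(key) is not None]

    # Build a lookup table from the right side.
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)

    # Probe with the left side and merge each matching pair of rows.
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


users = [{"id": 1, "name": "a"}, {"id": None, "name": "b"}]
orders = [{"id": 1, "amount": 10}, {"id": None, "amount": 99}]
print(inner_join(users, orders, "id"))  # [{'id': 1, 'name': 'a', 'amount': 10}]
```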

Spark Issue: InvalidResourceRequestException

Symptoms

Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=16 …
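YARN rejects a container request that asks for more virtual cores than its per-container maximum, which is set by the YARN property `yarn.scheduler.maximum-allocation-vcores`. In a Spark application, the request typically comes from `spark.executor.cores` or `spark.driver.cores`. The sketch below shows a hypothetical helper, `cap_cores`, that limits the requested cores before you submit the job. The limit of 8 is an assumed example, so look up your cluster's actual value.

```python
def cap_cores(requested, max_vcores):
    """Hypothetical helper: limit a core request to the YARN per-container maximum."""
    if requested < 1:
        raise ValueError("at least one core must be requested")
    return min(requested, max_vcores)


# Assumed cluster limit; check yarn.scheduler.maximum-allocation-vcores.
MAX_VCORES = 8
executor_cores = cap_cores(16, MAX_VCORES)
print({"spark.executor.cores": str(executor_cores)})  # {'spark.executor.cores': '8'}
```

You can then pass the capped value to `spark-submit --executor-cores`.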