Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: Could Not Execute Broadcast in 300S

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptoms

Caused by: org.apache.spark.SparkException: Could not execute broadcast in 600 secs. You can increase the timeout for broadcasts via spark.sql.broadcastTimeout or disable broadcast join by setting …
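As the message suggests, one mitigation is to raise the broadcast timeout, or to disable automatic broadcast joins entirely. A minimal sketch of the relevant settings (the values are illustrative) in `spark-defaults.conf` or via `--conf`:

```
spark.sql.broadcastTimeout              1200
spark.sql.autoBroadcastJoinThreshold    -1
```

Setting `spark.sql.autoBroadcastJoinThreshold` to `-1` disables automatic broadcast joins, at the cost of falling back to shuffle-based joins.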

Partition and Bucketing in Spark

Tips and Traps

  1. Bucketed columns are only supported in Hive tables at this time.

  2. A Hive table can have both partition and bucket columns.

  3. Suppose t1 and t2 are two bucketed tables with b1 and b2 buckets respectively. For bucket join optimization to kick in when joining them:

     - The 2 tables must be bucketed on the same keys/columns.
     - The join must be on the bucket keys/columns.
     - `b1` is a multiple of `b2` or `b2` is a multiple of `b1`.
    
    
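As a concrete illustration of points 1 and 2, here is a hypothetical Spark SQL DDL (table and column names are made up) for a Hive table that has both a partition column (`ds`) and a bucket column (`user_id`):

```sql
-- Hypothetical example: partitioned by ds, bucketed by user_id.
CREATE TABLE sales (
    user_id BIGINT,
    amount  DOUBLE,
    ds      STRING
) USING PARQUET
PARTITIONED BY (ds)
CLUSTERED BY (user_id) SORTED BY (user_id) INTO 32 BUCKETS;
```

Joining two such tables on `user_id`, with bucket counts of, say, 32 and 64, would satisfy the multiple-of condition above.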

Spark Issue: Pure Python Code Errors


This post collects some typical pure Python errors in PySpark applications.

Symptom 1

object has no attribute

Solution 1

Fix the attribute name.
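A minimal, hypothetical reproduction in plain Python (the class and attribute names are made up): misspelling an attribute raises `AttributeError: ... object has no attribute ...`, and the fix is simply to correct the name.

```python
class Record:
    """Hypothetical stand-in for an object used in a PySpark job."""

    def __init__(self, name):
        self.name = name


r = Record("spark")

# Misspelled attribute -> AttributeError:
# 'Record' object has no attribute 'nmae'
try:
    r.nmae
except AttributeError as error:
    print(error)

# Fix: use the correct attribute name.
print(r.name)
```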

Symptom 2

No such file or directory

Solution …

Spark SQL


Spark SQL Guide

  1. Since a Spark DataFrame is immutable, you cannot update or delete records from a physical table (e.g., a Hive table) directly using the Spark DataFrame/SQL API. However …
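Since rows cannot be deleted in place, a common workaround (the table and column names below are hypothetical) is to rewrite the surviving rows into a new table:

```sql
-- Keep only the rows that should survive the "delete".
CREATE TABLE db.users_cleaned AS
SELECT *
FROM db.users
WHERE status <> 'inactive';
```

The cleaned table can then be swapped in for the original once it has been verified.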

Spark Issue: TypeError WithReplacement


Symptoms

TypeError: withReplacement (optional), fraction (required) and seed (optional) should be a bool, float and number; however, got [].

Causes

An integer number (e.g., 1) is passed to the fraction parameter …
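The cause can be illustrated in plain Python. The following is a simplified, hypothetical reconstruction of the argument validation, not PySpark's actual code: passing the integer `1` instead of the float `1.0` fails an `isinstance`-style check.

```python
def check_fraction(fraction):
    """Simplified, hypothetical version of the sample() argument check."""
    if not isinstance(fraction, float):
        raise TypeError(
            "fraction (required) should be a float; got %r" % type(fraction)
        )
    return fraction


check_fraction(0.5)   # OK: a float fraction
# check_fraction(1)   # would raise TypeError; pass 1.0 instead
```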