Ben Chuanlong Du's Blog

It is never too late to learn.

Spark Issue: SIGBUS

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptoms

CalledProcessError: Command './pine' died with <Signals.SIGBUS: 7>.

Possible Causes

SIGBUS (bus error) is a signal raised when a process tries to access memory that has not been physically mapped. There are several …

Ubuntu Crashes When Opening Settings

I encountered an issue with Ubuntu 20.04 on my machine. Ubuntu crashes when I try to open settings or video players. After searching online, it seems that the problem happens …

Spark Issue: Could Not Execute Broadcast in 300S

Symptoms

Caused by: org.apache.spark.SparkException: Could not execute broadcast in 600 secs. You can increase the timeout for broadcasts via spark.sql.broadcastTimeout or disable broadcast join by setting …

How Much to Push for Functional Programming and Immutability

Most new programming languages (such as Rust, Go, Kotlin, Scala, etc.) support a functional programming style and clearly distinguish between mutable and immutable variables. So, is functional programming superior to imperative …

Partition and Bucketing in Spark

Tips and Traps

  1. Bucketed columns are only supported in Hive tables at this time.

  2. A Hive table can have both partition and bucket columns.

  3. Suppose `t1` and `t2` are two bucketed tables with `b1` and `b2` buckets, respectively. For bucket optimization to kick in when joining them:

     - The 2 tables must be bucketed on the same keys/columns.
     - The join must be on the bucket keys/columns.
     - `b1` is a multiple of `b2` or `b2` is a multiple of `b1`.
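
The three conditions in tip 3 can be written down as a small check. This is only a sketch in plain Python, not Spark's actual planner logic: `BucketSpec` and `bucket_join_applies` are hypothetical names introduced for illustration, and the sketch assumes that "joining on the bucket keys" means the join keys are exactly the bucket columns.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class BucketSpec:
    """Hypothetical description of how a table is bucketed."""
    columns: Tuple[str, ...]  # bucket keys/columns
    num_buckets: int          # number of buckets (assumed positive)


def bucket_join_applies(t1: BucketSpec, t2: BucketSpec, join_keys: Iterable[str]) -> bool:
    """Return True if the three conditions listed above all hold."""
    if t1.num_buckets <= 0 or t2.num_buckets <= 0:
        return False
    # Condition 1: both tables are bucketed on the same keys/columns.
    same_bucket_keys = t1.columns == t2.columns
    # Condition 2: the join is on the bucket keys/columns
    # (assumption: the join keys match the bucket columns exactly).
    joins_on_bucket_keys = set(join_keys) == set(t1.columns)
    # Condition 3: b1 is a multiple of b2, or b2 is a multiple of b1.
    counts_compatible = (
        t1.num_buckets % t2.num_buckets == 0
        or t2.num_buckets % t1.num_buckets == 0
    )
    return same_bucket_keys and joins_on_bucket_keys and counts_compatible
```

For example, two tables bucketed on `user_id` into 8 and 4 buckets pass the check when joined on `user_id`, while 8 and 3 buckets do not, because neither count is a multiple of the other.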