Partition and Bucketing in Spark

Sep 26, 2020

Tips and Traps

Bucketed columns are currently supported only in Hive tables.
A Hive table can have both partition and bucket columns.

Suppose `t1` and `t2` are two bucketed tables with `b1` and `b2` buckets, respectively. For the bucket optimization (joining without a shuffle) to kick in when joining them:

 - The 2 tables must be bucketed on the same keys/columns.
 - The join must be on the bucket keys/columns.
 - `b1` is a multiple of `b2` or `b2` is a multiple of `b1`.
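
The three conditions above can be written down as a small check. This is only a sketch in plain Python, not part of Spark: the function name `bucket_join_applicable` and its parameters are hypothetical, and it assumes the bucket columns have the same names in both tables and that their order matters.

```python
from typing import Sequence


def bucket_join_applicable(
    t1_bucket_cols: Sequence[str],
    b1: int,
    t2_bucket_cols: Sequence[str],
    b2: int,
    join_cols: Sequence[str],
) -> bool:
    """Hypothetical helper: check the three conditions listed above."""
    if b1 <= 0 or b2 <= 0:
        raise ValueError("bucket counts must be positive")
    # 1. Both tables are bucketed on the same keys/columns
    #    (assumption: same names, same order).
    same_keys = list(t1_bucket_cols) == list(t2_bucket_cols)
    # 2. The join is on exactly the bucket keys/columns.
    joins_on_keys = set(join_cols) == set(t1_bucket_cols)
    # 3. One bucket count is a multiple of the other.
    compatible_counts = b1 % b2 == 0 or b2 % b1 == 0
    return same_keys and joins_on_keys and compatible_counts
```

For example, two tables bucketed on `id` with 8 and 4 buckets pass the check when joined on `id`, while 8 and 3 buckets do not, because neither count is a multiple of the other.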

Ben Chuanlong Du's Blog
