Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Symptoms
Caused by: org.apache.spark.SparkException: Could not execute broadcast in 600 secs. You can increase the timeout for broadcasts via spark.sql.broadcastTimeout or disable broadcast join by setting spark.sql.autoBroadcastJoinThreshold to -1
Possible Causes
- The broadcast was too large or the network was slow, which caused the broadcast to time out. Note that this is usually a superficial cause rather than the deep root cause.
- DataFrame caching failed due to failures of Spark nodes. If so, you will see other error messages such as "block rdd_123_456 could not be removed as it was not found on disk or in memory".
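A minimal sketch, assuming the driver and executor logs are available as plain text lines, of how one might check which of the two causes above is in play: it looks for the broadcast timeout message and for the missing cached block message quoted above. The helper name classify_log and the sample lines are hypothetical, and since Spark's exact log wording and capitalization may vary between versions, the patterns are matched case-insensitively.

```python
import re
from typing import Iterable

# Patterns built from the messages quoted in this note (assumption: the wording
# matches your Spark version; matching is case-insensitive to be tolerant).
BROADCAST_TIMEOUT = re.compile(r"Could not execute broadcast in (\d+) secs", re.IGNORECASE)
MISSING_BLOCK = re.compile(
    r"block (rdd_\d+_\d+) could not be removed as it was not found on disk or in memory",
    re.IGNORECASE,
)


def classify_log(lines: Iterable[str]) -> dict:
    """Collect evidence for the two possible causes from log lines (hypothetical helper)."""
    timeouts = []
    missing_blocks = []
    for line in lines:
        m = BROADCAST_TIMEOUT.search(line)
        if m:
            timeouts.append(int(m.group(1)))
        m = MISSING_BLOCK.search(line)
        if m:
            missing_blocks.append(m.group(1))
    return {
        "broadcast_timeout_secs": timeouts,
        "missing_cached_blocks": missing_blocks,
        # Missing cached blocks hint at failed nodes rather than a slow or large broadcast.
        "likely_node_failure": bool(missing_blocks),
    }


# Illustrative sample lines, not real log output.
sample = [
    "block rdd_123_456 could not be removed as it was not found on disk or in memory",
    "Caused by: org.apache.spark.SparkException: Could not execute broadcast in 600 secs.",
]
print(classify_log(sample))
```

If likely_node_failure is true, the broadcast timeout is probably a symptom of lost cached data rather than the problem itself.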
Possible Solutions
- Increase the timeout for broadcasts beyond the 600 seconds reported in the error (e.g., --conf spark.sql.broadcastTimeout=1200s) or disable broadcast joins entirely (e.g., --conf spark.sql.autoBroadcastJoinThreshold=-1). Since a broadcast timeout is usually a superficial cause, this is unlikely to fix the underlying issue in the Spark application. However, it at least helps eliminate one possible cause.
- Refer to the note on "block rdd_123_456 could not be removed as it was not found on disk or in memory" for possible solutions.
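The two --conf workarounds can be sketched as a small helper that assembles a spark-submit command line. The helper name build_spark_submit_cmd, the application path my_app.py and the 1200-second value are placeholders chosen for illustration, not recommendations. The one detail worth noting is ordering: spark-submit options must come before the application file, because anything after it is passed to the application as its own arguments.

```python
import shlex
from typing import List, Optional


def build_spark_submit_cmd(
    app: str,
    broadcast_timeout_secs: Optional[int] = None,
    disable_broadcast_join: bool = False,
) -> List[str]:
    """Build a spark-submit argument list with the broadcast workarounds (hypothetical helper)."""
    cmd = ["spark-submit"]
    if broadcast_timeout_secs is not None:
        # Raise the broadcast timeout; the value is given as a time string such as "1200s".
        cmd += ["--conf", f"spark.sql.broadcastTimeout={broadcast_timeout_secs}s"]
    if disable_broadcast_join:
        # A threshold of -1 disables automatic broadcast joins.
        cmd += ["--conf", "spark.sql.autoBroadcastJoinThreshold=-1"]
    # Options must precede the application file; later arguments go to the app itself.
    cmd.append(app)
    return cmd


cmd = build_spark_submit_cmd("my_app.py", broadcast_timeout_secs=1200)
print(shlex.join(cmd))
# spark-submit --conf spark.sql.broadcastTimeout=1200s my_app.py
```

Both settings are Spark SQL options, so in an already running session they can also be changed with spark.conf.set(...) instead of at submit time.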