Ben Chuanlong Du's Blog

It is never too late to learn.

Configure Log4J for Spark

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Show Error Messages Only

When you run Spark or PySpark in a Jupyter/Lab notebook, it is recommended that you show ERROR messages only. Otherwise, there might be too much logging information polluting your notebook. You can set the log level of Spark to ERROR using the following line of code.

Process Big Data Using Spark

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

General Tips

  1. Please refer to Spark SQL for tips specific to Spark SQL.

  2. It is almost always a good idea to filter out null value in the joinining columns before joining …

Spark Issue: InvalidResourceRequestException

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptoms

Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=16 …

Spark Issue: IllegalArgumentException: System Memory Must Be At Least

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Symptom

Exception in thread "main" java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration …