Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Show Error Messages Only
When you run Spark or PySpark in a Jupyter/JupyterLab notebook, it is recommended that you show ERROR messages only. Otherwise, the logging output can easily flood your notebook. You can set the log level of Spark to ERROR with a single call, e.g., sparkContext.setLogLevel("ERROR") on an existing Spark session.
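A minimal sketch of setting the log level from a notebook follows. The helper name show_errors_only is made up for illustration, and the commented usage assumes that pyspark is installed and that a session named spark is created in the notebook.

```python
def show_errors_only(spark):
    """Set the log level of an existing SparkSession to ERROR.

    `spark` is assumed to be a pyspark.sql.SparkSession (or any object
    exposing `sparkContext.setLogLevel`).
    """
    spark.sparkContext.setLogLevel("ERROR")


# Typical notebook usage (assumes pyspark is installed):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("notebook").getOrCreate()
# show_errors_only(spark)
```

Note that this only affects messages logged after the call; messages printed while the session starts up are not suppressed by it.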
Process Big Data Using Spark
General Tips
- Please refer to Spark SQL for tips specific to Spark SQL.
- It is almost always a good idea to filter out null values in the joining columns before joining …
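The null-filtering tip can be sketched as below. The function name join_on_non_null_keys and its parameters are hypothetical; left and right are assumed to be PySpark DataFrames, and keys a list of column names present in both.

```python
def join_on_non_null_keys(left, right, keys, how="inner"):
    """Drop rows with null join keys on both sides, then join.

    `left` and `right` are assumed to be pyspark.sql.DataFrame objects,
    and `keys` a list of column names that exist in both of them.
    """
    # DataFrame.dropna(subset=...) removes rows with a null in any listed column.
    left_clean = left.dropna(subset=keys)
    right_clean = right.dropna(subset=keys)
    return left_clean.join(right_clean, on=keys, how=how)


# Typical usage (assumes two DataFrames sharing an "id" column):
# joined = join_on_non_null_keys(orders, customers, ["id"])
```

For an inner join this does not change the result, since null keys never match under the default equality semantics; it only reduces the data that gets shuffled. For outer joins it does change the result, as rows with null keys are dropped, so use it there only if that is what you want.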
Spark Issue: InvalidResourceRequestException
Symptoms
Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=16 …
Spark Configuration
A Comprehensive List of Common Issues in Spark Applications
List of Common Issues
Please refer to http://www.legendu.net/misc/tag/spark-issue.html for a comprehensive list of Spark issues together with their possible causes and solutions.
Debugging Tips
Spark/Hadoop …
Spark Issue: IllegalArgumentException: System Memory Must Be At Least
Symptom
Exception in thread "main" java.lang.IllegalArgumentException: System memory 466092032 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration …
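The numbers in this message can be reproduced with a little arithmetic. As far as I know, Spark's unified memory manager reserves 300 MiB of system memory and requires the heap to be at least 1.5 times that; the sketch below rests on that assumption, and the names RESERVED_BYTES, MIN_SYSTEM_MEMORY, and shortfall_bytes are made up for illustration.

```python
import math

# Assumption: Spark reserves 300 MiB and requires 1.5x of it as a minimum heap.
RESERVED_BYTES = 300 * 1024 * 1024
MIN_SYSTEM_MEMORY = math.ceil(RESERVED_BYTES * 1.5)


def shortfall_bytes(system_memory: int) -> int:
    """Return how many bytes the heap falls short of the minimum (0 if enough)."""
    return max(0, MIN_SYSTEM_MEMORY - system_memory)


print(MIN_SYSTEM_MEMORY)           # 471859200, the threshold in the message
print(shortfall_bytes(466092032))  # 5767168 bytes, i.e., 5.5 MiB short
```

In other words, the driver in this example has about 444.5 MiB of usable heap, while 450 MiB is required, so raising the driver memory a bit (as the message suggests) should get past the check.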