Ben Chuanlong Du's Blog

It is never too late to learn.

Configure Log4J for Spark

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Show Error Messages Only

When you run Spark or PySpark in a Jupyter/JupyterLab notebook, it is recommended that you show ERROR messages only. Otherwise, verbose logging output can clutter your notebook. You can set the log level of Spark to ERROR with a single line of code.
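A minimal sketch of that call, assuming `spark` is an existing `SparkSession`. The helper name `show_errors_only` and the app name `quiet-notebook` are made up for illustration; the only Spark API used is `SparkContext.setLogLevel`.

```python
def show_errors_only(spark):
    """Set the log level of a running Spark application to ERROR.

    `spark` is assumed to be a pyspark.sql.SparkSession.
    The helper name is hypothetical, not part of Spark.
    """
    # The key line: only ERROR (and more severe) messages are shown afterwards.
    spark.sparkContext.setLogLevel("ERROR")
    return spark


def main():
    # Requires pyspark to be installed; the import is kept local for that reason.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("quiet-notebook").getOrCreate()
    show_errors_only(spark)
```

In a notebook, the single line `spark.sparkContext.setLogLevel("ERROR")` right after the session is created should be enough.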

Process Big Data Using Spark

General Tips

  1. Please refer to Spark SQL for tips specific to Spark SQL.

  2. It is almost always a good idea to filter out null values in the joining columns before joining …
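The null-filtering tip can be sketched as follows, assuming `left` and `right` are PySpark DataFrames and `keys` is a list of column names present in both. The helper name `join_on_non_null_keys` is hypothetical; `DataFrame.dropna` and `DataFrame.join` are the only Spark calls used.

```python
def join_on_non_null_keys(left, right, keys, how="inner"):
    """Join two DataFrames after dropping rows with nulls in the join keys.

    `left` and `right` are assumed to be pyspark.sql.DataFrame objects and
    `keys` a list of column names that exist in both of them.
    """
    # Drop every row that has a null in any of the join columns.
    left_clean = left.dropna(subset=keys)
    right_clean = right.dropna(subset=keys)
    # Join on the shared key columns.
    return left_clean.join(right_clean, on=keys, how=how)
```

For an inner equi-join this should not change the result, since null keys never match each other. For outer joins it does change the result (rows with null keys are dropped instead of kept), so use it there only if that is what you want.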

Docker Images for Remote Desktop

Tips and Traps

  1. x11docker runs GUI applications and desktops in docker and podman containers.

  2. NoMachine is recommended for remote desktop access.

  3. If VNC is used for accessing a remote desktop environment in a …

Spark Issue: InvalidResourceRequestException

Symptoms

Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=16 …
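The message says the application requested 16 virtual cores, which is more than the cluster allows per container. A rough sketch of one way to avoid that, assuming the request comes from `spark.executor.cores` and that you know the cluster limit (on YARN this is governed by `yarn.scheduler.maximum-allocation-vcores`). The helper name and the limit of 8 used in the usage note are made up for illustration.

```python
def executor_core_args(requested_cores, max_vcores):
    """Build spark-submit arguments with executor cores capped at a limit.

    `max_vcores` is the per-container limit of the cluster. It is an input
    here because the real value depends on the cluster configuration.
    """
    if requested_cores < 1 or max_vcores < 1:
        raise ValueError("core counts must be positive")
    # Never ask for more virtual cores than the cluster permits.
    cores = min(requested_cores, max_vcores)
    return ["--conf", f"spark.executor.cores={cores}"]
```

For example, `executor_core_args(16, 8)` yields arguments that request 8 cores per executor instead of 16.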