A Comprehensive List of Common Issues in Spark Applications
Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
List of Common Issues
Please refer to http://www.legendu.net/misc/tag/spark-issue.html for a comprehensive list of Spark issues with their possible causes and solutions.
Debugging Tips
Spark/Hadoop …
Rust and Spark
The simplest way to use Rust with Spark is to leverage pandas_udf in PySpark. Inside the pandas UDF, you can call subprocess.run to run any shell command (for example, a compiled Rust binary) and capture its output.
from pathlib …
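The author's own snippet is cut off above, so the following is only a minimal sketch of the approach described, not a reconstruction of it. It assumes Spark 3.x with pandas and PyArrow installed on the executors. The names `run_binary`, `make_udf`, and `DEMO_CMD` are hypothetical, and the Python interpreter stands in for a compiled Rust binary that reads one value on stdin and writes one result to stdout.

```python
import subprocess
import sys
from typing import Iterable, List

# Hypothetical stand-in for a compiled Rust binary: reads stdin, prints it upper-cased.
DEMO_CMD = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]


def run_binary(values: Iterable[str], cmd: List[str]) -> List[str]:
    """Run `cmd` once per value, passing the value on stdin, and collect stdout."""
    outputs = []
    for value in values:
        proc = subprocess.run(
            cmd,
            input=str(value),
            capture_output=True,  # capture stdout and stderr instead of inheriting them
            text=True,            # exchange str rather than bytes
            check=True,           # raise CalledProcessError on a non-zero exit code
        )
        outputs.append(proc.stdout.strip())
    return outputs


def make_udf(cmd: List[str]):
    """Wrap run_binary in a pandas UDF.

    pandas and pyspark are imported here so that run_binary can be tried
    without a Spark installation.
    """
    import pandas as pd
    from pyspark.sql.functions import pandas_udf

    @pandas_udf("string")
    def call_binary(col: pd.Series) -> pd.Series:
        # Each call receives one batch of rows as a pandas Series.
        return pd.Series(run_binary(col, cmd))

    return call_binary
```

With a DataFrame `df` that has a string column `value`, something like `df.withColumn("out", make_udf(DEMO_CMD)(df["value"]))` would apply the command to every row. Starting one process per row is slow; if the binary can read many lines at once, feeding it a whole batch per call is likely a better design. The binary also has to be present on every executor node.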
Yarn for Spark
- List all Spark applications.
yarn application -list
- Show the status of a Spark application.
yarn application -status application_1459542433815_0002
- View the logs of a Spark application.
yarn logs -applicationId application_1459542433815_0002
- Kill a Spark application …
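When these commands need to be scripted, a thin wrapper can build and run them. The sketch below is an illustration only: `yarn_command` and `run_yarn` are hypothetical helper names, and it covers just the three commands shown in full above, assuming the `yarn` CLI is on the PATH.

```python
import shutil
import subprocess
from typing import List


def yarn_command(action: str, app_id: str = "") -> List[str]:
    """Build the argument list for one of the YARN commands listed above."""
    if action == "list":
        return ["yarn", "application", "-list"]
    if action in ("status", "logs") and not app_id:
        raise ValueError(f"action {action!r} needs an application ID")
    if action == "status":
        return ["yarn", "application", "-status", app_id]
    if action == "logs":
        return ["yarn", "logs", "-applicationId", app_id]
    raise ValueError(f"unsupported action: {action!r}")


def run_yarn(action: str, app_id: str = "") -> str:
    """Run the command and return its standard output."""
    if shutil.which("yarn") is None:
        raise RuntimeError("the yarn CLI was not found on PATH")
    proc = subprocess.run(
        yarn_command(action, app_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout
```

For example, `run_yarn("logs", "application_1459542433815_0002")` would return the aggregated logs as one string, provided log aggregation is enabled on the cluster.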
Spark Issue Libc Not Found
Symptom
/lib64/libc.so.6: version `GLIBC_2.18' not found (required by ...)
Cause
The version of GLIBC required by the binary executable is not available on the Spark nodes.
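To confirm this cause, the version named in the error message can be compared with the C library that a node actually provides. The following is a rough sketch using only the Python standard library. The function names are hypothetical, and `platform.libc_ver()` reports the libc that the running Python interpreter is linked against, which is assumed here to be the system one.

```python
import platform
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse_required_glibc(message: str) -> Optional[Version]:
    """Extract the version from text such as "version `GLIBC_2.18' not found"."""
    match = re.search(r"GLIBC_(\d+(?:\.\d+)+)", message)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def local_glibc() -> Optional[Version]:
    """Return the glibc version on this machine, or None if it is not glibc."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return tuple(int(part) for part in version.split("."))


def node_satisfies(message: str) -> Optional[bool]:
    """True if the local glibc is at least the required one, None if unknown."""
    required = parse_required_glibc(message)
    available = local_glibc()
    if required is None or available is None:
        return None
    return available >= required
```

Running `local_glibc()` inside a Spark task on each executor, rather than on the driver, shows which nodes carry the older library.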
Solution
Recompile your …
Data Quality
- Upper and lower bound tests, interquartile range (IQR) checks, and standard deviation checks
- Aggregate-level checks (after manipulating data, there should still be the ability to explain how the data …
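The bound and IQR checks from the first item can be sketched with the standard library alone. This is an illustrative example, not the author's method: the function names are hypothetical, and the multiplier 1.5 is the common Tukey convention rather than anything stated in the notes.

```python
import statistics
from typing import List, Sequence


def out_of_bounds(values: Sequence[float], lower: float, upper: float) -> List[float]:
    """Return the values that fall outside the closed interval [lower, upper]."""
    return [v for v in values if v < lower or v > upper]


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    return out_of_bounds(values, q1 - k * iqr, q3 + k * iqr)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(out_of_bounds(data, 0, 10))  # values outside the fixed range 0..10
print(iqr_outliers(data))          # values flagged by the IQR rule
```

On a Spark DataFrame the same idea would typically use approximate quantiles computed on the cluster instead of collecting the column to the driver.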