A Comprehensive List of Common Issues in Spark Applications
Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
List of Common Issues
Please refer to http://www.legendu.net/misc/tag/spark-issue.html for a comprehensive list of Spark issues with their (possible) causes and solutions.
Debugging Tips
Spark/Hadoop …
Rust and Spark
The simplest way to use Rust code from a Spark application is probably to leverage pandas_udf in PySpark.
In the pandas UDF, you can call subprocess.run to run any shell command (for example, a compiled Rust binary) and capture its output.
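As a minimal sketch of this idea (not the author's original code): the helper below runs an external command on one value with subprocess.run and returns its captured output, and make_udf wraps it in a pandas UDF. COMMAND, run_tool and make_udf are hypothetical names introduced for illustration, and a small Python one-liner stands in for a compiled Rust binary.

```python
import subprocess
import sys

# Hypothetical stand-in for a compiled Rust binary: a Python one-liner that
# upper-cases its first argument. Replace with the path to your own binary.
COMMAND = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]


def run_tool(value: str) -> str:
    """Run the external command on one value and return its trimmed stdout."""
    proc = subprocess.run(
        COMMAND + [value],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return proc.stdout.strip()


def make_udf():
    """Wrap run_tool in a pandas UDF (pandas and PySpark are imported lazily)."""
    import pandas as pd
    from pyspark.sql.functions import pandas_udf

    @pandas_udf("string")
    def run_tool_udf(values: pd.Series) -> pd.Series:
        # One subprocess per row; simple, but may be slow on large columns.
        return values.map(run_tool)

    return run_tool_udf
```

Assuming a DataFrame df with a string column named text (both hypothetical), the UDF could be applied as df.withColumn("out", make_udf()("text")). The binary must be available on every executor node for this to work.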
from pathlib …
Yarn for Spark
- List all Spark applications.
  yarn application -list
- Show the status of a Spark application.
  yarn application -status application_1459542433815_0002
- View the logs of a Spark application.
  yarn logs -applicationId application_1459542433815_0002
- Kill a Spark application …
Spark Issue Libc Not Found
Symptom
/lib64/libc.so.6: version `GLIBC_2.18' not found (required by ...)
Cause
The version of GLIBC required by the binary executable is not available on the Spark nodes.
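To confirm this cause, one option is to compare the GLIBC version a node reports against the version named in the error message. The sketch below uses platform.libc_ver from the Python standard library; parse_version, glibc_satisfies and local_glibc_version are hypothetical helper names, and this is only one possible way to check.

```python
import platform


def parse_version(text: str) -> tuple:
    """Turn a version string such as '2.18' into a comparable tuple (2, 18)."""
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(required: str, found: str) -> bool:
    """Return True if the found GLIBC version is at least the required one."""
    return parse_version(found) >= parse_version(required)


def local_glibc_version() -> str:
    """Return the GLIBC version of the current machine, or '' if not glibc."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" else ""
```

Running local_glibc_version inside a Spark task (rather than on the driver) would report the versions on the executor nodes, which are the ones that matter here.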
Solution
Recompile your …
Data Quality
- Upper and lower bound tests, interquartile range (IQR) checks, and standard deviation checks
- Aggregate level checks (after manipulating data, there should still be the ability to explain how the data …
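The bound and IQR checks in the first item above can be sketched as follows. This is an illustrative example only: iqr_outliers is a hypothetical helper, the multiplier k = 1.5 is the conventional default rather than something the notes specify, and quartiles are computed with statistics.quantiles using its default method.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values falling outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]
```

For example, iqr_outliers([10, 12, 11, 13, 12, 11, 100]) flags only 100, since the quartiles of that sample are 11 and 13 and the resulting bounds are 8 and 16.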