Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Improve the Performance of Spark
Plan Your Work
- Having a clear idea of what you want to do is very important, especially when you are working on an exploratory project. It often saves you time to …
Tips on Python Build Standalone
The GitHub repository python-portable has some example scripts for bundling standalone Python environments. It also releases standalone Python environments regularly.
Tips on Using env_python.tar.gz
This section is specifically on …
Spark Issue: IllegalArgumentException: Wrong FS
Symptoms
java.lang.IllegalArgumentException: Wrong FS: hdfs://..., expected: viewfs://...
Possible Causes
The Spark cluster has migrated to Router-based Federation (RBF) namenodes,
and viewfs:// (instead of hdfs://) is required to access HDFS …
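If the cause is a leftover hdfs:// prefix in job paths, one way to handle it is to rewrite the scheme before passing the path to Spark. The following is only a minimal sketch: the helper name `to_viewfs`, the authority `cluster-name`, and the sample path are hypothetical placeholders, and the authority your cluster expects may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def to_viewfs(path: str, viewfs_authority: str) -> str:
    """Rewrite an hdfs:// URI to viewfs://, leaving all other paths unchanged.

    viewfs_authority is a hypothetical placeholder for whatever name
    your cluster expects after viewfs://.
    """
    parts = urlsplit(path)
    if parts.scheme != "hdfs":
        # Already viewfs://, a local path, or some other scheme: do nothing.
        return path
    # Keep the path, query, and fragment; swap only the scheme and authority.
    return urlunsplit(
        ("viewfs", viewfs_authority, parts.path, parts.query, parts.fragment)
    )


# Hypothetical sample path, for illustration only.
print(to_viewfs("hdfs://old-namenode:8020/user/alice/data.parquet", "cluster-name"))
```

The rewritten path can then be used wherever the job previously passed the hdfs:// path.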
Spark Issue: ViewFs: Cannot Initialize: Empty Mount Table in Config
Symptoms
java.io.IOException: ViewFs: Cannot initialize: Empty Mount table in config for viewfs://cluster-name-ns02/
Possible Causes
As the error message says,
viewfs://cluster-name-ns02 is not configured.
- It is possible that …
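A quick way to check whether a mount table is configured is to look for Hadoop properties under the `fs.viewfs.mounttable.<name>.` prefix, which is where ViewFs reads its links from. The sketch below works on a plain dictionary of configuration values; the helper name `mount_table_links`, the sample keys, and the sample target are hypothetical, and how you collect the dictionary (for example from core-site.xml) is left open.

```python
def mount_table_links(conf: dict, mount_table: str) -> dict:
    """Return the entries of conf that belong to the given ViewFs mount table.

    An empty result corresponds to the "Empty Mount table in config" case.
    """
    prefix = f"fs.viewfs.mounttable.{mount_table}."
    return {key: value for key, value in conf.items() if key.startswith(prefix)}


# Hypothetical configuration, for illustration only.
sample_conf = {
    "fs.defaultFS": "viewfs://cluster-name/",
    "fs.viewfs.mounttable.cluster-name.link./user": "hdfs://ns01/user",
}

print(mount_table_links(sample_conf, "cluster-name"))       # one link found
print(mount_table_links(sample_conf, "cluster-name-ns02"))  # nothing configured
```

In this sample, `cluster-name` has a link but `cluster-name-ns02` has none, which is the situation the error message describes.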
Spark Issue: ArrowTypeError: Expect a Type but Got a Different Type
Symptom
pyarrow.lib.ArrowTypeError: Expected a string or bytes dtype, got int64
Possible Causes
A pandas_udf decorator specifies a return type of String
but the corresponding pandas UDF returns a different …
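When the declared return type is a string, one possible fix is to cast the returned Series to strings inside the function. The sketch below shows only the pandas side so it runs without a Spark session; the function name `length_label` and the sample data are hypothetical, and the commented lines show how such a function might be registered with `pandas_udf`.

```python
import pandas as pd


def length_label(names: pd.Series) -> pd.Series:
    """Return the length of each name as a string, not as an integer."""
    lengths = names.str.len()   # integer Series (int64 for this input)
    return lengths.astype(str)  # cast so the values match a string return type


# In a Spark job, the function might be registered like this (not run here):
#   from pyspark.sql.functions import pandas_udf
#   length_label_udf = pandas_udf(length_label, returnType="string")

result = length_label(pd.Series(["spark", "arrow"]))
print(result.tolist())
```

Returning `lengths` directly would hand integers to Arrow while the UDF declares a string type, which is the kind of mismatch the error reports.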