Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Please refer to Spark Issue: _pickle.PicklingError: args[0] from __newobj__ args has the wrong class for a similar serialization issue in PySpark.
Error Message
org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: ...
Possible Causes
An object that Spark ships from the driver to the workers (typically something captured by a closure) is not serializable.
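A minimal sketch of how this typically happens (the class and field names below are hypothetical, not from the original error): a closure passed to an RDD transformation references a field of an enclosing class, so the whole instance gets captured, and that instance is not serializable.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical class that is NOT Serializable.
class Multiplier(factor: Int) {
  def run(spark: SparkSession): Array[Int] = {
    val rdd = spark.sparkContext.parallelize(1 to 10)
    // `factor` is a field of `this`, so the closure captures the whole
    // Multiplier instance; Spark then fails with
    // "Task not serializable: java.io.NotSerializableException: Multiplier".
    rdd.map(x => x * factor).collect()
  }
}
```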
Solutions
- Don't send the non-serializable object to the workers, e.g., copy the values you actually need into local variables before referencing them in the closure.
- Use a serializable version of the object if you do want to send it to the workers (see the sketch after this list).
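A minimal sketch of both workarounds, continuing the hypothetical `Multiplier` example above (assumptions, not the only way to fix the error):

```scala
import org.apache.spark.sql.SparkSession

class Multiplier(factor: Int) {
  def run(spark: SparkSession): Array[Int] = {
    val rdd = spark.sparkContext.parallelize(1 to 10)
    // Workaround 1: copy the needed value into a local val so that the
    // closure captures only the Int, not the non-serializable Multiplier.
    val localFactor = factor
    rdd.map(x => x * localFactor).collect()
  }
}

// Workaround 2: make the class itself Serializable so that it can be
// shipped to the workers along with the closure.
class SerializableMultiplier(factor: Int) extends Serializable {
  def run(spark: SparkSession): Array[Int] =
    spark.sparkContext.parallelize(1 to 10).map(x => x * factor).collect()
}
```

Copying fields into local variables is usually preferable, since it keeps the serialized task small; marking a class `Serializable` ships the entire object to every task.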
References
https://github.com/databricks/spark-knowledgebase/blob/master/troubleshooting/javaionotserializableexception.md