Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
String Functions in Spark
Tips and Traps
You can use the split function to split a delimited string into an array. It is recommended that you remove trailing separators before applying the split function. Please refer to the split section for a more detailed discussion.
Some string functions (e.g., right) are available in the Spark SQL APIs but not as Spark DataFrame APIs.
Serialization and Caching in Python
functools.lru_cache
https://docs.python.org/3/library/functools.html#functools.lru_cache
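As a minimal sketch of how functools.lru_cache memoizes a function, the example below caches a recursive Fibonacci function. The function name fib and the maxsize value are assumptions chosen for illustration only; they do not come from the original notes.

```python
from functools import lru_cache


# maxsize=128 is an arbitrary illustrative choice; maxsize=None makes the cache unbounded.
@lru_cache(maxsize=128)
def fib(n: int) -> int:
    """Return the n-th Fibonacci number (hypothetical example function)."""
    if n < 2:
        return n
    # Results of earlier calls are served from the cache instead of being recomputed.
    return fib(n - 1) + fib(n - 2)


result = fib(30)
# cache_info() reports hits, misses, maxsize and the current cache size.
info = fib.cache_info()
print(result, info)
```

Note that lru_cache keeps results in memory for the lifetime of the process and requires the function arguments to be hashable; fib.cache_clear() empties the cache.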
cachetools
https://cachetools.readthedocs.io/en/latest/ https://github.com/tkem/cachetools
diskcache sounds like a good option.
DiskCache …
Spark Issue: _pickle.PicklingError: args[0] from __newobj__ args has the wrong class
Please refer to Spark Issue: Task Not Serializable for a similar serialization issue in Spark/Scala.
Symptom
Cause
For example, if you have the following import
from nltk.corpus import stopwords …
Tips on Vaex
General Tips for Docker
Configure automated builds on Docker Hub
Configure automated builds with Bitbucket
Links
https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/
https://coderwall.com/p/4g8znw/things-i-learned-while-writing-a-dockerfile
http://stackoverflow.com/questions/25311613 …