References
https://github.com/databricks/koalas
https://databricks.com/blog/2020/08/11/interoperability-between-koalas-and-apache-spark.html
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.functions
Run Commands on Remote Machines
Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
On a Single Machine
SSH
The pipeline command is run locally. To run it remotely, place the whole remote command in double/single …
Hands on the json Module in Python
Tips and Traps
It is suggested that you avoid using JSON for serializing and deserializing data. Please refer to Shortcomings of JSON for a detailed discussion. TOML and YAML are better text-based alternatives to JSON. If serialization and deserialization are done in Python only, pickle
Garbage Collection in Python
Hands on the Python Module Packaging
Version Handling
Hands on the requests Module in Python
Comments
It is suggested that you use the requests module instead of urllib unless you need to keep third-party dependencies to a minimum.
Response.raise_for_status is a convenient method that raises an HTTPError when the response has a 4xx or 5xx status code, and does nothing otherwise.
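A minimal sketch of how raise_for_status behaves. The responses are built by hand so it runs without network access; the helper name status_error and the example.com URL are hypothetical and only for illustration.

```python
from typing import Optional

import requests


def status_error(response: requests.Response) -> Optional[str]:
    """Return the HTTPError message for a failed response, or None on success."""
    try:
        response.raise_for_status()
    except requests.HTTPError as err:
        return str(err)
    return None


# Hand-built responses (assumption: stand-ins for what requests.get would return).
ok = requests.Response()
ok.status_code = 200

missing = requests.Response()
missing.status_code = 404
missing.reason = "Not Found"
missing.url = "https://example.com/missing"  # hypothetical URL

print(status_error(ok))
print(status_error(missing))
```

In real code the same pattern applies to the object returned by requests.get or requests.post: call raise_for_status right after the request so that error statuses surface as exceptions instead of being silently ignored.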