Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
Tips and Traps
- Pathspec is preferred over zgitignore as the latter is not actively maintained.
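As a rough illustration of the tip above, the sketch below uses the third-party pathspec package to check paths against gitignore-style patterns. The patterns and file names are made up for the example, and it assumes pathspec is installed (`pip install pathspec`).

```python
import pathspec

# Hypothetical ignore rules, written the same way as lines in a .gitignore file.
ignore_lines = [
    "*.pyc",
    "build/",
]

# "gitwildmatch" selects gitignore-style pattern matching.
spec = pathspec.PathSpec.from_lines("gitwildmatch", ignore_lines)


def is_ignored(path: str) -> bool:
    """Return True if the path matches any of the ignore patterns."""
    return spec.match_file(path)


for candidate in ["pkg/module.pyc", "build/out.txt", "src/main.py"]:
    print(candidate, is_ignored(candidate))
```

In practice the pattern lines would usually be read from an ignore file rather than hard-coded.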
zgitignore
zgitignore checks whether a file is ignored by a .zgitignore file
Tips on Jinja
SQL Translation Tools
The ORM library SQLAlchemy can be leveraged …
SQL Translation Tools
Koalas is the pandas API on PySpark.
References
https://github.com/databricks/koalas
https://databricks.com/blog/2020/08/11/interoperability-between-koalas-and-apache-spark.html
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#module-pyspark.sql.functions
Run Commands on Remote Machines
On a Single Machine
SSH
- By default, the piped part of a command runs locally. If you want the whole pipeline to run remotely, place the whole command to be run remotely in double/single …
Hands on the json Module in Python
Tips and Traps
It is suggested that you avoid using JSON for serializing and deserializing data. Please refer to Shortcomings of JSON for a detailed discussion. TOML and YAML are better text-based alternatives to JSON. If serialization and deserialization are done in Python only, pickle …
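One concrete way JSON can lose information in a Python-only workflow is sketched below. It uses only the standard library, and the `record` value is invented for the example. It is not a full list of the issues the linked discussion covers.

```python
import json
import pickle

# A small Python object with an integer key and a tuple value.
record = {1: (1, 2), "when": None}

# JSON only has string keys and arrays, so the round trip changes the types.
via_json = json.loads(json.dumps(record))

# pickle is Python-specific and preserves the original types.
via_pickle = pickle.loads(pickle.dumps(record))

print(via_json)
print(via_pickle)
print(via_json == record, via_pickle == record)
```

Note that pickle data should only be loaded from trusted sources, since unpickling can execute arbitrary code.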