Ben Chuanlong Du's Blog

It is never too late to learn.

Handling Complicated Data Types in Python and PySpark

Tips and Traps

  1. An element in a pandas DataFrame can be any (complicated) type in Python. To save a pandas DataFrame with arbitrary (complicated) types as it is, you have to use the pickle module. The method pandas.DataFrame.to_pickle (which is simply a wrapper over pickle.dump) serializes the DataFrame to a pickle file, while the function pandas.read_pickle deserializes it back into a DataFrame.
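A minimal round-trip sketch of the above (the column name and file name are arbitrary examples):

```python
import pandas as pd

# A DataFrame whose cells hold arbitrary Python objects (a list and a set here),
# which formats like CSV cannot represent faithfully.
df = pd.DataFrame({"col": [[1, 2], {3, 4}]})

# to_pickle serializes the DataFrame to a pickle file ...
df.to_pickle("df.pkl")

# ... and read_pickle deserializes it, preserving the original Python types.
df2 = pd.read_pickle("df.pkl")
```

After the round trip, `df2["col"][0]` is still a `list` and `df2["col"][1]` is still a `set`.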

Use Streamlit to Build a Web App Quickly Using Python

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

https://docs.streamlit.io/

https://github.com/streamlit/streamlit

https://towardsdatascience.com/coding-ml-tools-like-you-code-ml-models-ddba3357eace

Docker APIs

Python

docker-py

docker-py is a Python library for the Docker Engine API. It lets you do anything the docker command does, but from within Python apps – run containers, manage containers, manage …
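A small sketch of how docker-py is typically used; it assumes the docker package is installed (pip install docker) and a Docker daemon is reachable, so the call is wrapped in a function rather than executed at import time:

```python
def run_echo(message: str) -> str:
    """Run `echo` in an Alpine container and return its stdout.

    Requires the docker package (pip install docker) and a running
    Docker daemon reachable via DOCKER_HOST or the local socket.
    """
    import docker  # docker-py; imported lazily so the sketch loads without it

    client = docker.from_env()
    # containers.run returns the container's stdout as bytes when detach=False.
    out = client.containers.run("alpine", ["echo", message], remove=True)
    return out.decode().strip()
```

Calling `run_echo("hello")` is roughly equivalent to `docker run --rm alpine echo hello` on the command line.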

Work With Multiple Spark Installations

spark-submit and spark-shell

Overwriting the PATH environment variable before invoking spark-submit and/or spark-shell often resolves the issue.
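A sketch of the PATH trick in Python (the Spark installation path below is hypothetical):

```python
import os

# Hypothetical location of the Spark installation you want to select.
SPARK_HOME = "/opt/spark-3.1.2"

# Prepend its bin/ directory so spark-submit / spark-shell resolve there first.
env = os.environ.copy()
env["PATH"] = os.path.join(SPARK_HOME, "bin") + os.pathsep + env["PATH"]

# Pass the modified environment to the subprocess, e.g.:
# import subprocess
# subprocess.run(["spark-submit", "--version"], env=env, check=True)
```

The same effect can be had in a shell with `export PATH=/opt/spark-3.1.2/bin:$PATH` before running the command.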

Spark in Jupyter/Lab Notebooks

Removing or resetting the environment variable HADOOP_CONF_DIR resolves …
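A minimal sketch of clearing that variable from a notebook cell before building the SparkSession (the local-mode master is an example choice):

```python
import os

# Drop HADOOP_CONF_DIR so Spark does not pick up an unintended cluster
# configuration left over in the Jupyter kernel's environment.
os.environ.pop("HADOOP_CONF_DIR", None)

# Then build the session as usual, e.g.:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.master("local[*]").getOrCreate()
```

`os.environ.pop` with a default of `None` is safe to run even when the variable was never set.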