Ben Chuanlong Du's Blog

It is never too late to learn.

Handling Complicated Data Types in Python and PySpark

Tips and Traps

  1. An element in a pandas DataFrame can be any (complicated) type in Python. To save a pandas DataFrame with arbitrary (complicated) types as it is, you have to use the pickle module. The method pandas.DataFrame.to_pickle (which is simply a wrapper over pickle.dump) serializes the DataFrame to a pickle file, while the function pandas.read_pickle deserializes it back into a DataFrame.
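A minimal round-trip sketch of the above (the column name and file name are arbitrary examples):

```python
import pandas as pd

# A DataFrame whose cells hold arbitrary Python objects (a list and a set here),
# which formats like CSV cannot represent faithfully.
df = pd.DataFrame({"col": [[1, 2], {3, 4}]})

# to_pickle serializes the DataFrame to a pickle file ...
df.to_pickle("df.pkl")

# ... and read_pickle deserializes it, preserving the original Python types.
df2 = pd.read_pickle("df.pkl")
```

After the round trip, `df2["col"][0]` is still a `list` and `df2["col"][1]` is still a `set`.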

Use Streamlit to Build a Web App Quickly Using Python

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

https://docs.streamlit.io/

https://github.com/streamlit/streamlit

https://towardsdatascience.com/coding-ml-tools-like-you-code-ml-models-ddba3357eace

Docker APIs

Python

docker-py

docker-py is a Python library for the Docker Engine API. It lets you do anything the docker command does, but from within Python apps – run containers, manage containers, manage …
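A small sketch of how docker-py is typically used; it assumes the docker package is installed (pip install docker) and a Docker daemon is reachable, so the call is wrapped in a function rather than executed at import time:

```python
def run_echo(message: str) -> str:
    """Run `echo` in an Alpine container and return its stdout.

    Requires the docker package (pip install docker) and a running
    Docker daemon reachable via DOCKER_HOST or the local socket.
    """
    import docker  # docker-py; imported lazily so the sketch loads without it

    client = docker.from_env()
    # containers.run returns the container's stdout as bytes when detach=False.
    out = client.containers.run("alpine", ["echo", message], remove=True)
    return out.decode().strip()
```

Calling `run_echo("hello")` is roughly equivalent to `docker run --rm alpine echo hello` on the command line.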

Work With Multiple Spark Installations

spark-submit and spark-shell

Overwriting the PATH environment variable before invoking spark-submit and/or spark-shell often resolves the issue.
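A sketch of the PATH trick in Python (the Spark installation path below is hypothetical):

```python
import os

# Hypothetical location of the Spark installation you want to select.
SPARK_HOME = "/opt/spark-3.1.2"

# Prepend its bin/ directory so spark-submit / spark-shell resolve there first.
env = os.environ.copy()
env["PATH"] = os.path.join(SPARK_HOME, "bin") + os.pathsep + env["PATH"]

# Pass the modified environment to the subprocess, e.g.:
# import subprocess
# subprocess.run(["spark-submit", "--version"], env=env, check=True)
```

The same effect can be had in a shell with `export PATH=/opt/spark-3.1.2/bin:$PATH` before running the command.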

Spark in Jupyter/Lab Notebooks

Removing or resetting the environment variable HADOOP_CONF_DIR resolves …
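A minimal sketch of clearing that variable from a notebook cell before building the SparkSession (the local-mode master is an example choice):

```python
import os

# Drop HADOOP_CONF_DIR so Spark does not pick up an unintended cluster
# configuration left over in the Jupyter kernel's environment.
os.environ.pop("HADOOP_CONF_DIR", None)

# Then build the session as usual, e.g.:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.master("local[*]").getOrCreate()
```

`os.environ.pop` with a default of `None` is safe to run even when the variable was never set.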