Ben Chuanlong Du's Blog

It is never too late to learn.

Git Implementations and Bindings in Python

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

There are multiple Git implementations/bindings in Python: pygit2, Dulwich and GitPython.

Below is a simple comparison of the three packages.

                 pygit2                dulwich       GitPython
Implementation   bindings to libgit2   pure Python   bindings …

Tips on Python Build Standalone

The GitHub repository python-portable has some example scripts for bundling standalone Python environments. It also releases standalone Python environments regularly.

Tips on Using env_python.tar.gz

This section is specifically on …

Hands on the Deque Collection in Python

Tips and Traps

  1. In CPython, a deque is implemented as a doubly linked list of fixed-size blocks, so appending to and popping from either end takes O(1) time.

  2. Unlike list and tuple collections, a deque CANNOT be sliced!
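The two tips above can be illustrated with a minimal sketch. The variable names are made up for illustration, and `itertools.islice` is shown as one possible workaround for slicing rather than the only one.

```python
from collections import deque
from itertools import islice

# Appending at either end is O(1).
d = deque([1, 2, 3])
d.append(4)       # add to the right end
d.appendleft(0)   # add to the left end

# Slice syntax is not supported on a deque and raises TypeError.
try:
    d[1:3]
    sliced_ok = True
except TypeError:
    sliced_ok = False

# One workaround: islice walks the deque and yields the requested range.
middle = list(islice(d, 1, 3))
print(list(d), sliced_ok, middle)
```

Note that `islice` iterates from the left end, so taking a range near the right end of a long deque costs time proportional to its position.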

Spark Issue: Pure Python Code Errors

This post collects some typical pure Python errors in PySpark applications.

Symptom 1

object has no attribute

Solution 1

Fix the attribute name.
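As a minimal pure-Python sketch of Symptom 1 (no Spark involved), the hypothetical class `Record` below stands in for whatever object the job uses. A misspelled attribute produces the same kind of message, and correcting the name resolves it.

```python
class Record:
    """Hypothetical stand-in for an object used inside a PySpark job."""

    def __init__(self, name: str) -> None:
        self.name = name


record = Record("a")

# A misspelled attribute name raises AttributeError.
try:
    record.nmae
except AttributeError as err:
    message = str(err)

# Using the correct attribute name fixes the error.
fixed = record.name
print(message, fixed)
```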

Symptom 2

No such file or directory

Solution …

Profile Performance of Python Applications

Tips

  1. cProfile (implemented in C) is preferred over profile (implemented in Python).

  2. The profiler modules (cProfile and profile) and tools based on them (e.g., %prun and %%prun for notebook) are designed to provide an execution profile for a given program, not for benchmarking purposes (for that, there is time
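A minimal sketch of profiling a single call with cProfile and pstats from the standard library follows. The function `slow_sum` and the helper `profile_call` are hypothetical names used only for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Hypothetical workload: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(report)
```

The report shows where time is spent per function. The absolute timings include profiler overhead, so they should not be read as benchmark numbers.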

DataFrame Implementations in Python

Tips and Traps

Alternatives to pandas for Small Data

  1. Polars is a blazingly fast DataFrame library implemented in Rust that uses Apache Arrow as its memory model. It is currently the best replacement for pandas on small data. Note that Polars supports multithreading and lazy computation, but it cannot handle data larger than memory at this time.