Ben Chuanlong Du's Blog

It is never too late to learn.

Job Scheduling and Management Using Apache Airflow

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Installation (MySQL)

  1. Install Apache Airflow.

    wajig install \
        python3-dev python3-pip \
        mysql-server libmysqlclient-dev
    sudo AIRFLOW_GPL_UNIDECODE=yes pip3 install 'apache-airflow[mysql]'
    
  2. Add the following content into your my.cnf …

Spark Issue: Max Number of Executor Failures Reached

Symptom

21/06/01 15:03:28 INFO ApplicationMaster: Final app status: FAILED, exitCode: 11, (reason: Max number of executor failures (6) reached)

Possible Causes

The option spark.yarn.max.executor …

The Best Way to Find Files and Manipulate Them

There are many cool (command-line) tools which can help you quickly find/locate files.

  1. find
  2. locate
  3. osquery
  4. fselect
  5. ripgrep

Those tools can be combined with the pipe operator | to do further filtering or manipulation. However, after trying all of them, I have to say that the best way for a Python user is to leverage the pathlib module.
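As a rough illustration of the pathlib approach, here is a minimal sketch that recursively finds files by name pattern and filters them by size. The helper name find_files and its parameters (pattern, min_size) are made up for this example and are not from the original post; only Path.rglob, Path.is_file and Path.stat are standard pathlib calls.

```python
from pathlib import Path


def find_files(root, pattern="*.py", min_size=0):
    """Recursively collect files under `root` whose names match `pattern`
    and whose size is at least `min_size` bytes.

    Hypothetical helper for illustration only.
    """
    return sorted(
        path
        for path in Path(root).rglob(pattern)  # walk the tree recursively
        if path.is_file() and path.stat().st_size >= min_size
    )


if __name__ == "__main__":
    # Print every Python file under the current directory that is non-empty.
    for path in find_files(".", "*.py", min_size=1):
        print(path)
```

Because the result is an ordinary list of Path objects, further filtering or manipulation (renaming, reading, moving) can be done with regular Python code rather than a chain of shell pipes.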

Hands on Dulwich

Note: Dulwich is not feature complete yet and development of the project is extremely slow. It is suggested that you use other Python packages instead. For more discussion, please refer to Git Implementations and Bindings in Python.

Tips and Traps

  1. The git command (and thus Dulwich) accepts URLs both with and without the trailing .git