Ben Chuanlong Du's Blog

Workflow Managing Tools

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Star History of Workflow Management Tools

  1. Apache Airflow is currently the recommended tool for managing workflows. A big advantage of Airflow over other workflow management tools (e.g., UC4) is that a workflow is expressed in (simple and concise) Python code. Source code is easy to version control and review, whereas changes to a graphically expressed workflow are extremely hard to review, especially when the workflow grows large.
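To make "workflow expressed in Python code" concrete, here is a minimal sketch that uses only the standard library (`graphlib`, Python 3.9+). It is not Airflow's API; the task names (`extract`, `transform`, `load`) and the `run_workflow` helper are hypothetical and exist only for illustration. The point is that the dependency graph is an ordinary dict in a source file, so a change to it shows up as a normal line diff in code review.

```python
from graphlib import TopologicalSorter


# Hypothetical tasks; each one is just a Python callable.
def extract():
    return "extract"


def transform():
    return "transform"


def load():
    return "load"


TASKS = {"extract": extract, "transform": transform, "load": load}

# Each key runs only after every task in its value set has run.
DEPENDENCIES = {
    "transform": {"extract"},
    "load": {"transform"},
}


def run_workflow(tasks, dependencies):
    """Run tasks in an order that respects the dependencies; return that order."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


if __name__ == "__main__":
    print(run_workflow(TASKS, DEPENDENCIES))
```

A real Airflow DAG adds scheduling, retries, and a UI on top of this idea, but the reviewable part (tasks plus their dependencies) stays plain Python.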

prefect

apache/airflow

peace

peace is a framework to build empathetic and forgiving software automation.

Luigi

Kubeflow

MLflow

Argo

mara/data-integration

azkaban/azkaban

StackStorm/st2

rundeck/rundeck

crontab

schedule

An in-process scheduler for periodic jobs that uses the builder pattern for configuration. Schedule lets you run Python functions (or any other callable) periodically at pre-determined intervals using a simple, human-friendly syntax.
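The description above can be illustrated with a toy scheduler. This is a sketch of the builder-pattern idea under stated assumptions, not the schedule library's implementation: the `Job` and `Scheduler` classes and the injectable `clock` parameter are hypothetical names chosen for this example.

```python
import time


class Job:
    """A periodic job configured through chained (builder-style) calls."""

    def __init__(self, interval, clock):
        self.interval = interval  # number of units between runs
        self.unit_seconds = 1     # seconds per unit, set by .seconds / .minutes
        self.func = None
        self.next_run = None
        self.clock = clock

    @property
    def seconds(self):
        self.unit_seconds = 1
        return self

    @property
    def minutes(self):
        self.unit_seconds = 60
        return self

    def do(self, func, *args, **kwargs):
        # Bind the callable and compute the first due time.
        self.func = lambda: func(*args, **kwargs)
        self.next_run = self.clock() + self.interval * self.unit_seconds
        return self


class Scheduler:
    """Holds jobs and runs the ones that are due when asked."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.jobs = []

    def every(self, interval=1):
        job = Job(interval, self.clock)
        self.jobs.append(job)
        return job

    def run_pending(self):
        now = self.clock()
        for job in self.jobs:
            if job.func is not None and now >= job.next_run:
                job.func()
                job.next_run = now + job.interval * job.unit_seconds


if __name__ == "__main__":
    scheduler = Scheduler()
    scheduler.every(2).seconds.do(print, "tick")
    for _ in range(5):  # poll for about five seconds
        scheduler.run_pending()
        time.sleep(1)
```

The schedule package's documented usage has the same shape: a call such as `schedule.every(10).minutes.do(job)` registers the job, and a loop that calls `schedule.run_pending()` runs whatever is due. Because everything runs in-process, nothing fires unless that loop keeps running.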

Which One to Use

  • Apache Airflow if you want the most full-featured, mature tool and you can dedicate time to learning how it works, setting it up, and maintaining it.
  • Luigi if you need something with an easier learning curve than Airflow. It has fewer features, but it’s easier to get off the ground.
  • Argo if you're already deeply invested in the Kubernetes ecosystem and want to manage all of your tasks as pods, defining them in YAML instead of Python.
  • Kubeflow if you want to use Kubernetes but still define your tasks with Python instead of YAML.
  • MLflow if you care more about tracking experiments, or tracking and deploying models, using MLflow's predefined patterns than about finding a tool that can adapt to your existing custom workflows.

Command-line Tools

If you prefer a simple command-line tool to schedule tasks, below are some possible solutions.

  1. at
  2. watch
  3. crontab
  4. schedule
  5. inotify (monitors file system changes and triggers events)
  6. parallel

References

Airflow vs. Luigi vs. Argo vs. MLFlow vs. KubeFlow
