Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
- Apache Airflow is currently the recommended tool for managing workflows. A big advantage of Airflow over other workflow management tools (e.g., UC4) is that the workflow is expressed in (simple and concise) Python code. It is easy to version control and review changes in source code, while it is extremely hard to do so for graphically expressed workflows, especially when the workflow grows large.
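To make the "workflow as code" point concrete, here is a minimal sketch that uses only the Python standard library. It is not Airflow's API: the task names and the `run_order` helper are hypothetical, chosen only to show that a workflow written as plain code is a small text file that diffs and reviews like any other source.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
# Adding or removing a dependency shows up as a one-line change in a diff.
WORKFLOW = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
    "notify": {"load"},
}


def run_order(workflow):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(workflow).static_order())


order = run_order(WORKFLOW)
print(order)
```

A real workflow manager adds scheduling, retries, and monitoring on top, but the dependency graph itself stays reviewable text, which is the property a graphical editor lacks.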
prefect
apache/airflow
peace
peace is a framework to build empathetic and forgiving software automation.
Luigi
Kubeflow
MLflow
Argo
mara/data-integration
azkaban/azkaban
StackStorm/st2
rundeck/rundeck
crontab
schedule
An in-process scheduler for periodic jobs that uses the builder pattern for configuration. Schedule lets you run Python functions (or any other callable) periodically at pre-determined intervals using a simple, human-friendly syntax.
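The builder-pattern idea can be sketched with a toy in-process scheduler written against the standard library only. This is not the `schedule` library's implementation: the `Scheduler` and `Job` classes and the `now` parameter are hypothetical, and `now` exists only so the demo runs without sleeping. The real library's calls read similarly, e.g. `schedule.every(10).minutes.do(job)` followed by repeated `schedule.run_pending()`.

```python
import time
from typing import Callable, List, Optional


class Job:
    """A periodic job configured step by step (builder pattern)."""

    def __init__(self, interval: int, registry: List["Job"]) -> None:
        self.interval = interval
        self.unit_seconds: Optional[int] = None
        self.period: float = 0.0
        self.func: Optional[Callable[[], object]] = None
        self.next_run: float = 0.0
        self._registry = registry

    @property
    def seconds(self) -> "Job":
        self.unit_seconds = 1
        return self  # returning self is what allows chaining

    @property
    def minutes(self) -> "Job":
        self.unit_seconds = 60
        return self

    def do(self, func: Callable[[], object], now: Optional[float] = None) -> "Job":
        """Final builder step: attach the callable and register the job."""
        if self.unit_seconds is None:
            raise ValueError("choose a time unit before calling do()")
        self.func = func
        self.period = self.interval * self.unit_seconds
        start = time.monotonic() if now is None else now
        self.next_run = start + self.period
        self._registry.append(self)
        return self


class Scheduler:
    def __init__(self) -> None:
        self.jobs: List[Job] = []

    def every(self, interval: int = 1) -> Job:
        """Start building a job that runs every `interval` time units."""
        return Job(interval, self.jobs)

    def run_pending(self, now: Optional[float] = None) -> int:
        """Run every job that is due and return how many ran."""
        current = time.monotonic() if now is None else now
        ran = 0
        for job in self.jobs:
            if current >= job.next_run:
                job.func()
                job.next_run = current + job.period
                ran += 1
        return ran


# Demo with injected timestamps instead of real waiting.
demo = Scheduler()
ticks = []
demo.every(2).seconds.do(lambda: ticks.append("tick"), now=0.0)
demo.run_pending(now=1.0)  # not due yet
demo.run_pending(now=2.0)  # due, runs once
print(ticks)
```

In a long-running process, the loop would instead call `run_pending()` with no argument and sleep briefly between calls.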
Which One to Use
- Apache Airflow if you want the most full-featured, mature tool and you can dedicate time to learning how it works, setting it up, and maintaining it.
- Luigi if you need something with an easier learning curve than Airflow. It has fewer features, but it’s easier to get off the ground.
- Argo if you're already deeply invested in the Kubernetes ecosystem and want to manage all of your tasks as pods, defining them in YAML instead of Python.
- Kubeflow if you want to use Kubernetes but still define your tasks with Python instead of YAML.
- MLflow if you care more about tracking experiments, or about tracking and deploying models using MLflow's predefined patterns, than about finding a tool that can adapt to your existing custom workflows.
Command-line Tools
If you prefer a simple command-line tool to schedule tasks, below are some possible solutions.