In the previous parts of this series, I introduced Apache Airflow in general, demonstrated my Docker dev stack, and built out a simple linear DAG definition. I want to wrap up the series by showing a few other common DAG patterns I regularly use.
DAG Patterns
I use three main DAG patterns: Simple (shown in Part 3), Linear, and Gather. Of course, once you master these patterns, you can combine them to build much more complex pipelines.
Simple DAG Pattern
What I call a simple pattern (and I have no idea if any of these patterns have official names) is a chain of tasks where each task depends upon the previous one. In this case, make_icecream_sundae_tasks depends upon choose_toppings_task, which depends upon choose_icecream_flavor_task, which in turn depends upon choose_cone_task.
If you read Part 3 of the series, you will have seen this dependency chain written out with Airflow's bit-shift syntax.
Linear DAG Pattern

Now, let's say that I have two lists of tasks: taskA_1, taskA_2, taskA_3 and taskB_1, taskB_2, taskB_3, where taskB_1 depends upon taskA_1, taskB_2 depends upon taskA_2, and taskB_3 depends upon taskA_3.
In keeping with my ice cream theme, I've created a list of ice cream tasks. This example is a bit contrived, because we could just as well use the first pattern and simply throw more workers or CPUs at it for the same result, but in the interest of instruction, just go with it.
You will see that in this example, instead of declaring each operator instance directly, I wrapped it in a function that returns an operator instance. This is a much more flexible way of declaring your operators, and I recommend it. You can even pass the DAG in as a function parameter, allowing you to reuse operators across DAGs if that is what you are going for.
Gather DAG Pattern

This last pattern is similar to the previous one, but instead of completely separate task paths, you have a single task that depends upon a group of other tasks. The beauty of using Airflow here is that, assuming you have enough computational power, choose_cone_task_1, choose_flavor_task_1, and choose_toppings_task_1 will all run in parallel, at the same time. NEAT!
That's it. I hope you found this series helpful and that it demonstrated how you can use these DAG patterns as building blocks for more complex pipelines.
For more Airflow fun, check out this great curated list of Airflow-related blog posts and resources: Awesome Apache Airflow.