In this lesson, you will take your Airflow pipelines to the next level, using an advanced feature to make them adaptable to data at runtime. With this feature, you can create parallel task instances for each book description file, making your pipeline easier to troubleshoot and more robust. Let's dive in.

When running the fetch_data DAG, you might have noticed that it is not fully atomic: all book description files are processed in the same task, in a for loop. This worked fine when prototyping and developing. But imagine an online bookstore that wants to add thousands of book descriptions every day, provided in hundreds of book description files. If there is just one formatting error in one file, the for loop fails, the whole task fails, and to recover it needs to be rerun, processing all book description files again, even the ones that did not cause any issues. Especially for tasks that involve interacting with AI models, like embedding tasks or inference tasks, this can get expensive quite fast.

There is an Airflow feature that can help here: with dynamic task mapping, you can create a pipeline that adapts to your data at runtime. Specifically, you can say that you want to create a variable number of copies of a task, depending on input that is determined at the time the DAG runs. In your case, in production, you never know how many new book description files need to be added every hour. It could be one, two, ten, or even zero, and then you don't even need to run the rest of the DAG in that hour. So let's make your fetch_data DAG adapt to these changes and give it the ability to create a variable number of copies of the tasks that process the book descriptions.

Dynamic task mapping is best learned by experimenting, so let's build a simple DAG to play around with before modifying the fetch_data DAG. Using the %%writefile magic command, you can write a new file to your dags folder containing a new DAG called simple_mapping. The first task in the DAG, called get_numbers, uses the random package to return a list of varying length. There are four options for the return value of this task: it can return an empty list, which mimics having no new book description files, and in that case the downstream tasks should be skipped entirely, or it can return a list containing one, two, or three integers, which mimics having one, two, or three new book description files to process. You want to create one, two, or three parallel copies of all the tasks that process this output downstream. This action, turning a regular task into one for which Airflow creates parallel copies, is referred to as dynamically mapping a task.

You can define a second task downstream of the get_numbers task, called mapped_task_one, that is mapped over its output. In this case, let's have one argument that is always the same, called my_constant_arg, and one argument that changes in each copy, called my_changing_arg. The return value is the sum of the two integer arguments. To tell Airflow that this is a mapped task, you can use two special methods when calling the task. .partial() is the method that contains all the arguments that stay the same for each copy of the task; let's put in 10 for my_constant_arg. The other method is called .expand(). It contains the keyword argument that changes between task copies, which needs to be set to a list. In this example, the argument that changes is my_changing_arg, and it is set to the list returned by the get_numbers task.
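Here is a minimal sketch of this DAG, following the names used above. The scheduling parameters and the exact random-number logic are assumptions for illustration, not the lesson's verbatim code:

```python
import random
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def simple_mapping():

    @task
    def get_numbers() -> list[int]:
        # Return a list of zero to three integers, mimicking a varying
        # number of new book description files arriving per run.
        length = random.randint(0, 3)
        return [random.randint(1, 100) for _ in range(length)]

    @task
    def mapped_task_one(my_constant_arg: int, my_changing_arg: int) -> int:
        # Each dynamically mapped copy receives one element of the
        # upstream list as my_changing_arg.
        return my_constant_arg + my_changing_arg

    # .partial() holds the arguments that stay the same for every copy;
    # .expand() holds the argument that changes, set to a list.
    mapped_task_one.partial(my_constant_arg=10).expand(
        my_changing_arg=get_numbers()
    )


simple_mapping()
```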
This means that for each element in the list returned by the get_numbers task, there will be one copy of the task called mapped_task_one, each copy processing one of the elements in the list as its input for my_changing_arg. Since the get_numbers task can return a list with one, two, three, or zero elements for any given run of the DAG, there will be one, two, or three copies of this task, or zero copies, which results in the task getting skipped entirely.

Let's run this DAG to see it in action. After running the notebook cell to save the file, the DAG appears in the Airflow UI. Let's create a bunch of manual runs. Clicking on the square for the mapped_task_one task in one DAG run in the grid view, you can see how many dynamically mapped task instances were created each time, depending on the output of the upstream task. Remember, what determines the number of mapped tasks is the number of elements in the list passed to the mapped argument in the .expand() method. In cases where that list was empty, the task was skipped entirely, which shows as a pink square in the grid view. You can access information about individual copies of the task by clicking on their start date.

You can easily see how dynamic task mapping helps you troubleshoot in the Airflow UI. If one dynamically mapped task fails, you can directly examine its logs and rerun it if necessary, without needing to rerun the other mapped task instances of that same task. Once this is implemented for the fetch_data DAG, one malformed book description will only fail the mapped task instance that processes the file containing that specific book description, not the whole processing task. Dynamic tasks also help to increase efficiency, especially when there is a large number of task copies.

Now that you have a grasp of simple dynamic task mapping, let's chain another dynamic task to the DAG. In Airflow, you often have several steps processing data that you want to parallelize. In the fetch_data DAG, you want one parallel task per book description file for both the file transformation and the description embedding: two dynamically mapped tasks, with the latter processing the output of its upstream task. This is possible by passing the output of a dynamically mapped task, which is equal to the list of return values of all its copies, to the .expand() method of the next task.

You can add another task, let's call it mapped_task_two, that takes at least one argument which can be mapped over; let's call it my_cool_number this time. When calling the task, you can use the same methods again, .partial() and .expand(). If you don't have any arguments that stay the same for each task copy, like in this case, you can omit .partial(). Inside .expand(), pass the return value of the previous mapped task to the argument you want to dynamically map over. Now there will always be the same number of dynamically mapped tasks for both the mapped_task_one and mapped_task_two tasks. You can confirm this in the Airflow UI by running the DAG several times and checking the number of task instances across DAG runs.
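A sketch of the chained version might look like this. The argument name my_cool_number is a best guess at the transcript's wording, and the doubling logic is a placeholder for illustration:

```python
import random
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def simple_mapping():

    @task
    def get_numbers() -> list[int]:
        return [random.randint(1, 100) for _ in range(random.randint(0, 3))]

    @task
    def mapped_task_one(my_constant_arg: int, my_changing_arg: int) -> int:
        return my_constant_arg + my_changing_arg

    @task
    def mapped_task_two(my_cool_number: int) -> int:
        # Placeholder logic; one copy runs per element of the
        # upstream mapped task's combined output.
        return my_cool_number * 2

    sums = mapped_task_one.partial(my_constant_arg=10).expand(
        my_changing_arg=get_numbers()
    )

    # No .partial() needed here, since nothing stays constant between
    # copies. The output of the first mapped task (the list of all its
    # copies' return values) feeds straight into .expand().
    mapped_task_two.expand(my_cool_number=sums)


simple_mapping()
```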
Awesome! Now you can dynamically map any Airflow task. In the fetch_data DAG, the task that determines the number of dynamically mapped tasks is list_book_description_files: it returns a list with all the book description files as elements. The tasks that will be dynamically mapped to process each book description file in a separate parallel copy, instead of in a loop, are transform_book_description_files and create_vector_embeddings. Each of these tasks only needs minor changes. For the transform_book_description_files task, the input changes from being the list of book description files to being the name of one individual book description file. Given that the input is now just one file, the loop iterating through the list of files can be removed, and you can change the return value from a list of lists to just the list of the book descriptions contained in one single file. Finally, what actually turns the task into a dynamically mapped task is using the .expand() method instead of calling the task directly. You can adjust the next task, which creates the vector embeddings, in the exact same way: change the input from the list of all book data to the book data from one file, remove the for loop, adjust the return value, and use .expand() in the function call. All other code in the tasks can stay the same, and the downstream load_embeddings_to_vector_db task does not need to be modified either. A sketch of these changes appears at the end of this lesson.

After running the cell to save the changes, you can create a manual run of the DAG in the Airflow UI and click, in the grid view, on the squares of the transform_book_description_files and create_vector_embeddings tasks to see how many task instances were created. This number corresponds to the number of book description files currently available in your file location, include/data. To change the number of dynamically mapped task instances in this DAG, you can add more book description files using the helper cell. Feel free to add as many files, each containing the data for at least one book, as you like.

Great! You now have a solid understanding of dynamic task mapping to parallelize your Airflow tasks. Now there's only one step left in getting this pipeline production ready: preparing for the event of a task failure. In the next lesson, you'll learn how to configure tasks to automatically retry and send you alerts if they fail.
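For reference, here is a minimal sketch of the mapped portion of fetch_data as described above. The task names follow the lesson, but the file location, file format, scheduling parameters, and embedding logic are placeholder assumptions rather than the course's actual implementation:

```python
from datetime import datetime
from pathlib import Path

from airflow.decorators import dag, task

DATA_DIR = Path("include/data")  # assumed location of the description files


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def fetch_data():

    @task
    def list_book_description_files() -> list[str]:
        # Unchanged: returns one list element per book description file;
        # the length of this list drives the number of mapped copies.
        return [str(path) for path in DATA_DIR.glob("*.txt")]

    @task
    def transform_book_description_files(book_description_file: str) -> list[dict]:
        # Input is now ONE file name instead of the whole list, so the
        # for loop over files is gone, and the return value is the list
        # of books from this single file (not a list of lists).
        with open(book_description_file) as f:
            return [
                {"description": line.strip()} for line in f if line.strip()
            ]

    @task
    def create_vector_embeddings(book_data: list[dict]) -> list[list[float]]:
        # Placeholder embedding logic; the real task calls an embedding
        # model, but only the mapping-related changes matter here.
        return [[float(len(book["description"]))] for book in book_data]

    book_files = list_book_description_files()
    book_data = transform_book_description_files.expand(
        book_description_file=book_files
    )
    # The downstream load_embeddings_to_vector_db task (not shown)
    # stays unmodified.
    create_vector_embeddings.expand(book_data=book_data)


fetch_data()
```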