Planning with code execution is the idea that, instead of asking an LLM to output a plan in, say, JSON format and executing it one step at a time, you have the LLM write code, and that code captures multiple steps of the plan: call this function, then call this function, then call this function. By executing code generated by the LLM, we can carry out fairly complex plans. Let's take a look at when you might want to use this technique.
Let's say you want to build a system to answer questions about coffee machine sales based on a spreadsheet of previous sales with data like this. You might give an LLM a set of tools like these: get column max, to look at a certain column and get the maximum value, so that it can answer a question like, what's the most expensive coffee; or get column mean, filter rows, get column min, get column median, sum rows, and so on. These are examples of a range of tools you might give an LLM to process this spreadsheet, these rows and columns of data, in different ways.
Now, if a user were to ask which month had the highest sales of hot chocolate, it turns out that you can answer this query using these tools, but it's pretty complicated. You'd have to use filter rows to extract transactions in January for hot chocolate, then do stats on that, then repeat for February and figure out stats on that, then repeat for March, repeat for April, repeat for May, all the way through December, and then take the max. So you can string it together with a pretty complicated process using these tools, but it's not such a great solution. But worse, were someone to ask how many unique transactions there were last week, well, these tools are insufficient to get that answer, so you may end up creating a new tool, get unique entries. Or you may run into another query, what were the amounts of the last five transactions, and then you have to create yet another tool to get the data to answer that query.
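To make the pain of the fixed-tool approach concrete, here is a minimal sketch of what answering the hot chocolate question looks like when only a handful of tools are available. Everything in it is an assumption for illustration: the sample rows, the column names, and the tool names `filter_rows`, `sum_column`, and `get_column_max` are hypothetical stand-ins for the tools described above, not a real library.

```python
# Hypothetical sample rows standing in for the sales spreadsheet.
SALES = [
    {"month": 1, "product": "hot chocolate", "price": 3.75},
    {"month": 1, "product": "latte", "price": 4.50},
    {"month": 2, "product": "hot chocolate", "price": 3.75},
    {"month": 2, "product": "hot chocolate", "price": 3.75},
    {"month": 12, "product": "hot chocolate", "price": 3.75},
]


def filter_rows(rows, **conditions):
    """Tool: keep only rows whose columns equal the given values."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


def sum_column(rows, column):
    """Tool: add up one column (0 if there are no rows)."""
    return sum(r[column] for r in rows)


def get_column_max(rows, column):
    """Tool: largest value in one column."""
    return max(r[column] for r in rows)


def best_month_for(product):
    """Answer 'which month had the highest sales?' using only the tools."""
    totals = []
    # One filter call and one sum call per month, January through December.
    for month in range(1, 13):
        rows = filter_rows(SALES, month=month, product=product)
        totals.append({"month": month, "total": sum_column(rows, "price")})
    # Then take the max over the twelve monthly totals.
    best_total = get_column_max(totals, "total")
    return filter_rows(totals, total=best_total)[0]["month"]


best = best_month_for("hot chocolate")
print(best)
```

Even this one question takes twenty-odd tool calls strung together, and none of these tools helps with "unique transactions" or "last five transactions", which is the gap the transcript points at.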
And in practice, I've seen teams, as they run across more and more queries, end up creating more and more tools to try to give the LLM enough tools to cover the whole range of things someone may ask about a dataset like this. So this approach is brittle and inefficient, and I've seen teams continuously dealing with edge cases and trying to create more tools. But it turns out there is a better way, which is to prompt the LLM with something like: please write code to solve the user's query and return your answer as Python code, maybe delimited with these beginning and ending execute Python XML tags. Then the LLM can just write code to load the spreadsheet into a data processing library, here it's using the pandas library, and here it is actually coming up with a plan. The plan is, after loading the CSV, first ensure the date column is parsed a certain way, then sort by the date, select the last five transactions, show just the price column, and so on. These are steps one, two, three, four, and five, say, of the plan.
A programming language like Python, and in this example also the pandas data processing library that's imported, has many built-in functions, hundreds or even thousands of them, and moreover, these are functions that the LLM has seen a lot of data on how and when to call. By letting your LLM write code, it can choose from these hundreds or thousands of relevant functions that it's already seen a lot of data on when to use. This lets it string together different choices of functions from this very large library in order to come up with a plan for answering a fairly complex query like this.
Just one more example. If someone were to ask, how many unique transactions were there last week? Well, the LLM can come up with a plan to read the CSV file, parse the date column, define the time window, filter rows, drop duplicate rows, and count.
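Here is a minimal sketch of how the two plans above might look end to end, under several assumptions. The tag name `<execute_python>`, the sample data and its column names, the convention that the generated code stores its result in a variable called `answer`, and the helpers `extract_code` and `run_plan` are all hypothetical. The two response strings are hard-coded stand-ins for what an LLM might return, no model is actually called, and "today" is pinned to a fixed date so the result is reproducible. Note that `exec` runs the code with no sandbox at all, which is fine for a toy like this but not for untrusted output.

```python
import io
import re

import pandas as pd

# Hypothetical sample data; note the duplicated row on 2024-03-04.
CSV_TEXT = """date,product,price
2024-03-01,latte,4.50
2024-03-02,hot chocolate,3.75
2024-03-03,espresso,3.00
2024-03-04,latte,4.50
2024-03-04,latte,4.50
2024-03-05,mocha,5.25
2024-03-06,hot chocolate,3.75
2024-03-07,cappuccino,4.25
"""

# Stand-in for an LLM reply to "what were the amounts of the last five transactions?"
LAST_FIVE_RESPONSE = """Here is my plan as code.
<execute_python>
# 1. Ensure the date column is parsed as datetimes
df["date"] = pd.to_datetime(df["date"])
# 2. Sort by date
df_sorted = df.sort_values("date")
# 3. Select the last five transactions
last_five = df_sorted.tail(5)
# 4. Keep just the price column
answer = last_five["price"].tolist()
</execute_python>
"""

# Stand-in for an LLM reply to "how many unique transactions were there last week?"
UNIQUE_RESPONSE = """<execute_python>
# 1. Parse the date column
df["date"] = pd.to_datetime(df["date"])
# 2. Define the time window (assumption: "today" is pinned to 2024-03-08)
end = pd.Timestamp("2024-03-08")
start = end - pd.Timedelta(days=7)
# 3. Filter rows inside the window
recent = df[(df["date"] >= start) & (df["date"] < end)]
# 4. Drop duplicate rows and count
answer = len(recent.drop_duplicates())
</execute_python>
"""


def extract_code(response: str) -> str:
    """Pull the code out from between the (assumed) execute_python tags."""
    match = re.search(r"<execute_python>(.*?)</execute_python>", response, re.DOTALL)
    if match is None:
        raise ValueError("no <execute_python> block found")
    return match.group(1).strip()


def run_plan(response: str, df: pd.DataFrame):
    """Execute the generated code against a copy of the data and return `answer`."""
    namespace = {"pd": pd, "df": df.copy()}
    # WARNING: exec runs arbitrary code with no isolation; use a sandbox for real LLM output.
    exec(extract_code(response), namespace)
    return namespace["answer"]


df = pd.read_csv(io.StringIO(CSV_TEXT))
last_five_prices = run_plan(LAST_FIVE_RESPONSE, df)
unique_count = run_plan(UNIQUE_RESPONSE, df)
print(last_five_prices, unique_count)
```

The point of the sketch is that the harness stays the same for both queries. Nobody had to write a new "get unique entries" or "last five" tool; the plan lives entirely in the generated code.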
The details of this aren't important, but hopefully what you can see is that, if you read the comments here, the LLM is roughly coming up with a four-step plan and is expressing each of the steps in code that you can then just execute, and this will get the user their answer. So for applications where the task can plausibly be done by writing code, letting an LLM express its plan in software code that you can just execute for it can be a very powerful way to let it write rich plans. And of course, the caveat that I mentioned in the module on tool use, to consider whether you need a safe execution environment like a sandbox to run the code, also applies here. Although it's probably not best practice, I also know a lot of developers that don't use a sandbox.
Lastly, it turns out that planning with code works well. From this diagram, adapted from a research paper by Xingyao Wang and others, you can see that for many different models, on the tasks that they examined, code as action, in which the LLM is invited to write code and take actions through code, is superior to having it write JSON or text that is then translated into action. And you also see a trend that writing code outperforms having the LLM write a plan in JSON, and writing a plan in JSON is also a bit better than writing a plan in just plain text. Now, of course, there are applications where you might want to give your own custom tools to an LLM to use, and so writing code isn't for every single application. But when it does apply, it can be a very powerful way for an LLM to express a plan.
So that wraps up the section on planning. Today, one of the most powerful uses of agentic AI that plans is highly agentic software coders.
It turns out that if you ask one of the highly agentic coding assistant tools to write a complex piece of software for you, it may come up with a detailed plan to build this component of the software first, then build a second component, then a third, and maybe even plan to test the components as it goes along. It then forms a checklist that it goes through to execute one step at a time. And this actually works really well for building increasingly complex pieces of software. For other applications, I think the use of planning is still growing and developing. One of the disadvantages of planning is that because the developer doesn't tell the system exactly what to do, it's a little bit harder to control, and you don't really know in advance what will happen at runtime. But giving up some of this control does significantly increase the range of things that the model may decide to try out. So this important technology is kind of cutting edge and doesn't feel completely mature outside of maybe agentic coding, where it works well, although I'm sure there's still a lot of room to grow. But hopefully you'll enjoy using it in some of your applications someday.
That wraps up planning. There's one last design pattern I hope to share with you in this module, which is how to build multi-agent systems, where you have not just one agent but many of them working in collaboration to complete the task for you. Let's take a look at that in the next video.