Once your agents are in production, you want to make sure they keep running, that they don't break, and that you don't break them. If you're an engineer, you're probably very familiar with CI and CD, so now we're going to talk about what that looks like for AI agents: how you make sure that your agents, as you improve them, don't break and keep running reliably after they go live. Even though deployment is important, it's not the main challenge. The real hurdle is keeping these agents running reliably once they're live, so it's helpful to think about this the same way engineers approach CI/CD flows in more traditional applications. Let's go into that and see how CI/CD works for AI agents and how you ship them at scale.

Some of the questions you'll have going into this are things like: how can I support new models? This matters because new models are launched weekly, along with new model strategies, variations, and fine-tunes, so the ability to truly experiment with different models can have a huge impact on how you build your systems. Beyond that, you'll also be thinking about how the systems evolve over time, making sure they're getting better, not worse, and that they keep working instead of quietly breaking. Once the systems are live, you will constantly have these questions: am I using the newest, cheapest, or fastest model? Is everything working as expected? When you start to think about these challenges, things become more complex, so it's important to approach production agents with deeper planning and consideration.

If you think about the whole journey, from figuring out which process you want to automate all the way to having these agents deployed and monitored, you want to start by simply understanding the problem. I know that doesn't sound like a big aha moment, but let me tell you: most companies out there are not failing because of the tech. The tech is there, and the tech is quite impressive. They're failing because they're not really thinking about their use cases, not mapping them, not thinking about how to measure them, and not defining what success looks like for them. After mapping the process, you want to implement a first version, and the idea here is to move extremely fast: you want to get the zero-to-one win as quickly as you can. Then you want to establish a validation set. Going from that first version to the validation set, the point is that you just want an MVP, something that gets things moving. Don't worry about perfection; you're going to refine it later. But once you have that first version, you can start establishing what good and bad look like for that automation, and that becomes your validation set. This gives you a clear baseline to measure whether your updates are improving or degrading performance. When I talk about a validation set here, what I'm trying to do is establish a baseline you can measure against: at some point during your implementation, you need a set of use cases, or a set of runs you have done, that establishes what good looks like.
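To make the idea of a validation set concrete, here is a minimal, purely illustrative sketch. None of these class names are part of CrewAI; they just show the kind of information worth capturing for each case and each run so you have a baseline to compare against later.

```python
# Illustrative only: a tiny data model for a validation set and its baseline runs.
from dataclasses import dataclass


@dataclass
class ValidationCase:
    """One input plus a description of what 'good' looks like for it."""
    inputs: dict
    expected_traits: list[str]  # e.g. "mentions pricing", "cites a source"


@dataclass
class ValidationRun:
    """What gets recorded every time an agent is run against a case."""
    case: ValidationCase
    output: str
    quality_score: float      # from a rubric or an LLM-as-judge
    runtime_seconds: float
    cost_usd: float


# The baseline is simply the set of runs from the version you consider good enough.
baseline: list[ValidationRun] = []
```

The exact fields will depend on your use case, but the pattern is the same: store the inputs, the output, and the metrics side by side, so every future change can be scored against the same set.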
And the reason that's important is that as you implement guardrails, change LLMs, or do training and testing, you need something you can consistently check against. You need to know if you're moving the needle, whether you're actually driving improvements or not, and the only way to do that is by establishing a baseline. So establishing a validation set early on is important, and updating it as you go is even more important, so that as you change these things they not only keep working but keep getting better. After setting up that validation set, you can start implementing guardrails to ensure your agents behave properly when unexpected situations occur. That's where a lot of the things we saw come into play: not only the guardrails, but the testing, the training, and everything in between. The key is to establish the validation set early, implement the guardrails and triggers after things are working, and maintain ongoing monitoring. This cycle will help you build better versions faster, grounded in measurable progress against your baseline. And when building this validation set, make sure you don't just store prompts. Also record the actual metrics, like the quality assessment, the runtime, and the cost. Collecting and analyzing this data over time helps you make informed decisions about model selection, trade-offs, and performance improvements.

Now, I do want to make sure we're thinking about these things the right way; you've already noticed there are two vectors of improvement we can drive here. Throughout the process of building these agents, you want to make sure that everything is properly versioned and that you clearly understand what is changing in every single version. At CrewAI, we focus heavily on separating the prompts, where you have the roles, the goals, the backstories, the descriptions, and the expected outputs, from the actual agent logic itself. Keeping the prompts separate allows you to update them independently from the code. And since prompts are stored as YAML files, they are very easy to add and maintain, even by non-technical team members. You can also create repositories for these prompts to be reused across different agents or projects, or even use our agent repository to store the agents; we're going to talk about that in a second. The agent logic, on the other hand, is more custom and written in more traditional code, especially when you add tools, hooks, and guardrails; even so, much of it can still be reused. This separation of concerns makes it possible to reuse well-designed agents, especially those with strong prompts and toolsets, across many tasks and use cases with minimal modification.

Now, let's say you want to create your crew. We've already seen this a little bit, but you can use the CLI to create the entire structure for you. All you have to do is run crewai create crew and give your crew a name. Then you can go into that folder and you'll find the whole project structure laid out for you, and you'll see that your agents and tasks are all mapped out to YAML files containing the agents' roles, goals, and backstories and the tasks' descriptions and expected outputs.
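To make that separation concrete, here is a minimal sketch along the lines of what the CLI scaffolds: the YAML files hold the prompts, and the Python class only wires them together. The crew, agent, and task names here (my_crew, researcher, research_task) are illustrative placeholders, and the exact generated files can vary by version.

```python
# Sketch of the layout produced by `crewai create crew my_crew` (names illustrative):
#
#   src/my_crew/config/agents.yaml   -> role, goal, backstory per agent
#   src/my_crew/config/tasks.yaml    -> description, expected_output per task
#   src/my_crew/crew.py              -> the wiring below
#
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task


@CrewBase
class MyCrew:
    """Prompts live in YAML; the code below only assembles agents and tasks."""

    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"

    @agent
    def researcher(self) -> Agent:
        # Pulls role/goal/backstory from agents.yaml under the "researcher" key
        return Agent(config=self.agents_config["researcher"], verbose=True)

    @task
    def research_task(self) -> Task:
        # Pulls description/expected_output from tasks.yaml under "research_task"
        return Task(config=self.tasks_config["research_task"])

    @crew
    def crew(self) -> Crew:
        return Crew(agents=self.agents, tasks=self.tasks, process=Process.sequential)
```

The point of the pattern is that a prompt tweak is a YAML change anyone on the team can make, while the Python wiring only changes when the logic, tools, or flow actually change.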
Let's say you put a lot of effort into an agent, picking the right tools and the right prompt, and it just works, to the point that you can reuse it for many tasks across many different use cases. The good news is that we offer you the ability not only to keep these agent YAML files in your project, but also to publish them to your agent repository so you can reuse them across many use cases. You can go in and create these agents in your online repository and reuse them locally, and you can do this for free. All you have to do is create your agents there; there's an entire UI dedicated to that. And when you want to use one locally on your own computer, you can load it directly from that repository. This is very important because it allows you to keep these agents and reuse them over time.

The same thing is true for tools. With tools, you can have one single repository where you put all of your tools, public or private, and reuse them across many use cases. If you want to create a tool, all you have to do is, again, use that same CLI and type the command crewai tool create and give your tool a name. From that point on, you'll have the structure of a custom tool, and you can modify it to do whatever you want. It can connect to a specific internal system, it can connect to an external system, it can even build a video; anything you can think of, you can turn into a tool. It's all regular Python code. The moment you run the publish command, that tool becomes available not only to other people on your team, who can run crewai tool install to get access to it and reuse it across many different use cases, but also in the no-code UI builder that we saw at the very beginning of the course. So you can use this tool repository to publish and share tools across your organization, giving everyone access to a centralized internal library, and these tools can include custom code, MCP servers, or anything in between. This approach is exciting because it not only improves observability and metrics at the crew and use-case level, but also creates reusable LEGO pieces that can be applied across many different projects and workflows.

Now, I do want to take one step back and talk about these use cases, because it's important not to overlook the planning stage. I briefly touched on it at the beginning of this lesson, but I want to go back and close on it. Many people skip it and go straight to writing code, which can be exciting and produce very quick results, but careful planning leads to much better outcomes. A common pitfall is failing to define success clearly and not breaking the process into smaller, manageable chunks that can become individual tasks. Some teams also forget to measure or evaluate the results because they never established those metrics in the first place. As you move toward production, take the time to plan thoughtfully so your system is well-structured, measurable, and built to evolve effectively over time. So the four things to keep in mind are: spend time on planning, have a clear definition of success, break your process into smaller chunks, and make sure you have a way to measure and evaluate.
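Going back to the crewai tool create command for a moment, here is a minimal sketch of what a custom tool roughly looks like in that style. The tool name and its word-counting logic are made up for illustration; the scaffold you get may include extra pieces such as an argument schema.

```python
# Illustrative custom tool: subclass BaseTool and implement _run.
from crewai.tools import BaseTool


class WordCountTool(BaseTool):
    name: str = "Word counter"
    description: str = "Counts the words in a piece of text."

    def _run(self, text: str) -> str:
        # Plain Python: this body could just as well call an internal API,
        # query a database, or talk to an MCP server.
        return f"The text contains {len(text.split())} words."
```

Because it's ordinary Python, anything you can script can live behind a tool like this, and once published it can be installed and reused by the rest of your team.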
We have learned so much in this module, all the way from agent collaboration and coordination to different patterns, building flows, implementing reasoning agents, training, testing, structured output, and so much more. And there are so many more things we can build now that we have all these tools in our back pocket. Coming up, we will complete the graded quiz and graded lab for this module, where you're going to create a code review flow using all the skills you have learned here. It's going to be exciting because it's a very hands-on use case that you, as an engineer, I'm sure can appreciate. We're almost at the end of this course, and once you're done, you'll see that in the next module things go to the next level, because we're going to talk not only about how you build AI agents, but about the actual business value they deliver. It's a very special module because I'm going to try something different: I'm bringing in a group of companies that are building AI agents in production, real practitioners with hands-on experience who are leading their teams through this. I want you to hear from them as I talk with them: what are the challenges, what is working for them, and what are they having problems with? That's going to be very exciting. So I hope to see you there right after the graded quiz and lab. Please stick with me; it's going to be very interesting. I'll catch you there in a second.