... built in partnership with CircleCI. Software testing helps you identify bugs and security vulnerabilities in your applications, and automated testing frees up even more of your time and energy, so you can focus on the creative parts of designing and building your application. In this course, you learn modern software engineering practices focused on testing for the practical development and deployment of LLM-based applications.
Two kinds of LLM evaluations that you implement in this course are rule-based evaluations and model-graded evaluations. Rule-based evals use string or pattern matching, for example, regular expression matching, and are fast and cost-effective to run. I use these whenever I want to evaluate outputs that have a clear right answer, such as sentiment classification, or whenever, say, you have ground-truth labels. Because rule-based evals are quick and cheap, you can run these tests every time you commit a code change to get fast feedback on the health of your application.
Model-graded evaluations are relevant for applications where there are many possible good or bad outputs. For example, if you ask an LLM to write text content for you, there can be more than one high-quality response. Here, you might prompt an evaluation LLM to assess the quality of the output of your application LLM. In other words, you use an LLM to evaluate the output of another LLM. Model-graded evals take more time and cost more, but they allow you to assess more complex outputs.
I'm delighted to introduce our instructor for this course, Rob Zuber, Chief Technology Officer for CircleCI. Rob has spent decades leading engineering teams and also helping customers scale up their software delivery practice by making processes repeatable, scalable, and reliable. He'll show you how to do this for your applications as well, with an emphasis on testing.
Thanks, Andrew.
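
To make the rule-based idea concrete, here is a minimal sketch of a regular-expression eval for the sentiment classification case mentioned above. The example outputs, the label set, and the function names (`extract_label`, `rule_based_eval`, `accuracy`) are all hypothetical, chosen for illustration rather than taken from the course notebooks.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical labelled examples: (application output, ground-truth label).
LABELLED_OUTPUTS: List[Tuple[str, str]] = [
    ("Sentiment: positive", "positive"),
    ("sentiment: NEGATIVE.", "negative"),
    ("I think the review is neutral overall.", "neutral"),
]

# Assumed label set; a real application would define its own.
LABEL_PATTERN = re.compile(r"\b(positive|negative|neutral)\b", re.IGNORECASE)


def extract_label(output: str) -> Optional[str]:
    """Return the first sentiment label found in the output, lower-cased."""
    match = LABEL_PATTERN.search(output)
    return match.group(1).lower() if match else None


def rule_based_eval(output: str, expected: str) -> bool:
    """Pass when the label extracted by the regex equals the ground truth."""
    return extract_label(output) == expected


def accuracy(examples: List[Tuple[str, str]]) -> float:
    """Fraction of labelled examples that pass the rule-based eval."""
    passed = sum(rule_based_eval(out, exp) for out, exp in examples)
    return passed / len(examples)


if __name__ == "__main__":
    print(f"accuracy: {accuracy(LABELLED_OUTPUTS):.2f}")
```

A check like this makes no model calls, which is why it is cheap enough to run on every commit.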
In your software development process, you and your teammates may commit code updates or bug fixes multiple times per day. In this course, you'll learn to set triggers that automatically run your evaluations whenever you or your teammates commit code changes to the repository. Your team may also release updated versions of the app on a longer cadence, perhaps once every two weeks. Before deploying to users, you can also automate more holistic, comprehensive, pre-release evaluations.
For per-commit evals, you can include rule-based evaluations because they're fast and cheap to run. And for those pre-release evals, it may be very helpful to use model-graded evals to do more thorough testing before deployment. By the end of the course, you will combine per-commit and pre-release evals into an automated testing suite. And for this course, you'll design tests to detect hallucinations in LLM responses.
Many people have worked to make this course possible. On the CircleCI side, I'd like to thank Michael Webster, Jacob Schmidt, and Emma Webb. From DeepLearning.AI, Eddie Hsu and Eshmal Gargari have also contributed to this course.
The first lesson will be a quick overview of the continuous integration terms and technologies that we'll use as the foundation for building our automated LLM testing pipeline. When you finish this course, that will be a real testament to your dedication to building good applications. Or if you're not sure how much you'll use these ideas, you can still test the waters. So let's go on to the next video to get started.
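
The split between per-commit and pre-release evals can be sketched in a few lines. This is an illustration under stated assumptions, not the course's implementation: the trigger names (`"commit"`, `"pre_release"`), the grader prompt, and every function name are hypothetical, and the grader is passed in as a plain callable so that no real LLM API is assumed. Running the rule-based check again at pre-release is also an assumption made here.

```python
from typing import Callable, Dict, List

# Hypothetical grading prompt for a hallucination check.
GRADER_PROMPT = (
    "You are grading another model's answer.\n"
    "Context:\n{context}\n\nAnswer:\n{answer}\n\n"
    "Reply Y if every claim in the answer is supported by the context, "
    "otherwise reply N."
)


def rule_based_eval(answer: str, required_term: str) -> bool:
    """Cheap check: the answer must mention a required term."""
    return required_term.lower() in answer.lower()


def model_graded_eval(answer: str, context: str,
                      grader: Callable[[str], str]) -> bool:
    """Ask an evaluation LLM (any callable: prompt -> reply) for a Y/N verdict."""
    verdict = grader(GRADER_PROMPT.format(context=context, answer=answer))
    return verdict.strip().upper().startswith("Y")


def select_evals(trigger: str) -> List[str]:
    """Per-commit runs only the cheap evals; pre-release adds model-graded ones."""
    if trigger == "commit":
        return ["rule_based"]
    if trigger == "pre_release":
        return ["rule_based", "model_graded"]
    raise ValueError(f"unknown trigger: {trigger!r}")


def run_suite(trigger: str, answer: str, context: str, required_term: str,
              grader: Callable[[str], str]) -> Dict[str, bool]:
    """Run the evals selected for this trigger and collect pass/fail results."""
    results: Dict[str, bool] = {}
    for name in select_evals(trigger):
        if name == "rule_based":
            results[name] = rule_based_eval(answer, required_term)
        else:
            results[name] = model_graded_eval(answer, context, grader)
    return results


if __name__ == "__main__":
    # Stand-in grader that always approves; a real one would call an LLM.
    def fake_grader(prompt: str) -> str:
        return "Y"

    print(run_suite("pre_release", "Paris is the capital of France.",
                    "France's capital is Paris.", "Paris", fake_grader))
```

In a CI setup, the trigger value would come from the pipeline (a commit versus a release workflow) rather than being hard-coded.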

Automated Testing for LLMOps

Introduction (Video, 3 mins)
Introduction to Continuous Integration (CI) (Video, 4 mins)
Overview of Automated Evals (Video with Code Example, 23 mins)
Automating Model-Graded Evals (Video with Code Example, 7 mins)
Comprehensive Testing Framework (Video with Code Example, 12 mins)
Conclusion (Video, 1 min)
Quiz (Graded Quiz, 10 mins)
Optional: Exploring the CircleCI config file (Code Example, 10 mins)