Instructors: John Gilhuly, Aman Khan
Learn how to add observability to your agent so you can gain insight into its steps and debug it.
Learn how to set up evaluations for the agent's components by preparing test examples, choosing the appropriate evaluator (code-based or LLM-as-a-judge), and identifying the right metrics.
Learn how to structure your evaluations into experiments so you can iterate on and improve both the output quality and the path taken by your agent.
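To give a flavor of what evaluating "the path taken by your agent" can look like in practice, here is a minimal, illustrative sketch of a trajectory check. It is not the course's own code; the tool names and the two scoring functions are assumptions for illustration.

```python
# Illustrative only: a simple code-based trajectory check that compares the
# sequence of tool calls an agent actually made against an expected sequence.
# The tool names below are hypothetical, not the course's agent.

def trajectory_matches(expected: list[str], actual: list[str]) -> bool:
    """Strict check: did the agent call exactly the expected tools, in order?"""
    return actual == expected

def trajectory_precision(expected: list[str], actual: list[str]) -> float:
    """Softer metric: what fraction of the agent's steps were expected steps?"""
    if not actual:
        return 0.0
    expected_set = set(expected)
    return sum(1 for step in actual if step in expected_set) / len(actual)

expected = ["search_papers", "summarize_findings", "write_report"]
actual = ["search_papers", "search_papers", "summarize_findings"]

print(trajectory_matches(expected, actual))               # False: extra search, no report
print(round(trajectory_precision(expected, actual), 2))   # 1.0: no unexpected tools were used
```

Strict matching catches any deviation from the expected path, while a softer metric like the second function tolerates detours; which you choose depends on how much flexibility your agent is allowed in reaching the answer.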
Learn how to systematically assess and improve your AI agent's performance in Evaluating AI Agents, a short course built in partnership with Arize AI and taught by John Gilhuly, Head of Developer Relations, and Aman Khan, Director of Product.
When you're building an AI agent, an important part of the development process is evaluations, or evals. Whether you're building a shopping assistant, a coding agent, or a research assistant, having a structured evaluation process helps you refine its performance systematically, rather than relying on trial and error.
With a systematic approach, you structure your evaluations to assess the performance of each component of the agent as well as its end-to-end performance. For each component, you select the appropriate evaluators, testing examples, and metrics. This process helps you identify areas for improvement so you can iterate on your agent both during development and in production.
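To make the evaluator choice concrete, here is a minimal, illustrative sketch of the two styles mentioned above: a code-based check and an LLM-as-a-judge. It is not the course's code; it assumes the openai Python package with an OPENAI_API_KEY set, and the model name and judge prompt are placeholders.

```python
# Illustrative only: two evaluator styles applied to a single test example.
import json
from openai import OpenAI  # assumes the openai package is installed and OPENAI_API_KEY is set

client = OpenAI()

def code_based_eval(output: str) -> bool:
    """Code-based evaluator: a deterministic rule, e.g. 'is the output valid JSON?'"""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

JUDGE_PROMPT = """You are grading an assistant's answer.
Question: {question}
Answer: {answer}
Reply with exactly one word: correct or incorrect."""

def llm_as_a_judge_eval(question: str, answer: str, model: str = "gpt-4o-mini") -> str:
    """LLM-as-a-judge evaluator: ask a model to grade the answer against the question."""
    response = client.chat.completions.create(
        model=model,  # placeholder model name; swap in whatever judge model you use
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
    )
    return response.choices[0].message.content.strip().lower()

print(code_based_eval('{"total_sales": 1200}'))                  # True
print(llm_as_a_judge_eval("What is 2 + 2?", "2 + 2 equals 4."))  # expected: "correct"
```

Code-based evaluators are cheap and deterministic, which makes them a good fit when correctness can be expressed as a rule; an LLM judge handles more subjective criteria, but its prompt and labels need validation of their own, which the "Improving your LLM-as-a-judge" lesson below addresses.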
In this course, you'll build an AI agent, add observability to visualize and debug its steps, and evaluate its performance component by component. In detail, you'll add tracing to your agent, set up router, skill, and trajectory evaluations, structure those evaluations into experiments, and learn how to improve an LLM-as-a-judge and monitor your agent in production.
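As a taste of what adding observability to an agent's steps can look like, here is a minimal, illustrative sketch that wraps a hypothetical router and skill in OpenTelemetry spans and prints the resulting trace to the console. It is not necessarily the instrumentation used in the course, and it assumes the opentelemetry-sdk package is installed.

```python
# Illustrative only: wrap each agent step in a span so the trace shows
# which route was chosen and what each skill returned.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Export spans to the console; a real setup would send them to a tracing backend.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-demo")

def run_agent(user_query: str) -> str:
    # One root span per agent run, with child spans for the router and the chosen skill.
    with tracer.start_as_current_span("agent_run") as root:
        root.set_attribute("input.value", user_query)

        with tracer.start_as_current_span("router") as span:
            # Hypothetical routing rule; a real agent would typically call an LLM here.
            route = "search" if "find" in user_query.lower() else "answer"
            span.set_attribute("route.choice", route)

        with tracer.start_as_current_span(f"skill.{route}") as span:
            # Stand-in for a real skill (tool call, retrieval, code execution, ...).
            result = f"[{route}] placeholder result for: {user_query}"
            span.set_attribute("output.value", result)

        root.set_attribute("output.value", result)
        return result

print(run_agent("Find recent papers on evaluating AI agents"))
```

Each span records its inputs and outputs as attributes, so when a run goes wrong you can see which route was chosen and what each step returned instead of re-running the agent blindly.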
By the end of this course, you'll know how to trace AI agents, systematically evaluate them, and improve their performance.
This course is for anyone with basic Python knowledge who wants to learn to evaluate, troubleshoot, and improve AI agents effectively, both during development and in production. Familiarity with prompting an LLM is helpful but not required.
Introduction
Evaluation in the time of LLMs
Decomposing agents
Lab 1: Building your agent
Tracing agents
Lab 2: Tracing your agent
Adding router and skill evaluations
Lab 3: Adding router and skill evaluations
Adding trajectory evaluations
Lab 4: Adding trajectory evaluations
Adding structure to your evaluations
Lab 5: Adding structure to your evaluations
Improving your LLM-as-a-judge
Monitoring agents
Conclusion
Appendix - Resources, Tips and Help
Course access is free for a limited time during the DeepLearning.AI learning platform beta!