

Voice for AI Agents and Applications


Voice systems regress silently, and you'll only find out when users complain. In this lesson, you will set up evaluation-driven development using Vocal Bridge's built-in multimodal evaluator. It scores your calls, suggests prompt improvements, and helps you catch issues before they reach production. Let's have some fun.

Lessons two, three and four were about building. Lesson five is about not shipping by vibes. An eval is a repeatable test that scores your AI system against the outcomes you actually care about. Think of it like a unit test, but for a non-deterministic system whose output you cannot simply diff. Every shipping AI system needs them, and here's why. A prompt edit ships clean but quietly breaks three edge cases. A model swap changes behavior in ways no integration test catches. Without ground truth, you find regressions from your customers, which is the worst possible place to find them. And without scoring, you can't tell whether a change you made actually moved things forward or backward. Eval-driven development means building and iterating on a system using its eval scores as the primary signal. That is the discipline that turns a demo into a product.

With text, an eval can often just compare strings. Voice doesn't give you that luxury. Voice involves audio, timing, tone, multiple turns, and an agent that can interrupt or be interrupted, and none of that shows up in a text diff or even in a transcript.

Six things make voice evals harder than text evals.

Number one, the signal is multimodal. A voice turn carries text, audio, pacing, and prosody. Score one dimension and you miss the others.

Number two, regressions are silent. A new TTS (text-to-speech) voice can sound robotic on specific phrasing while the transcript looks identical.

Number three, turn-taking matters. Did the agent wait for the user to finish? Did it interrupt at the right moment? Text alone can't tell you that.

Number four, calls are hard to replay. You can't rerun a live call deterministically.
So recordings plus structured logs become your only ground truth.

Number five, tool calls happen inline. Did the agent invoke the right function at the right time with the right arguments, mid-conversation?

Number six, latency itself is part of the user experience. A correct answer that arrives four seconds late is effectively a wrong answer.

If you only score the text, you will miss most of what actually broke.

Okay, so how do we actually score it? There are two layers. The first layer is hard numerical metrics, the things that are actually measurable: WER, or word error rate, for transcription accuracy; MOS, or mean opinion score, for synthesis quality; TTFB, or time to first byte, for first-token or first-audio latency; end-to-end turn duration; tool-call accuracy, meaning whether the agent called the right function with the right arguments, and whether it triggered a tool call at all; and completion rate. For the most part, these are things you can compute deterministically. You can create a baseline with these numbers. You can fail builds on regressions.

Now the second layer: using a multimodal LLM as a judge, for the qualitative things the metrics can't see. Did the agent's tone match the situation? Was the turn-taking natural, with no awkward pauses? Did the call actually achieve the objective? Did it gracefully handle the user going off-script? How well did it recover from interruptions or errors? And, really importantly, the judge suggests concrete prompt edits for next time, so that you can improve the agent and its performance.

Numerical metrics catch what's objective, and the judge catches the qualitative drift that's otherwise invisible. Together, and especially at scale, that's what makes voice evals tractable.

So the notebook will have you run this on real recorded calls with one single CLI command, vb eval. All you have to do is pass it the session ID for the call and the objective.
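To make the first layer concrete, here is a minimal Python sketch of one of the hard metrics named above, word error rate, computed as the word-level edit distance between a reference transcript and the recognized transcript, divided by the number of reference words. This is not part of the Vocal Bridge tooling; the function name `word_error_rate` and the example sentences are hypothetical, and the lowercase-and-split normalization is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase + whitespace split is enough normalization here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word ("for") and one substitution
# ("two" heard as "too") against an 8-word reference.
reference = "confirm my appointment for tomorrow at two pm"
hypothesis = "confirm my appointment tomorrow at too pm"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.3f}")
```

Because the result is a plain number, it can be stored as a baseline and compared across builds, which is what makes this layer suitable for failing a build on regression.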
The command bundles everything (the audio recording of the conversation, the transcript, the agent configuration, and the tool call log) into a payload for the multimodal judge, which then returns a structured report you can wire into your CI/CD loop.

A sample report looks something like this. You'll have the objective at the top: confirm appointment for tomorrow at 2pm. Then the outcome, let's say something like "Confirmed." It'll give you an overall score out of 10, and then the breakdown, such as the conversation quality, what the turn-taking was like, whether it was natural or awkward, and how satisfied the caller was. You can describe and define all of these criteria in the scenario. And the most important part, which will actually help you close the loop, is the suggestion for editing the agent's configuration. It could be something like: soften the opening greeting, because the current phrasing reads as scripted and comes off as unnatural. Maybe you can try, "Hi! Just checking in on your 2pm tomorrow..." So that is a complete loop. Let's open the notebook.

All right, time for lesson five: Voice AI Evals. So you've built three working voice agents. Now we install the discipline that separates a demo from a production system. It's called eval-driven development. Vocal Bridge ships a built-in multimodal evaluator. We give it a recorded session and an objective. It sends the audio, the agent's full configuration, the transcript, and the tool call log, all of it, to a multimodal LLM, which then scores the call and proposes concrete prompt edits.

For this notebook, you need at least one completed call with a recording on it. The easiest source would be the call you placed in lesson four. We will pull its session ID off the list in a second. First we make sure that the setup goes through. Pretty standard stuff.

Right, so just two helpers this lesson: the vb wrapper you already know from earlier, and eval_session. This is the new one. It's a one-line wrapper around vb eval.
So, the Vocal Bridge CLI can be accessed through the vb alias, and eval is the command. The eval_session helper simply wraps vb eval. That's literally the entire eval API surface.

Okay, so now we pull the most recent completed calls, so we can pick one. If you ran the lesson four call, its session ID will be at the top of the list. We will copy the session ID and use it in the next cell.

So we copied the session ID in the previous step. Now you need to give it an objective: what the agent was supposed to accomplish on that call. The objective is what makes the eval meaningful. Without an objective, the LLM is just grading vibes, really. With one, it can actually tell you whether the call succeeded. Under the hood, this is one CLI command: vb eval, the session ID, and then the objective. Again, we just pass the session ID and the objective. The objective here, if you remember from lesson four, was that the caller would place an outbound demo call and have a brief, on-topic conversation with the callee.

It takes a few seconds because a multimodal LLM is doing the evaluation. The CLI sends the audio, the agent config, the transcript, and the tool log, all of the context, to the multimodal judge model. So we are using the LLM-as-a-judge approach here.

Eval reports come back as a structured object. As you can see, here we have the session_id, the objective, which we provided to the eval_session helper, and then a score. So the result object has a few different properties. First the score, which is 10. Great to hear that. This is a score out of 10, so the best you can do is 10. Then the verdict: whether the agent passed the evaluation, that is, whether it met the objective. So it's a binary pass/fail decision. Then the summary: the multimodal LLM judge gives you a qualitative summary of how the agent performed. It'll also give you what worked and what didn't.
So in this case, everything worked, which is great to hear. It was a simple prompt after all. And then finally, this is the most useful one: suggested prompt improvements. This is telling you, the developer, what improvements you can make to the prompt and configuration of the agent to meet the objective and get closer to a perfect score. In this case, it is already perfect, so you can see it says the prompt is highly effective for this scenario and no changes are suggested.

Now, the eval reports, as you saw in the previous step, come back as a structured object. The two things you will actually act on are the score, which is a 0 to 10 quality rating, and the suggestions for improving the prompt. If the call missed the objective, the model tells you exactly what went wrong. That's your prompt diff, right there.

All right, so again, this cell just shows you how you can parse that report and display it in a more readable format.

Now you can take it one step further. You can also build a comprehensive eval suite. I'm not going to run the cell. I will let you explore this on your own. But this is an example of the kind of eval suite you can build, where you evaluate the same agent against a set of different scenarios. The first one is confirming an appointment for tomorrow at 2:00 p.m. The second one has the same objective, but it's a different scenario. So you can configure those situations and really evaluate your agent comprehensively, at scale, before going to production.

And here's the thing. The discipline is the loop, not any single number. Build, call, eval, look at the suggestions, patch the prompt, call again, eval again. A regression isn't actually fixed until a fresh call against the same objective scores higher. So the eval suite is what will keep you honest about it. And typically, this is what a patch loop would look like.
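As a rough illustration of the build, call, eval, patch cycle just described, here is a minimal Python sketch. It is not the notebook's code: `patch_loop`, `fake_evaluate`, and `append_suggestions` are hypothetical names, the report keys `"score"` and `"suggestions"` are assumed rather than taken from the real report schema, and the evaluator is a stand-in that makes no real call. In practice the evaluate step would place a fresh call and score it against the same objective, and a person would usually review the suggestions before editing the prompt.

```python
from typing import Any, Callable, Dict, List, Tuple

Report = Dict[str, Any]  # assumed shape: {"score": int, "suggestions": [str, ...]}


def patch_loop(
    prompt: str,
    evaluate: Callable[[str], Report],
    apply_suggestions: Callable[[str, List[str]], str],
    target_score: int = 9,
    max_rounds: int = 3,
) -> Tuple[str, List[Report]]:
    """Evaluate, patch the prompt from the suggestions, and evaluate again."""
    history: List[Report] = []
    for _ in range(max_rounds):
        report = evaluate(prompt)  # stands in for "call again, eval again"
        history.append(report)
        # Stop once the score is good enough or there is nothing left to apply.
        if report["score"] >= target_score or not report["suggestions"]:
            break
        prompt = apply_suggestions(prompt, report["suggestions"])
    return prompt, history


def fake_evaluate(prompt: str) -> Report:
    """Hypothetical stand-in for a real evaluation; no call is placed."""
    if "warm" in prompt:
        return {"score": 10, "suggestions": []}
    return {"score": 6, "suggestions": ["Use a warm, conversational greeting."]}


def append_suggestions(prompt: str, suggestions: List[str]) -> str:
    """Naive patch step: append each suggestion to the prompt text."""
    return prompt + " " + " ".join(suggestions)


final_prompt, history = patch_loop(
    "Confirm the appointment.", fake_evaluate, append_suggestions
)
print([report["score"] for report in history])
print(final_prompt)
```

The loop keeps every report in `history` so that the score after a patch can be compared with the score before it, which is the check that tells you whether the patch actually helped.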
Again, I'm not going to run it for you, but feel free to explore it on your own. Try different versions of the objectives, the scenarios, and the prompt, and see how having an eval suite improves the performance of your agent.

And that's the course. You've built across all three Vocal Bridge services (voice in your app, voice for your existing agent, and voice as a tool) and installed the eval-driven development loop on top. From here, the natural next step is production hardening: auth, monitoring, latency tuning, all the stuff that turns a working demo into a deployed product. And that is exactly the topic of my conversation with Scott Johnston, former CEO of Docker and current board member of Vocal Bridge. I'll see you there.
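Looking back at the eval suite idea from this lesson, the point about failing builds on regressions can be sketched as a small gate that compares per-scenario scores from a fresh run against a stored baseline. This is a hedged Python sketch under stated assumptions: `find_regressions`, the scenario names, and the score values are all hypothetical, and how the scores are produced (for example, from eval reports) is left out.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return scenarios that scored below baseline, or were not evaluated."""
    regressed: List[str] = []
    for scenario, old_score in baseline.items():
        new_score = current.get(scenario)
        # A missing scenario counts as a regression: it was not re-checked.
        if new_score is None or new_score < old_score - tolerance:
            regressed.append(scenario)
    return regressed


# Hypothetical scores (0 to 10) for the same agent across two scenarios.
baseline = {
    "confirm_appointment_happy_path": 9,
    "confirm_appointment_variant": 8,
}
current = {
    "confirm_appointment_happy_path": 10,
    "confirm_appointment_variant": 6,
}

regressions = find_regressions(baseline, current)
exit_code = 1 if regressions else 0  # a CI job could exit with this code
print(regressions, exit_code)
```

Keying the comparison by scenario rather than averaging across the suite keeps one improved scenario from hiding a regression in another.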


Beginner

1h26m

Topics

Agents

GenAI Applications

LLMOps

Collaborator

Voice for AI Agents and Applications

Introduction・Video・4m

Overview of Voice UI・Video・9m

Voice in Your App・Video with Code Example・10m

Voice for Your Agent・Video with Code Example・12m

Voice as a Tool・Video with Code Example・9m

Voice AI Evals・Video with Code Example・10m

Voice Agents in Production・Video・8m

Conclusion・Video・1m

Glossary・Reading・10m

(Optional) Claim Vocal Bridge Credits・Code Example・1m

Graded・Quiz

Course Details