AI is the new electricity and will transform and improve nearly all areas of human lives.

Quick Guide & Tips

💻 Accessing Utils File and Helper Functions

In each notebook on the top menu:

1: Click on "File"

2: Then, click on "Open"

You will see all the notebook files for the lesson, including any helper functions used in the notebook, in the left sidebar.

🔄 Reset User Workspace

If you need to reset your workspace to its original state, follow these quick steps:

1: Access the Menu: Look for the three-dot menu (⋮) in the top-right corner of the notebook toolbar.

2: Restore Original Version: Click on "Restore Original Version" from the dropdown menu.

For more detailed instructions, please visit our Reset Workspace Guide.

💻 Downloading Notebooks

In each notebook on the top menu:

1: Click on "File"

2: Then, click on "Download as"

3: Then, click on "Notebook (.ipynb)"

💻 Uploading Your Files

After following the "File" => "Open" steps from the "Accessing Utils File and Helper Functions" section, click the "Upload" button to upload your files.

📗 See Your Progress

Once you enroll in this course—or any other short course on the DeepLearning.AI platform—and open it, you can click on 'My Learning' at the top right corner of the desktop view. There, you will be able to see all the short courses you have enrolled in and your progress in each one.

Additionally, your progress in each short course is displayed at the bottom-left corner of the learning page for each course (desktop view).

📱 Features to Use

🎞 Adjust Video Speed: Click on the gear icon (⚙) on the video and then from the Speed option, choose your desired video speed.

🗣 Captions (English and Spanish): Click on the gear icon (⚙) on the video and then from the Captions option, choose to see the captions either in English or Spanish.

🔅 Video Quality: If you do not have access to high-speed internet, click on the gear icon (⚙) on the video and then from Quality, choose the quality that works the best for your Internet speed.

🖥 Picture in Picture (PiP): This feature allows you to continue watching the video when you switch to another browser tab or window. Click on the small rectangle shape on the video to go to PiP mode.

√ Hide and Unhide Lesson Navigation Menu: If you do not have a large screen, you may click on the small hamburger icon beside the title of the course to hide the left-side navigation menu. You can then unhide it by clicking on the same icon again.

🧑 Efficient Learning Tips

The following tips can help you have an efficient learning experience with this short course and other courses.

🧑 Create a Dedicated Study Space: Establish a quiet, organized workspace free from distractions. A dedicated learning environment can significantly improve concentration and overall learning efficiency.

📅 Develop a Consistent Learning Schedule: Consistency is key to learning. Set out specific times in your day for study and make it a routine. Consistent study times help build a habit and improve information retention.

Tip: Set a recurring event and reminder in your calendar, with clear action items, to get regular notifications about your study plans and goals.

☕ Take Regular Breaks: Include short breaks in your study sessions. The Pomodoro Technique, which involves studying for 25 minutes followed by a 5-minute break, can be particularly effective.

💬 Engage with the Community: Participate in forums, discussions, and group activities. Engaging with peers can provide additional insights, create a sense of community, and make learning more enjoyable.

✍ Practice Active Learning: Don't just read or run notebooks or watch the material. Engage actively by taking notes, summarizing what you learn, teaching the concept to someone else, or applying the knowledge in your practical projects.

📚 Enroll in Other Short Courses

Keep learning by enrolling in other short courses. We add new short courses regularly. Visit the DeepLearning.AI Short Courses page to see our latest courses and begin learning new topics. 👇

👉👉 🔗 DeepLearning.AI – All Short Courses

🙂 Let Us Know What You Think

Your feedback helps us know what you liked and didn't like about the course. We read all your feedback and use it to improve this course and future courses. Please submit your feedback by clicking on the "Course Feedback" option at the bottom of the lessons list menu (desktop view).

Also, you are more than welcome to join our community 👉👉 🔗 DeepLearning.AI Forum


Voice for AI Agents and Applications


Most voice applications make you choose between speaking and clicking. But real applications need both. In this lesson, you will build a voice-interactive tic-tac-toe game where your voice commands and mouse clicks work together on a single channel. All right, let's dive into the code.

Okay, welcome back. In lesson one, you got the lay of the land: what a voice agent is, the difference between cascaded and real-time stacks, and where voice is showing up beyond the contact center. This lesson is where we start building.

So what we're going to develop a feel for in lesson two is what changes when voice doesn't live by itself, when voice actually lives inside an application. The standalone voice agent, the one that just listens and talks, that's the easy case. The hard case, and the one most of you actually want to ship, is voice that lives inside a product: a dashboard, a game, a checkout flow. The moment voice is inside a product, the agent and the UI have to stay in lockstep for the entire interaction, not just at the start.

Three things are happening constantly. The user speaks, and the UI has to update to reflect what they asked for. The user clicks something, and the agent has to know what happened mid-conversation. State changes on either side, and the other side needs to know now, not on the next turn. If those three loops aren't fast and bidirectional, voice and UI drift apart, and the experience falls apart as well. So this whole lesson is about how to build that channel.

The pattern we use to solve this, we call it Client Actions. It's just a single typed message channel between the voice agent and the app, open for the whole session. Either side can send. Either side reacts immediately. In the agent-to-app direction, you have things like render_widget, which tells the UI to show or change something, and update_state, which pushes agent-side state into the app. In the app-to-agent direction, you have user_event.
That means the user clicked, typed, or dragged something. You also have app_state, which is the UI's current state sent back as context. Now, here's the important detail. Both directions ride the same WebRTC data channel that the audio rides on. So the agent and the app see each other as one system, not as two services trying to gossip through HTTP.

Okay, to make this concrete, the notebook will have you build voice-driven tic-tac-toe. Two players: you, with voice and click, and the agent. Let me walk you through what happens in a single move. You say something like, "Put my mark in the center." The agent fires place_mark with the row and column information and who the player is. The UI re-renders, and the center cell now shows your mark. While the agent's thinking, you tap a corner cell using your mouse. The app sends user_move back to the agent, and the agent comes back out loud with, "Nice, I'll block your diagonal." That is the entire loop. Both directions over one channel while the conversation is happening. Once you've got that shape in your head, every voice-in-an-app pattern you'll see for the rest of your career will be a variant of this.

So, this is lesson two, voice in your application. And what we're going to build in this notebook is a voice-driven tic-tac-toe game. Now, the point isn't really the game. The point is what's underneath the game. Vocal Bridge gives you a bidirectional channel we call client actions, and it lets the voice agent and your UI move as one system. You can talk to it, you can click on it, same surface. By the end of this notebook, you'll have that whole thing wired up.

Quick word on setup before we run anything. In this DeepLearning.AI environment, we've already taken care of the setup for you. Your Vocal Bridge account is provisioned, your API key is populated in the environment, and the CLI is authenticated. You don't have to do anything. Just run the cells. But if you do want to run it on your own machine outside of this environment, here's the flow.
Go to the Vocal Bridge dashboard, grab an API key, and set it as the Vocal Bridge API key environment variable. Install the CLI with pip install Vocal Bridge, authenticate with vb auth login, and that's it.

So here's the setup code. We import the helpers, load environment variables, and assert that the Vocal Bridge key is there. Let me run it real quick. There you go.

A quick walkthrough of the helpers. There are three helpers we will use today in this notebook. mint_token gets us a voice session token. vb is a thin wrapper around our CLI. And voice_widget renders the React widget inside the notebook. The helpers are there so the lesson can stay focused on the Vocal Bridge concepts and not all the boilerplate.

So every voice UI agent ships with two files. There's a prompt, that's the agent's brain: its personality, its rules. And there's an action schema that declares which messages can flow between the agent and your UI. Let's go through both of those.

We'll go through the prompt first. Two hard rules at the top, and these are load-bearing. Let me call them out: RULE A and RULE B. RULE A says always end a response with speech, never with a tool call. This is because if the agent ends on a tool call, the user just hears silence and the demo's dead. RULE B says, "The CLIENT owns the board. You do NOT." So every time a move is made, the client sends a board_sync event with the full board, and the agent reads that as the ground truth. Those two rules are what keep voice and UI locked together in this game.

And here's the action schema. The right arrows here are agent_to_app. So this one, this one, this one, and finally this one. These are events that the agent fires that change the UI: show_tic_tac_toe, place_mark, user_move, and clear_ui. The left arrows, this and this, are events in the opposite direction, app to agent. The application sends back information to the agent when the user clicks or the state of the game changes.
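To make the schema and RULE B concrete, here is a minimal Python sketch. The action names come from the lesson (the two app-to-agent names, user_placed_mark and board_sync, are the ones the widget is described as sending). Everything else is an assumption for illustration: the dictionary layout, the `check_action` and `AgentView` names, and the `{"board": ...}` payload shape are hypothetical and are not the real Vocal Bridge schema file format or API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Direction(Enum):
    AGENT_TO_APP = "agent_to_app"
    APP_TO_AGENT = "app_to_agent"


# Action names are from the lesson; this dict layout is an assumption,
# not the actual Vocal Bridge action schema format.
ACTION_SCHEMA = {
    "show_tic_tac_toe": Direction.AGENT_TO_APP,
    "place_mark": Direction.AGENT_TO_APP,
    "user_move": Direction.AGENT_TO_APP,
    "clear_ui": Direction.AGENT_TO_APP,
    "user_placed_mark": Direction.APP_TO_AGENT,
    "board_sync": Direction.APP_TO_AGENT,
}


def check_action(name: str, direction: Direction) -> None:
    """Raise ValueError if the action is undeclared or sent the wrong way."""
    declared = ACTION_SCHEMA.get(name)
    if declared is None:
        raise ValueError(f"undeclared action: {name}")
    if declared is not direction:
        raise ValueError(f"{name} is declared as {declared.value}")


@dataclass
class AgentView:
    """Agent-side copy of the board; the client remains the source of truth."""

    board: list = field(default_factory=lambda: [[""] * 3 for _ in range(3)])

    def on_app_event(self, name: str, payload: dict) -> None:
        check_action(name, Direction.APP_TO_AGENT)
        if name == "board_sync":
            # RULE B: replace local state with the client's full board.
            # The "board" key is a hypothetical payload field.
            self.board = [row[:] for row in payload["board"]]


view = AgentView()
view.on_app_event(
    "board_sync",
    {"board": [["", "", ""], ["", "X", ""], ["", "", ""]]},
)
print(view.board[1][1])  # prints "X"
```

The design point is that the agent never edits its own copy of the board move by move. It only overwrites the copy when the client sends the full state, which is one way to keep the two sides from drifting.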
So it's the same channel, both directions. That's the whole pattern right there.

Now we go and create the actual agent on Vocal Bridge. One CLI command. A few flags I want to call out. The first one is the style. We're going to use Chatty, which picks the OpenAI real-time model as the back end. It holds the conversation, and it gives us good defaults out of the box. Then the client actions file. This points to the action schema that we just saw, and it's the one that wires up the bidirectional channel. Without it, you would have voice but no UI interactions. And finally, deploy targets. We are going to use web because we want the agent to be reachable in the browser over WebRTC. The other options are phone, which gives the agent its own phone number (we'll use that in lesson four), and both, which deploys to web and telephony together. That's really useful for hybrid agents. For tic-tac-toe, we just need the browser.

So we're going to use an idempotent pattern here. If the agent ID is already pre-provisioned in the environment, like in a sandbox, we just reuse it. If you already ran the cell, we reuse that agent. Otherwise, we create a new one and save the agent ID. That means you can rerun the cell without spawning new agents over and over.

Up next, we are going to call the helper function mint_token. It prints out three things for us: the session_name, the agent_mode, and the token. For agent_mode we are going to see openai_concierge, which just indicates that it's using the OpenAI real-time model in the back end and Vocal Bridge's concierge architecture. The token, which we see truncated, is what authenticates every session with Vocal Bridge.

Okay, this is where it all comes together. Three things we are going to try here once we render the widget. First, say "Let's play tic-tac-toe," and the agent will fire show_tic_tac_toe.
That's the client action, the event that makes sure the board appears on your screen. Second, you can then start making moves. Use your voice to make moves, and the agent fires both user_move and place_mark in the same response, so your X and the agent's O land together. And finally, this application is multimodal. You can use your voice, of course, but you can also use your mouse to click on a square. Once you do that, the voice agent will be aware of that click. The widget places your X locally and fires user_placed_mark back to the agent with the full board. And after every move, the widget silently sends board_sync so the agent stays in lockstep with what's on the screen. So let's render it and try playing tic-tac-toe.

Let's play tic-tac-toe. You're X, your move. Can I take the center, please? Corner, better view. Blocking that. I will take bottom left. Okay, you got me. Nicely played.

Awesome. So here we saw a multimodal interaction where I could use my voice, and you can also use a mouse to click on one of these squares.

Okay, so now you've seen it work. Let me show you what actually made it work. The whole widget is built around one single React hook called useAgentActions. It gives you both directions of client actions on a single channel. onAction listens for events that the agent is firing, and sendAction lets the application send events back to the voice agent. Same hook, both directions, one single channel. And this is the piece that's actually unique to Vocal Bridge. Most voice platforms make you pick: you either speak or you react. Here, it's all one single channel.

Great. So when you end the session, Vocal Bridge stores a structured log of every turn and every action. Let's pull the most recent one and see what that looks like. Again, it's going to be a single CLI call.
It'll grab the most recent session and pretty-print it for you. You will see every spoken turn, every action that gets fired, and every event that comes back. This is what you would hook into for analytics, debugging, or evals, which, spoiler alert, is the whole subject of lesson five.

Great job getting through this lesson. I will see you in the next one.
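As a recap of the idempotent create-or-reuse step from this lesson, here is a minimal Python sketch. It is not the notebook's code. The `get_or_create_agent` function, the `VB_AGENT_ID` variable name, and the `fake_create` stand-in are all hypothetical, and the real cell creates the agent through the Vocal Bridge CLI rather than a Python callable.

```python
import os
from typing import Callable, MutableMapping


def get_or_create_agent(
    create_agent: Callable[[], str],
    env: MutableMapping[str, str] = os.environ,
    key: str = "VB_AGENT_ID",  # hypothetical variable name
) -> str:
    """Reuse a saved agent ID if one exists; otherwise create and save one."""
    agent_id = env.get(key)
    if agent_id:
        # Pre-provisioned (e.g. a sandbox) or saved by an earlier run.
        return agent_id
    agent_id = create_agent()
    env[key] = agent_id  # save so a rerun does not spawn another agent
    return agent_id


calls = []


def fake_create() -> str:
    """Stand-in for the real agent creation step."""
    calls.append(1)
    return "agent-123"


env = {}
first = get_or_create_agent(fake_create, env)
second = get_or_create_agent(fake_create, env)
print(first == second, len(calls))  # prints "True 1"
```

Running the function twice calls the creation step only once, which is the property that makes the notebook cell safe to rerun.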

Level: Beginner
Duration: 1h26m
Topics: Agents, GenAI Applications, LLMOps

Introduction (Video, 4m)
Overview of Voice UI (Video, 9m)
Voice in Your App (Video with Code Example, 10m)
Voice for Your Agent (Video with Code Example, 12m)
Voice as a Tool (Video with Code Example, 9m)
Voice AI Evals (Video with Code Example, 10m)
Voice Agents in Production (Video, 8m)
Conclusion (Video, 1m)
Glossary (Reading, 10m)
(Optional) Claim Vocal Bridge Credits (Code Example, 1m)
