Welcome to Building AI Voice Agents for Production, taught by Russ d'Sa (pronounced "Sah"), Shayne Parmelee, and Nedelina Teneva. Russ is the co-founder and CEO, and Shayne a developer advocate, at LiveKit. Nedelina is head of AI at RealAvatar and developed a conversational avatar together with the DeepLearning.AI team. RealAvatar is also a portfolio company of AI Fund, which I lead.

I'm excited about agents that can converse with users. This is turning out to be an important way for people to interact with AI agents. DeepLearning.AI and RealAvatar's teams, including Nedelina, had a great experience building a conversational avatar using LiveKit. I personally also really enjoyed using LiveKit for various other projects. In this course, we want to share with you some best practices for building voice agents.

Let me describe our project, which we'll use as a running example. We started with a conversational agent, similar to many of the projects you may have seen or built in previous short courses. We developed an agentic workflow to get the system to choose words similar to what I would say in different circumstances. On the input side, we then added a speech-to-text model to convert the user's spoken audio into text for the agentic workflow to process. On the output side, we added a text-to-speech model to take the text output and turn it into speech that can be played back to the user, using a model from ElevenLabs for the audio generation. The model was trained to sound like me, and I think the audio turned out pretty decent. You'll hear it later and you can decide.

We wanted this to scale to a large number of users, so we moved to a cloud infrastructure that could support many simultaneous users. Finally, users of this service can be anywhere in the world, which introduced real-time networking concerns and audio integration issues. Our solution: use cloud resources to support the front end of the avatar, with an agentic workflow on the back end, integrated with LiveKit to provide the communication infrastructure. Nedelina and Russ will tell you more about this in the course.

In the first lesson, you'll learn the components of a voice pipeline, including speech-to-text and text-to-speech models, as well as voice activity detection and end-of-turn detection. You'll also learn how important latency is, and some strategies for keeping latency low. Then we'll try out a voice agent. You'll learn how voice agents are really different from other applications: voice agents have state and, to be effective, must have a presence, just as if there were another person on the other end of the conversation. In lessons four and five, you'll build a voice agent that you can use in the course or download to your own machine. You'll learn to measure latency in a voice pipeline to achieve natural conversation.

Many people have worked to create this course. I'd like to thank Theo Monnom from LiveKit and Geoff Ladwig from DeepLearning.AI. Additionally, I'd like to thank Thor Schaeff from ElevenLabs, who created the text-to-speech model you'll be using in this course and arranged ElevenLabs' support for the course.

Thank you, Andrew. It's great to be a part of this course. I hope you'll find conversational AI agents as compelling as we do, and will take the time to explore not just text-to-speech with ElevenLabs, but also our fully fledged conversational AI platform, which allows you to add voice to your agents within minutes. We can't wait to see what you build.

That sounds great.
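To make the pipeline described above more concrete, here is a minimal sketch of a voice agent wired together with the LiveKit Agents framework. The specific plugin and model choices (Deepgram for speech-to-text, OpenAI for the agentic workflow, ElevenLabs for text-to-speech, Silero for voice activity detection) are illustrative assumptions, not necessarily what the lessons use, and exact API details can differ across framework versions.

```python
# Sketch of the STT -> agentic workflow -> TTS pipeline, assuming the
# LiveKit Agents framework with illustrative plugin choices.
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import deepgram, elevenlabs, openai, silero


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()  # join the LiveKit room that carries the user's audio

    session = AgentSession(
        vad=silero.VAD.load(),   # voice activity detection: is the user speaking?
        stt=deepgram.STT(),      # speech-to-text: user audio -> text
        llm=openai.LLM(),        # agentic workflow: decides what to say
        tts=elevenlabs.TTS(),    # text-to-speech: response text -> audio
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful, concise voice assistant."),
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```

Each stage streams its output to the next rather than waiting for the whole utterance, which is one of the latency-reduction strategies the first lesson covers; end-of-turn detection can likewise be plugged into the session alongside VAD.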
Let's get started with the next lesson: an overview of voice agents.