In this final audio lesson, we'll tackle text-to-audio generation by converting text to speech. Text-to-speech is a challenging task because it is a one-to-many problem. In classification, you have one correct label, maybe a few. In automatic speech recognition, there's one correct transcription for a given utterance. However, there is an infinite number of ways to say the same sentence. Each person has a different way of speaking, and all of them are valid and correct. Think about different voices, dialects, speaking styles, and so on.

Despite these challenges, there are open-source models that can handle this task really well, and you're about to use one of them. We'll use a pre-trained VITS model from Kakao Enterprise. This is one of the two models that can fit in this environment, and it has a permissive license.

Once you have the pipeline, all you need to do is pass some text to it. Let's write some text. Now let's pass this text to the pipeline. Let's give it a listen.

"Researchers at the Allen Institute for AI, Hugging Face, Microsoft, the University of Washington, Carnegie Mellon University, and the Hebrew University of Jerusalem developed a tool that measures atmospheric carbon emitted by cloud servers while training machine learning models. After a model's size, the biggest variables were the server's location and time of day it was active."

And just like that, you can convert text into a narrated audio recording. Feel free to paste your own text and play with the pipeline. In the next lesson, Younes will show you how to build an object detector. Let's go on to the next lesson.
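The steps described in this lesson (build a text-to-speech pipeline, pass it some text, listen to the result) might look roughly like the sketch below. It assumes the Hugging Face `transformers` library and the `kakao-enterprise/vits-ljs` checkpoint; the transcript only says "a VITS pre-trained model from Kakao Enterprise", so the exact checkpoint is an assumption. The helper names `flatten`, `save_wav`, and `narrate_to_file` are hypothetical and exist only for this illustration. The sketch also assumes the pipeline returns a dict with `audio` and `sampling_rate` keys, and it saves the result as a WAV file instead of playing it inline.

```python
import struct
import wave


def flatten(audio):
    """Yield float samples from a 1-D or nested (e.g. shape (1, N)) sequence."""
    for item in audio:
        if hasattr(item, "__iter__"):
            yield from flatten(item)
        else:
            yield float(item)


def save_wav(path, audio, sampling_rate):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in flatten(audio)]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)                   # mono
        wav_file.setsampwidth(2)                   # 2 bytes = 16-bit samples
        wav_file.setframerate(int(sampling_rate))
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return len(pcm)


def narrate_to_file(text, path, model_id="kakao-enterprise/vits-ljs"):
    """Convert text to speech and save it; model_id is an assumed checkpoint."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    narrator = pipeline("text-to-speech", model=model_id)
    result = narrator(text)  # assumed keys: "audio", "sampling_rate"
    return save_wav(path, result["audio"], result["sampling_rate"])
```

Calling `narrate_to_file("Feel free to paste your own text.", "narration.wav")` would download the checkpoint on first use and write the narration to disk. In a notebook, the same waveform could instead be passed to `IPython.display.Audio` along with its sampling rate.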
Open Source Models with Hugging Face
  • Introduction (Video, 5 mins)
  • Selecting models (Video, 5 mins)
  • Natural Language Processing (NLP) (Video with Code Example, 9 mins)
  • Translation and Summarization (Video with Code Example, 5 mins)
  • Sentence Embeddings (Video with Code Example, 5 mins)
  • Zero-Shot Audio Classification (Video with Code Example, 9 mins)
  • Automatic Speech Recognition (Video with Code Example, 15 mins)
  • Text to Speech (Video with Code Example, 2 mins)
  • Object Detection (Video with Code Example, 11 mins)
  • Image Segmentation (Video with Code Example, 16 mins)
  • Image Retrieval (Video with Code Example, 7 mins)
  • Image Captioning (Video with Code Example, 5 mins)
  • Multimodal Visual Question Answering (Video with Code Example, 4 mins)
  • Zero-Shot Image Classification (Video with Code Example, 6 mins)
  • Deployment (Video with Code Example, 11 mins)
  • Conclusion (Video, 1 min)