Welcome to this short course, Open Source Models with Hugging Face 🤗, built in partnership with Hugging Face. Thanks to open source software, if you want to build an AI application, you might be able to grab an image recognition component here, an automatic speech recognition model there, and an LLM somewhere else, and then string them together very quickly to build a new application. Hugging Face has been transformative for the AI community in making it easy for anyone to do this, by making many open source models easily accessible. This has been a huge accelerator for how many people build AI applications. In this course, you'll learn directly from the Hugging Face team how to do this and build cool applications yourself, possibly faster than you might have previously imagined would be possible.

For example, you'll use models to perform automatic speech recognition, or ASR, to transcribe speech into text, and also text-to-speech models, or TTS, to go the other way and convert text into audio. These models, combined with an LLM, give you the building blocks you can use to build your own voice assistant. You'll also see how to use Hugging Face's Transformers library to quickly pre-process inputs as well as post-process outputs of machine learning models. For example, pre-processing audio, like controlling the audio sampling rate in the ASR or TTS examples I just mentioned, as well as pre-processing or post-processing data such as images and text. The notion of grabbing open source components to build something quickly has been a paradigm shift in how AI applications are built. In this course, you'll get a feel for how to do this yourself.

I'm delighted to introduce our instructors for this course. Younes Belkada, a machine learning engineer at Hugging Face 🤗, has been involved in the open source team, where he works at the intersection of many open source tools developed by Hugging Face, such as Transformers, parameter-efficient fine-tuning or PEFT, and TRL, which stands for Transformer Reinforcement Learning. Marc Sun, also a machine learning engineer at Hugging Face 🤗, is part of the open source team, where he contributes to libraries such as the Transformers library and the Accelerate library. Maria Khalusova is a member of technical staff at Hugging Face 🤗; she leads educational projects at Hugging Face and contributes to cross-library efforts to make state-of-the-art machine learning more accessible to everyone.

Thanks Andrew, we're excited to work with you and your team on this. First, you will create your own chatbot with open source LLMs. 💬 You'll use an open source LLM from Meta; the same code can apply to more powerful open source LLMs when you have access to more powerful hardware. You'll also use open source models to translate text between two languages.

Next, you'll use Transformers for processing audio. What audio tasks do you think a voice assistant might be performing when you ask it for, say, a weather forecast? It knows to wake up when you say its name. That's audio classification. It converts your speech to text to look up your request. That's automatic speech recognition. And it replies to you. That's text-to-speech. In this course, you'll classify arbitrary sounds, transcribe speech recordings, and generate speech from text.
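To give you a concrete feel for those audio building blocks, here is a minimal sketch using the Transformers pipeline API. This isn't the course's exact code; the model checkpoints and the audio file name are illustrative assumptions.

```python
# Minimal sketch of the ASR and TTS building blocks described above,
# using the Transformers pipeline API. The checkpoints and file name
# below are illustrative choices, not necessarily the ones used in the lessons.
from transformers import pipeline

# Automatic speech recognition: audio file in, text out.
asr = pipeline("automatic-speech-recognition",
               model="distil-whisper/distil-small.en")
result = asr("sample_speech.wav")  # hypothetical local audio file
print(result["text"])

# Text-to-speech: text in, waveform plus sampling rate out.
tts = pipeline("text-to-speech", model="kakao-enterprise/vits-ljs")
speech = tts("The forecast today is sunny with a light breeze.")
print(speech["sampling_rate"], speech["audio"].shape)
```

The point is the pattern: one pipeline call per task, which you can then chain together with an LLM to form something like a voice assistant.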
The computer vision applications of Transformers are plentiful. You'll learn how to detect objects in images and segment images into semantically meaningful regions. For example, you can apply this code to detect that a puppy exists in an image and also segment the part of the puppy that makes up its ears. After you've learned to handle text, audio, and image tasks, you can combine these models in a sequence to handle more complex tasks. For example, if you want your app to help someone with a visual impairment by describing an image to them, how could you implement that? In this course, you'll apply object detection to identify the objects, image classification to describe those objects in text, and then speech generation to narrate the names of those objects. You'll also use a model that can describe an entire image in text, and you'll use models that can take in more than one data type as input. These are called multimodal models. For example, you'll build a visual question answering application, in which you can send an image to a model along with a question about that image, and your application can then return an answer to that question based on the image. You'll also use the Gradio library to deploy an AI application to Hugging Face Spaces, so that anyone can use your application to perform tasks by making API calls over the internet. Of course, the goal of all of these examples isn't just for you to be able to build these specific examples; it's so that you'll learn about all these building blocks and be able to combine them yourself into your own unique applications.

Many people have worked to create this course. I'd like to thank, on the Hugging Face side, the entire Hugging Face team for their review of the course content, as well as the Hugging Face community for their contributions to the open source models. ✨ From DeepLearning.ai, Eddy Shyu also contributed to this course.

In the first lesson, you'll learn how to navigate thousands of models on the Hugging Face Hub to find the right one for your task, and how to use the pipeline object from the Transformers library to start building your applications. That sounds super exciting. Let's go on to the next video and get started!
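As a small preview of the deployment pattern mentioned above, here is a minimal sketch of wrapping a Transformers pipeline in a Gradio interface, the way demos are typically shared on Hugging Face Spaces. The checkpoint and function names are illustrative assumptions, not the course's exact code.

```python
# Minimal sketch of wrapping a Transformers pipeline in a Gradio app,
# the pattern used to share demos on Hugging Face Spaces. The checkpoint
# and the caption() helper are illustrative assumptions.
import gradio as gr
from transformers import pipeline

# Image captioning: image in, descriptive text out.
captioner = pipeline("image-to-text",
                     model="Salesforce/blip-image-captioning-base")

def caption(image):
    # The pipeline returns a list like [{"generated_text": "..."}].
    return captioner(image)[0]["generated_text"]

demo = gr.Interface(fn=caption, inputs=gr.Image(type="pil"), outputs="text")
demo.launch()  # on a Space, this serves the app so anyone can reach it over the web
```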