Meta has introduced Llama Stack to standardize interfaces to commonly used tools and features. Let's sample a few examples in this lab. Llama models were always intended to work as part of an overall system. Today, applications require a multilingual safety model like Llama Guard 3, a prompt injection filter like Prompt Guard, and a cybersecurity evaluation suite like CyberSecEval. Agentic systems also require external tools like search and code execution, as well as memory. While building a reference implementation, we realized that having a clean and consistent way to interface between components could be valuable not only for us, but for anyone leveraging Llama models and other components as part of their system.

Similarly, we noticed common usage patterns in the model lifecycle. Meta releases the weights of Llama models to support several use cases. These weights can be improved, fine-tuned, and aligned using evaluations and curated datasets, and then deployed for inference to support specific applications. The curated datasets can be produced manually by humans, synthetically by other models, or by leveraging human feedback collected as usage data from the application itself. This results in a continuous improvement cycle where the model gets better over time. This is the model lifecycle. For each of the operations that need to be performed during the model lifecycle, we identified the required capabilities as toolchain APIs. Some of these capabilities are primitive operations like inference, while other features, like synthetic data generation, are composed of multiple capabilities.

To support these common usages, we released the Llama Stack APIs. Llama Stack defines two sets of APIs. The first is for agentic systems, which defines services like memory, shields, and orchestrators. The second is the toolchain APIs, for things like pre-training, inference, synthetic data generation, and evaluation.

Let's look at the code. In this lab, we'll give a really brief introduction to some of the components of Llama Stack. Let's start by loading our keys. If you were running this yourself, you would have to install Llama Stack and the Llama Stack client; in this lab, that has been done for you. Let's look at these two Llama Stack commands. The first describes some of the available distributions. This is a little hard to read, but the first column describes the distribution: local-together indicates that Together.ai and the local provider are used. In the providers column, you can see that inference is provided by a remote Together implementation, whereas memory is defined as meta-reference, which means it will use the Meta reference code locally. Similarly, safety is provided by Together, and so on. The second command lists the available APIs. This is the list that you saw on the slides. In this lab, we are going to take a brief look at the inference and agents APIs.

Let's start with inference. First, we define the distribution we'll be using, which is the Together.ai distribution, and the model, which is the Llama 3.1 8B Instruct model. Let's load some packages, define a routine, and then run that routine. First, we define the Llama Stack client by describing the distribution that's used. Then we call chat completion with a list of messages and a model, and we set stream to true. This instructs the API to stream back messages as soon as they are available, rather than waiting for the complete message. To print that, we'll use an asynchronous for loop. Let's try this out.
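For reference, the inference code just described looks roughly like the sketch below, assuming the llama_stack_client Python package. The endpoint URL and model identifier are placeholders, and the exact client class, parameter names, and streaming chunk fields may differ between client versions.

import asyncio
from llama_stack_client import AsyncLlamaStackClient
from llama_stack_client.types import UserMessage

LLAMA_STACK_URL = "http://localhost:5000"   # placeholder endpoint for the Together distribution
MODEL = "Llama3.1-8B-Instruct"              # placeholder model identifier

async def run_inference(question: str) -> None:
    client = AsyncLlamaStackClient(base_url=LLAMA_STACK_URL)
    # stream=True asks the API to return tokens as soon as they are available,
    # rather than waiting for the complete message.
    stream = await client.inference.chat_completion(
        messages=[UserMessage(role="user", content=question)],
        model=MODEL,
        stream=True,
    )
    # Consume the stream with an asynchronous for loop, printing each chunk as it arrives.
    async for chunk in stream:
        print(chunk)

asyncio.run(run_inference("Who wrote the book Charlotte's Web?"))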
Here we can see the answers to our questions. Let's add a print to see how this has been returned from the API. Here you can see the result of setting stream to true: the API returns the message one token at a time.

Now let's look at the Llama Stack agent. In Llama Stack, agents have a lot of capabilities. They can remember message history, store information in external memory, and make use of external tools such as search or code execution. In this example, we are just going to use the message history.

Let's load our packages and define an agent. We start by initializing it with the Together distribution. Then we define a routine to create an agent, which calls agents.create with an agent config. Let's look at that briefly. The agent config describes the model, provides some system instructions, and also controls session persistence; here we are setting it to false. Session persistence allows session state to be available across server restarts by saving session data to persistent storage. We'll save our agent ID and create a session. A session is the state associated with the multiple turns that the model is reasoning over. Then we'll save the session ID. Execute turn uses the agent ID and session ID to ask the remote Llama Stack server to use the specified Llama model to answer the user's question. (A condensed sketch of this agent appears at the end of this lesson.)

Let's create an agent with this config, provide it prompts, and ask it to respond. We can see it understood the first question, "Who wrote the book Charlotte's Web?": it was written by E.B. White. Then, when responding to the follow-up question asking for the three best quotes, it responded with quotes from that same book, indicating it retained its message history.

Now let's build an agent for a vision model. Let's start by displaying an image; we'll use the llama image. Then we'll define code to encode the image, as we have seen before (a sketch of this encoding step also appears at the end of this lesson). Then we'll define an agent: create a client and create an agent as before. In execute turn, we also encode the image, and in the messages we include the image in the content. Now let's define a routine to create an agent, execute the prompts, and print the result, and then run it. Let's provide the image of the llamas and a question: "How many different colors are those llamas? What are those colors?" There are three different colors of llamas in the image: white, purple, and blue. The blue color is on a llama's party hat.

This was a very quick look at Llama Stack. We took a cursory look at two of the APIs: the inference API and the agents API. What I hope you will take away from this lesson are two things. One, Llama Stack APIs have been defined, and Meta and other providers are working to implement the services behind those interfaces. And two, using these APIs can make your job simpler and allow your applications to be more resilient to changes in the underlying implementations.
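For reference, here is a condensed sketch of the text agent described in this lesson. The agent config fields (model, instructions, enable_session_persistence) come from the lab's description; the endpoint URL, session name, and the exact resource paths (agents.create, agents.session.create, agents.turn.create) are assumptions that may vary between llama_stack_client versions.

from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage
from llama_stack_client.types.agent_create_params import AgentConfig

LLAMA_STACK_URL = "http://localhost:5000"   # placeholder endpoint for the Together distribution
MODEL = "Llama3.1-8B-Instruct"              # placeholder model identifier

class Agent:
    def __init__(self):
        self.client = LlamaStackClient(base_url=LLAMA_STACK_URL)

    def create_agent(self, agent_config):
        # Register the agent, save its ID, then open a session that holds the turn history.
        agent = self.client.agents.create(agent_config=agent_config)
        self.agent_id = agent.agent_id
        session = self.client.agents.session.create(
            agent_id=self.agent_id,
            session_name="lab_session",       # illustrative session name
        )
        self.session_id = session.session_id

    def execute_turn(self, content):
        # Ask the remote Llama Stack server to answer within this session,
        # streaming the response back chunk by chunk.
        return self.client.agents.turn.create(
            agent_id=self.agent_id,
            session_id=self.session_id,
            messages=[UserMessage(role="user", content=content)],
            stream=True,
        )

agent_config = AgentConfig(
    model=MODEL,
    instructions="You are a helpful assistant.",
    enable_session_persistence=False,   # session state is not saved across server restarts
)
agent = Agent()
agent.create_agent(agent_config)
for prompt in ["Who wrote the book Charlotte's Web?",
               "What are the 3 best quotes?"]:
    for chunk in agent.execute_turn(prompt):
        print(chunk)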
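Finally, a small sketch of the image-encoding step used by the vision agent. The base64 helper is standard Python; the file name and the way the encoded image is attached to the message content are assumptions that depend on the client version.

import base64

def encode_image(path):
    # Read the image file and base64-encode it so it can be sent inline with the message.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Hypothetical message content combining the encoded image and the question;
# the exact content schema varies between llama_stack_client versions.
content = [
    {"image": {"uri": f"data:image/png;base64,{encode_image('llamas.png')}"}},
    "How many different colors are those llamas? What are those colors?",
]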