Welcome to Large Multimodal Model, or LMM, Prompting with Gemini, built in partnership with Google Cloud. Imagine you're designing a customer service app, and a customer uploads an image of a product, let's say a microwave next to a sweet potato, and asks, "What do I do with this?" An LMM lets you answer this question directly using the text and the image. Before LMMs became available, one approach might have been to use a captioning model to write a description of the image, then feed that caption and the question into a Large Language Model, or LLM. But an LMM, a Large Multimodal Model, can process text and images directly, thus reducing the chance of, say, the caption missing some critical detail. Gemini is one of the latest and few models that has been trained from the ground up to understand a mixture of text, images, audio, and video. I'm delighted to introduce the instructor for this course, Erwin Huizenga, who is a developer advocate in machine learning at Google Cloud and has deep experience with LLMs and LMMs.

Thanks, Andrew. I'm excited to work with you and your team on this. In this course, you'll learn how to build multimodal use cases. Specifically, you'll learn what multimodality is; how to use the Gemini API with different types of data, like images and video; best practices for setting your parameters and for prompt engineering; and how to apply advanced reasoning across multiple images or videos. For example, one of the use cases you'll see is inputting a document with both text and graphs, and then getting the LMM to answer questions that depend on reading and understanding both the text and the image of the graph. You'll use Python and the Vertex AI Gemini API to build these multimodal use cases.

You will explore various multimodal use cases and learn how to interact with images, including those containing text or tables, and with videos, using Gemini models. You'll learn to choose model parameters and understand how these can influence the model's creativity and consistency. You'll discover best practices for prompting with multimodal content, and you'll use LMMs to refine, edit, and enhance videos, similar to what a digital marketer needs when preparing content for social media. Additionally, you'll learn how to enhance language models with real-time data integration through function calling.

Many people have worked to create this course. On the Google Cloud side, I'd like to thank Polong Lin, Lavi Nigam, and Thu Ya Kyaw; from DeepLearning.AI, Eddy Shyu also contributed to this course. In the next video, Erwin will give an introduction to multimodality and Gemini. After you finish this course, whenever you have both text and image data, I hope that you will develop applications so quickly using the ideas from this course that others will look to you as a model of efficiency. Let's go on to the next video and get started.

This course is presented in a video-only format. You can simply watch the course to learn all about Gemini. If you wish to run the code yourself, we provide instructions on how to access and run the notebooks. Let me show you where those instructions are. Down here on the bottom left, you can click "How to Set Up your Google Cloud Account." This takes you to a document with instructions on how to sign up for a Google Cloud Platform account. You can also find instructions on how to access the Google Colab notebooks. Now, on to the course.
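To make the workflow described above concrete, here is a minimal sketch of the kind of multimodal call the course builds on: passing an image and a question to Gemini in a single request. It assumes the Vertex AI Python SDK (google-cloud-aiplatform); the project ID, Cloud Storage URI, and model version are placeholders, not values from the course.

```python
# A minimal multimodal prompt with the Vertex AI Python SDK.
# Assumptions: project ID, image URI, and model version are placeholders
# to adapt to your own environment.
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.0-pro-vision")

# Load the image (e.g., the microwave next to a sweet potato) from Cloud Storage.
image = Part.from_uri("gs://your-bucket/microwave.jpg", mime_type="image/jpeg")

# Pass the image and the question together; the model reads both directly,
# with no separate captioning step in between.
response = model.generate_content([image, "What do I do with this?"])
print(response.text)
```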
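The parameter control mentioned in the transcript (influencing the model's creativity versus consistency) is exposed through a generation config. A hedged sketch under the same SDK assumptions; the specific values are illustrative defaults, not recommendations from the course:

```python
from vertexai.generative_models import GenerationConfig, GenerativeModel

model = GenerativeModel("gemini-1.0-pro")  # model version is a placeholder

# Lower temperature -> more consistent, repeatable answers;
# higher temperature -> more varied, "creative" answers.
config = GenerationConfig(
    temperature=0.2,       # randomness of token sampling
    top_p=0.8,             # nucleus-sampling probability cutoff
    top_k=40,              # sample only from the 40 most likely tokens
    max_output_tokens=1024,
)

response = model.generate_content(
    "Describe this product for a customer support ticket.",
    generation_config=config,
)
print(response.text)
```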
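Function calling, covered in the lesson on real-time data, lets the model return a structured request for your code to execute instead of a free-text answer. A sketch under the same SDK assumptions; the weather function and its schema are hypothetical examples, not part of the course material:

```python
from vertexai.generative_models import FunctionDeclaration, GenerativeModel, Tool

# Hypothetical function the model may ask us to call; the schema follows
# the OpenAPI-style format the SDK expects.
get_weather = FunctionDeclaration(
    name="get_current_weather",
    description="Get the current weather for a given city",
    parameters={
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "Name of the city"},
        },
        "required": ["city"],
    },
)

model = GenerativeModel(
    "gemini-1.0-pro",  # model version is a placeholder
    tools=[Tool(function_declarations=[get_weather])],
)

response = model.generate_content("What's the weather like in Amsterdam right now?")

# Instead of answering directly, the model can return a structured function
# call; your code executes it and sends the result back in a follow-up turn.
call = response.candidates[0].content.parts[0].function_call
print(call.name, dict(call.args))
```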
Large Multimodal Model Prompting with Gemini
  • Introduction ・ Video ・ 3 mins
  • Introduction to Gemini Models ・ Video ・ 11 mins
  • Multimodal Prompting and Parameter Control ・ Video ・ 26 mins
  • Best Practices for Multimodal Prompting ・ Video ・ 10 mins
  • Creating Use Cases with Images ・ Video ・ 20 mins
  • Developing Use Cases with Videos ・ Video ・ 26 mins
  • Integrating Real-Time Data with Function Calling ・ Video ・ 18 mins
  • Conclusion ・ Video ・ 1 min
  • How to Set Up your Google Cloud Account | Try it out Yourself [optional] ・ Resource ・ 1 min
  • Gemini Course Feedback [optional] ・ Resource ・ 1 min