DeepLearning.AI
AI is the new electricity and will transform and improve nearly all areas of human lives.

💻   Accessing Utils File and Helper Functions

In each notebook on the top menu:

1:   Click on "File"

2:   Then, click on "Open"

In the left sidebar, you will see all the notebook files for the lesson, including any helper functions used in the notebook.


💻   Downloading Notebooks

In each notebook on the top menu:

1:   Click on "File"

2:   Then, click on "Download as"

3:   Then, click on "Notebook (.ipynb)"


💻   Uploading Your Files

After following the steps shown in the previous section ("File" => "Open"), click on the "Upload" button to upload your files.


📗   See Your Progress

Once you enroll in this course—or any other short course on the DeepLearning.AI platform—and open it, you can click on 'My Learning' at the top right corner of the desktop view. There, you will be able to see all the short courses you have enrolled in and your progress in each one.

Additionally, your progress in each short course is displayed at the bottom-left corner of the learning page for each course (desktop view).


📱   Features to Use

🎞   Adjust Video Speed: Click on the gear icon (⚙) on the video and then from the Speed option, choose your desired video speed.

🗣   Captions (English and Spanish): Click on the gear icon (⚙) on the video and then from the Captions option, choose to see the captions either in English or Spanish.

🔅   Video Quality: If you do not have access to high-speed internet, click on the gear icon (⚙) on the video and then, from the Quality option, choose the quality that works best for your internet speed.

🖥   Picture in Picture (PiP): This feature allows you to continue watching the video when you switch to another browser tab or window. Click on the small rectangle shape on the video to go to PiP mode.

√   Hide and Unhide Lesson Navigation Menu: If you do not have a large screen, you may click on the small hamburger icon beside the title of the course to hide the left-side navigation menu. You can then unhide it by clicking on the same icon again.


🧑   Efficient Learning Tips

The following tips can help you have an efficient learning experience with this short course and other courses.

🧑   Create a Dedicated Study Space: Establish a quiet, organized workspace free from distractions. A dedicated learning environment can significantly improve concentration and overall learning efficiency.

📅   Develop a Consistent Learning Schedule: Consistency is key to learning. Set aside specific times in your day for study and make it a routine. Consistent study times help build a habit and improve information retention.

Tip: Set a recurring event and reminder in your calendar, with clear action items, to get regular notifications about your study plans and goals.

☕   Take Regular Breaks: Include short breaks in your study sessions. The Pomodoro Technique, which involves studying for 25 minutes followed by a 5-minute break, can be particularly effective.

💬   Engage with the Community: Participate in forums, discussions, and group activities. Engaging with peers can provide additional insights, create a sense of community, and make learning more enjoyable.

✍   Practice Active Learning: Don't just read the material, run the notebooks, or watch the videos. Engage actively by taking notes, summarizing what you learn, teaching the concept to someone else, or applying the knowledge in your own practical projects.


📚   Enroll in Other Short Courses

Keep learning by enrolling in other short courses. We add new short courses regularly. Visit the DeepLearning.AI Short Courses page to see our latest courses and begin learning new topics. 👇

👉👉 🔗 DeepLearning.AI – All Short Courses


🙂   Let Us Know What You Think

Your feedback helps us know what you liked and didn't like about the course. We read all your feedback and use it to improve this course and future courses. Please submit your feedback by clicking on the "Course Feedback" option at the bottom of the lessons list menu (desktop view).

Also, you are more than welcome to join our community 👉👉 🔗 DeepLearning.AI Forum


Welcome to Large Multimodal Model, or LMM, Prompting with Gemini, built in partnership with Google Cloud. Imagine you're designing a customer service app and a customer uploads an image of a product, let's say a microwave next to a sweet potato, and asks, "What do I do with this?" An LMM lets you answer this question directly using both the text and the image. Before LMMs became available, one approach might have been to use a captioning model to write a description of the image, then feed that caption and the question into a Large Language Model, or LLM. But an LMM can process text and images directly, reducing the chance of, say, the caption missing some critical detail. Gemini is one of the latest, and one of the few, models that have been trained from the ground up to understand a mixture of text, images, audio, and video.

I'm delighted to introduce the instructor for this course, Erwin Huizenga, who is a developer advocate in machine learning at Google Cloud and has deep experience with LLMs and LMMs.

Thanks, Andrew. I'm excited to work with you and your team on this. In this course, you'll learn how to build multimodal use cases. Specifically, you'll learn what multimodality is, how to use the Gemini API with different types of data like images and video, best practices for setting your parameters and for prompt engineering, and how to apply advanced reasoning across multiple images or videos. For example, one of the use cases you'll see is inputting a document with both text and graphs, and then getting an LMM to answer questions that depend on reading and understanding both the text and the image of the graph. You'll use Python and the Vertex AI Gemini API to build these multimodal use cases. You will explore various multimodal use cases and learn how to interact with images, including those containing text or tables, and with videos using Gemini models.
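
As a rough illustration of what setting a parameter can mean, here is a minimal Python sketch of sampling temperature, a parameter that text-generation models commonly expose. It is not the Gemini API and not code from the course: the function name `softmax_with_temperature` and the three scores are made up for illustration. The sketch only shows the general idea that a lower temperature concentrates probability on the top-scoring option, while a higher temperature spreads it out.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three candidate next tokens (illustrative only).
logits = [2.0, 1.0, 0.1]

low = softmax_with_temperature(logits, 0.2)   # sharper distribution
high = softmax_with_temperature(logits, 2.0)  # flatter distribution

print("T=0.2:", [round(p, 3) for p in low])
print("T=2.0:", [round(p, 3) for p in high])
```

Running it prints the two distributions side by side. The low-temperature one puts almost all of its weight on the first option, which is why low values tend to give more consistent output and higher values more varied output.
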
You'll learn to choose model parameters and understand how they can influence the model's creativity and consistency. You'll discover best practices for prompting with multimodal content and use LMMs to refine, edit, and enhance videos, similar to what a digital marketer needs when preparing content for social media. Additionally, you'll learn how to enhance language models with real-time data integration through function calling.

Many people have worked to create this course. On the Google Cloud side, I'd like to thank Polong Lin, Lavi Nigam, and Thu Ya Kyaw. From DeepLearning.AI, Eddy Shyu also contributed to this course. In the next video, Erwin will give an introduction to multimodality and Gemini. After you finish this course, whenever you have both text and image data, I hope you will develop applications so quickly using the ideas from this course that others will look to you as a model of efficiency. Let's go on to the next video and get started.

This course is presented in a video-only format. You can simply watch the course to learn all about Gemini. If you wish to run the code yourself, we provide instructions on how to access and run the notebooks. Let me show you where those instructions are. Down here on the bottom left, you can click on "How to set up your GCP account." This takes you to a document with instructions on how to sign up for a Google Cloud Platform account. You can also find instructions on how to access Google Colab notebooks. Now on to the course.
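
The function calling idea mentioned above can be sketched in plain Python. In general, the model does not run code itself: it returns a structured request naming a function and its arguments, the application runs that function, and the result is passed back to the model. The sketch below is a stand-in built on assumptions. The tool `get_current_utc_time`, the `TOOLS` registry, the `dispatch` helper, and the dictionary shape of `simulated_call` are all hypothetical, and no model or Gemini API call is made.

```python
from datetime import datetime, timezone


def get_current_utc_time() -> dict:
    """Hypothetical tool: return the current UTC time as an ISO 8601 string."""
    return {"utc_time": datetime.now(timezone.utc).isoformat()}


# Registry mapping tool names to the Python callables that implement them.
TOOLS = {"get_current_utc_time": get_current_utc_time}


def dispatch(function_call: dict) -> dict:
    """Run the requested tool and wrap its result for a follow-up model turn."""
    name = function_call["name"]
    args = function_call.get("args", {})
    if name not in TOOLS:
        return {"name": name, "error": f"unknown tool: {name}"}
    return {"name": name, "response": TOOLS[name](**args)}


# Hand-written stand-in for a structured request a model might return
# instead of plain text (the real response format is not shown here).
simulated_call = {"name": "get_current_utc_time", "args": {}}

result = dispatch(simulated_call)
print(result)
```

Keeping the tools in a registry means the application, not the model, decides which functions can run. A request for a name that is not registered returns an error entry instead of executing anything.
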
Week 1: Large Multimodal Model Prompting with Gemini
  • Introduction (Video, 3 mins)
  • Introduction to Gemini Models (Video, 11 mins)
  • Multimodal Prompting and Parameter Control (Video, 26 mins)
  • Best Practices for Multimodal Prompting (Video, 10 mins)
  • Creating Use Cases with Images (Video, 20 mins)
  • Developing Use Cases with Videos (Video, 26 mins)
  • Integrating Real-Time Data with Function Calling (Video, 18 mins)
  • Conclusion (Video, 1 min)
  • How to Set Up your Google Cloud Account | Try it out Yourself [optional] (Resource, 10 mins)
  • Gemini Course Feedback [optional] (Resource, 10 mins)