Learn state-of-the-art techniques for getting the most out of multimodal AI with Google’s Gemini model family.
Instructor: Erwin Huizenga
Leverage the power of Gemini’s cross-modal attention to fuse information from text, images, and video for complex reasoning tasks.
Extend Gemini’s capabilities with external knowledge and live data via function calling and API integration.
Multimodal models like Gemini are pushing the boundaries of what’s possible by unifying traditionally siloed data modalities. With Gemini, you can build applications that understand and reason across text, images, and videos, enabling a new class of intelligent systems. For example, you could build a virtual interior designer that analyzes a user’s room images, understands their style preferences from a text description, and generates personalized design recommendations. Or you could create a smart document processing pipeline that extracts structured data from complex PDFs, answers questions based on the content, and generates human-like summaries.
You’ll learn prompt engineering techniques to guide Gemini’s behavior and optimize its performance for diverse use cases, from creative story generation to analytical report writing. You’ll also discover how to integrate Gemini with external APIs and databases using function calling, so that your applications can draw on real-time data and dynamic content.
What you’ll learn, in detail:
Through this course, you’ll become well-versed in Gemini’s capabilities and how to maximize them in different use cases, and you’ll build a portfolio of practical techniques for architecting advanced multimodal AI applications.
Note that due to technical requirements, this course features downloadable-only notebooks on the learning platform. You are free to download, review, and run these notebooks on your own.
Whether your goal is to build next-gen document understanding systems, intelligent video search tools, or interactive virtual assistants, this course will equip you with the skills to develop transformative applications.
Introduction
Introduction to Gemini Models
Multimodal Prompting and Parameter Control
Best Practices for Multimodal Prompting
Creating Use Cases with Images
Developing Use Cases with Videos
Integrating Real-Time Data with Function Calling
Conclusion
How to Set Up your Google Cloud Account | Try it out Yourself [optional]
Gemini Course Feedback [optional]
Course access is free for a limited time during the DeepLearning.AI learning platform beta!