In this course, you explored the intricate dance between visual and textual data in multimodal machine learning. You have successfully harnessed advanced tools and frameworks to create intelligent systems capable of contextual understanding and nuanced responses. The BridgeTower and LLaVA models used in the course were hosted on Intel Gaudi AI accelerators. I invite you to check out Intel Tiber Developer Cloud (cloud.intel.com), where you can access the latest hardware, like Intel's Gaudi 3 AI accelerator, to build your next project. You can also check out the detailed implementation of the BridgeTower model in Hugging Face and the training recipes for the LLaVA model, released by our team. As you move forward, carry the confidence and creativity sparked by your achievements here. Are you ready to innovate and shape the future of AI? I'm excited to see what you will build on your own.
Multimodal RAG: Chat with Videos
  • Introduction
    Video ・ 4 mins
  • Interactive Demo and Multimodal RAG System Architecture
    Video with Code Example ・ 7 mins
  • Multimodal Embeddings
    Video with Code Example ・ 9 mins
  • Preprocessing Videos for Multimodal RAG
    Video with Code Example ・ 9 mins
  • Multimodal Retrieval from Vector Stores
    Video with Code Example ・ 6 mins
  • Large Vision-Language Models (LVLMs)
    Video with Code Example ・ 7 mins
  • Multimodal RAG with Multimodal LangChain
    Video with Code Example ・ 13 mins
  • Conclusion
    Video ・ 1 min