Short Course

Multi-vector Image Retrieval

Instructor: Kacper Łukawski

  • Intermediate
  • 7 Video Lessons
  • 5 Code Examples

What you'll learn

  • Understand multi-vector retrieval and how representing images as collections of patch-level embeddings enables more accurate search than single-vector methods.

  • Explore the ColPali model for image retrieval and apply optimization techniques like quantization and pooling to reduce memory usage while maintaining search quality.

  • Build a production multi-modal RAG system that combines ColPali retrieval with MUVERA's efficient search to handle complex documents mixing text and images.

About this course

Join our new short course, Multi-Vector Image Retrieval! Learn from Kacper Łukawski, Senior Developer Advocate at Qdrant.

Most retrieval systems represent each image with a single vector, but multi-vector techniques represent images as collections of smaller vectors—one for each patch. This detailed representation enables fine-grained matching between text query tokens and image patches, delivering higher-quality search on complex documents that combine text, images, and diagrams.
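
To make the token-to-patch matching concrete, here is a minimal sketch of the late-interaction ("MaxSim") scoring commonly used by ColBERT-style models: each query token picks its best-matching patch, and the per-token maxima are summed. The function names and the toy 2-dimensional embeddings are illustrative assumptions, not taken from the course notebooks or from any library.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], image_patches: List[Vector]) -> float:
    """Late-interaction score: for each query token, take the similarity of its
    best-matching patch, then sum those maxima over all query tokens."""
    return sum(
        max(dot(token, patch) for patch in image_patches)
        for token in query_tokens
    )


# Toy embeddings, made up for illustration (real models use far more dimensions).
query = [[1.0, 0.0], [0.0, 1.0]]               # two query tokens
page_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # has a strong patch for each token
page_b = [[0.6, 0.6], [0.5, 0.5]]              # only "average" patches

score_a = maxsim_score(query, page_a)
score_b = maxsim_score(query, page_b)

# page_a ranks higher because each query token finds a closely matching patch.
print(score_a > score_b)
```

The cost of this scoring grows with the number of query tokens times the number of patches per image, which is the computational and memory pressure the optimization lessons address.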

In this course, you’ll learn multi-vector retrieval concepts, implement ColPali for image search, and apply optimization techniques to make these systems production-ready. You’ll work with real course materials to build a complete multi-modal RAG system.

In detail, you’ll:

  • Learn multi-vector text retrieval with ColBERT – Understand how multi-vector retrieval works by implementing ColBERT for text, then explore the computational and memory challenges of late interaction search.
  • Implement ColPali for multi-vector image retrieval – Apply multi-vector concepts to images using ColPali, which adapts vision language models to create patch-level embeddings for fine-grained visual search.
  • Optimize ColPali’s memory footprint – Apply scalar and binary quantization, plus row, column, and hierarchical pooling techniques to dramatically reduce memory usage while preserving search quality.
  • Enable fast search with MUVERA embeddings – Convert multi-vector representations into high-dimensional single vectors using MUVERA, unlocking HNSW search for significantly faster retrieval.
  • Build a multi-modal RAG system with ColPali – Combine ColPali retrieval, memory optimizations, and MUVERA search into a complete RAG pipeline that retrieves and reasons over visual documents.
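
As a rough illustration of the memory optimizations listed above, the sketch below applies row pooling (averaging the patch vectors in each grid row) and binary quantization (keeping one sign bit per dimension) to a toy patch grid. It is a simplified sketch under stated assumptions: the helper names, the 2x3 toy grid, and the 32x32 grid with 128-dimensional vectors used for the size estimate are hypothetical and are not drawn from the course materials or from any specific library's implementation.

```python
from typing import List

Vector = List[float]


def mean_pool_rows(patches: List[Vector], rows: int, cols: int) -> List[Vector]:
    """Average the patch vectors in each grid row: rows * cols vectors -> rows vectors."""
    assert len(patches) == rows * cols
    dim = len(patches[0])
    pooled = []
    for r in range(rows):
        row = patches[r * cols:(r + 1) * cols]
        pooled.append([sum(v[d] for v in row) / cols for d in range(dim)])
    return pooled


def binary_quantize(vector: Vector) -> List[int]:
    """Keep one bit per dimension: 1 if the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]


def float32_bytes(num_vectors: int, dim: int) -> int:
    """Storage for float32 vectors: 4 bytes per component."""
    return num_vectors * dim * 4


def binary_bytes(num_vectors: int, dim: int) -> int:
    """Storage for binary codes: 1 bit per component, rounded up to whole bytes."""
    return num_vectors * ((dim + 7) // 8)


# Toy 2x3 grid of 4-dimensional patch embeddings (values made up).
patches = [
    [1.0, -1.0, 0.5, 0.0], [3.0, -1.0, 0.5, 2.0], [2.0, -1.0, -2.5, 1.0],
    [-1.0, 2.0, 0.0, 4.0], [-1.0, 4.0, 0.0, -4.0], [-1.0, 0.0, 3.0, -3.0],
]
pooled = mean_pool_rows(patches, rows=2, cols=3)  # 6 vectors -> 2 vectors
codes = [binary_quantize(v) for v in pooled]      # floats -> sign bits

# Hypothetical page: 32x32 patch grid, 128-dimensional float32 vectors.
full_size = float32_bytes(32 * 32, 128)    # all patches, float32
pooled_size = float32_bytes(32, 128)       # one pooled vector per row
binary_size = binary_bytes(32 * 32, 128)   # all patches, 1 bit per dimension
print(full_size, pooled_size, binary_size)
```

Both techniques trade some precision for size, so search quality should be measured on your own data before and after applying them.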

Start building retrieval systems that understand images at the patch level and deliver accurate multi-modal search.

Who should join?

AI builders working with multi-modal data who want to implement advanced image retrieval. Familiarity with Python and vector embeddings is recommended.

Course Outline

7 Lessons・5 Code Examples
  • Introduction

    Video・3 mins
  • Multi-vector Text Retrieval: ColBERT

    Video with Code Example・17 mins
  • Multi-vector Image Retrieval: ColPali

    Video with Code Example・15 mins
  • Optimizing retrieval with multi-vector representations

    Video with Code Example・15 mins
  • MUVERA Embeddings

    Video with Code Example・18 mins
  • Building multi-modal RAG with ColPali

    Video with Code Example・11 mins
  • Conclusion

    Video・1 min
  • Quiz

    Graded Quiz・10 mins

Instructor

Kacper Łukawski

Developer Relations Lead at Qdrant

Additional learning features, such as quizzes and projects, are included with DeepLearning.AI Pro.
