Multi-vector Image Retrieval

Instructor: Kacper Łukawski

Intermediate
7 Video Lessons
6 Code Examples

What you'll learn

Understand multi-vector retrieval and how representing images as collections of patch-level embeddings enables more accurate search than single-vector methods.
Explore the ColPali model for image retrieval and apply optimization techniques like quantization and pooling to reduce memory usage while maintaining search quality.
Build a production-ready multi-modal RAG system that combines ColPali retrieval with efficient search over MUVERA embeddings to handle complex documents mixing text and images.

About this course

Join our new short course, Multi-Vector Image Retrieval! Learn from Kacper Łukawski, Senior Developer Advocate at Qdrant.

Most retrieval systems represent each image with a single vector, but multi-vector techniques represent images as collections of smaller vectors—one for each patch. This detailed representation enables fine-grained matching between text query tokens and image patches, delivering higher-quality search on complex documents that combine text, images, and diagrams.
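As an illustration of this token-to-patch matching, here is a minimal sketch of late-interaction (MaxSim) scoring, the scheme popularized by ColBERT. Each query token vector is compared with every patch vector, only the best match is kept, and the best matches are summed. The 2-D vectors are made-up toy values, and `dot` and `maxsim_score` are hypothetical helper names, not functions from ColPali or Qdrant.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vectors: List[Vector], doc_vectors: List[Vector]) -> float:
    """Late-interaction score: for each query token, take its best-matching
    document vector (patch), then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


# Toy data (assumed for illustration): two query token vectors
# and two "pages", each a small collection of patch vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
page_b = [[0.7, 0.7], [0.6, 0.6]]

score_a = maxsim_score(query, page_a)  # roughly 0.9 + 0.8
score_b = maxsim_score(query, page_b)  # roughly 0.7 + 0.7

print(f"page_a: {score_a:.2f}, page_b: {score_b:.2f}")
```

Page A ranks higher because each query token finds a patch that matches it closely, even though no single patch matches both tokens. A single-vector representation would average that detail away.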

In this course, you’ll learn multi-vector retrieval concepts, implement ColPali for image search, and apply optimization techniques to make these systems production-ready. You’ll work with real course materials to build a complete multi-modal RAG system.

In detail, you’ll:

Learn multi-vector text retrieval with ColBERT – Understand how multi-vector retrieval works by implementing ColBERT for text, then explore the computational and memory challenges of late interaction search.
Implement ColPali for multi-vector image retrieval – Apply multi-vector concepts to images using ColPali, which adapts vision language models to create patch-level embeddings for fine-grained visual search.
Optimize ColPali’s memory footprint – Apply scalar and binary quantization, plus row, column, and hierarchical pooling techniques to dramatically reduce memory usage while preserving search quality.
Enable fast search with MUVERA embeddings – Convert multi-vector representations into high-dimensional single vectors using MUVERA, unlocking HNSW search for significantly faster retrieval.
Build a multi-modal RAG system with ColPali – Combine ColPali retrieval, memory optimizations, and MUVERA search into a complete RAG pipeline that retrieves and reasons over visual documents.
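To give a feel for the memory optimizations listed above, the sketch below shows two of them in plain Python: binary quantization (keeping one bit per dimension) and row pooling (averaging the patch vectors within each row of the patch grid). This is an assumption-laden illustration, not the course's implementation. The helper names are hypothetical, the pooling is one plausible reading of "row pooling", and the 32×32 grid of 128-dimensional vectors is an illustrative size rather than a figure taken from this page.

```python
from typing import List

Vector = List[float]


def binary_quantize(vector: Vector) -> List[int]:
    """Keep only the sign of each component: 1 if positive, otherwise 0."""
    return [1 if x > 0 else 0 for x in vector]


def mean_pool_rows(patch_grid: List[List[Vector]]) -> List[Vector]:
    """Average the patch vectors in each grid row, yielding one vector per row.
    patch_grid is indexed as [row][column][dimension]."""
    pooled = []
    for row in patch_grid:
        dim = len(row[0])
        pooled.append([sum(vec[i] for vec in row) / len(row) for i in range(dim)])
    return pooled


def float32_bytes(num_vectors: int, dim: int) -> int:
    """Storage for float32 vectors: 4 bytes per dimension."""
    return num_vectors * dim * 4


def binary_bytes(num_vectors: int, dim: int) -> int:
    """Storage for binary-quantized vectors: 1 bit per dimension."""
    return num_vectors * dim // 8


# Tiny 2x2 grid of 2-D patch vectors (toy values).
grid = [
    [[1.0, -1.0], [3.0, 1.0]],
    [[-2.0, 0.5], [0.0, 1.5]],
]
pooled = mean_pool_rows(grid)                    # one vector per row
bits = binary_quantize([0.3, -0.2, 0.0, 1.5])    # one bit per dimension

# Assumed page size for the arithmetic: 32 x 32 patches, 128 dimensions each.
rows, cols, dim = 32, 32, 128
full = float32_bytes(rows * cols, dim)           # every patch, float32
quantized = binary_bytes(rows * cols, dim)       # every patch, 1 bit per dimension
row_pooled = float32_bytes(rows, dim)            # one float32 vector per row

print(pooled, bits)
print(full, quantized, row_pooled)
```

Under these assumptions, both techniques shrink a page from 524,288 bytes to 16,384 bytes, a 32× reduction. Each one discards different information, which is why their effect on search quality has to be measured rather than assumed.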

Start building retrieval systems that understand images at the patch level and deliver accurate multi-modal search.

Who should join?

AI builders working with multi-modal data who want to implement advanced image retrieval. Familiarity with Python and vector embeddings is recommended.

Course Outline

7 Lessons・6 Code Examples

Introduction
Video・3 mins

Multi-vector Text Retrieval: ColBERT
Video with Code Example・17 mins

Multi-vector Image Retrieval: ColPali
Video with Code Example・15 mins

Optimizing Retrieval with Multi-vector Representations
Video with Code Example・15 mins

MUVERA Embeddings
Video with Code Example・18 mins

Building Multi-modal RAG with ColPali
Video with Code Example・11 mins

Conclusion
Video・1 min

Optional: Hands-On Project
Code Example・10 mins

Quiz
Graded・Quiz・10 mins

Instructor

Kacper Łukawski

Developer Relations Lead at Qdrant