In this lesson, you will learn about using sentence embedding models in production and how the two different encoders, the question encoder and the answer encoder, are used in a retrieval pipeline such as a RAG system. All right, let's go.

Okay, so we trained the dual encoder embedding model. Now we have two encoders ready to go: the question encoder and the answer encoder. As we can see here, during ingest we encode each text chunk using the answer encoder and store the resulting embedding vector in the vector database. Then, when a user issues a query, we use the question encoder to generate the query embedding vector. That vector is then used to retrieve the matching facts or text segments that are sent to the LLM as part of the RAG flow.

How do we find the matching chunks of text after the question embedding has been computed? The naive approach would be to compute the similarity between the question embedding and all answer embeddings, but that is computationally heavy and may take too long for a real production system. Thankfully, we have quite a few implementations of approximate nearest neighbor, or ANN, algorithms, like HNSW, Annoy, FAISS, and others. These algorithms approximate the nearest neighbor search with high accuracy but significantly lower compute time, and they are widely used for this task. Most ANN algorithms are in-memory, so when you implement this in production on a very large dataset, you have the additional requirement of implementing your ANN approach with a persistent data store on disk.

Let's see all this in the code. So in this notebook, we first suppress the warnings, and we're going to import, as always, a bunch of packages. Specifically, I want you to note these two new classes, DPRContextEncoder and DPRQuestionEncoder. We'll use those to load a pre-trained dual encoder. You will also need the cosine similarity matrix function from the other lab. It's exactly the same thing, just a helper function to compute similarity.

So let's put together an example. Here we have five different potential answers and a question: "What is the tallest mountain in the world?" Now you can take this model called all-MiniLM-L6-v2, which is a pure similarity model, compute the question embedding of the question, then compute the answer embedding for each of the answers, and then compute the similarity between the question and each of these answers. When you do this, you will see that the best answer, the one that's closest in terms of similarity, is the one that is identical to the question, "What is the tallest mountain in the world?", and the similarity is actually 1.0. That's what we would expect.

Contrast that with a dual encoder that has a different answer encoder and question encoder. In this case, we use the fully pre-trained DPR model for this purpose. After loading the model, we can compute the tokens of the question and then the embedding of the question. As you can see, some of the embedding values are printed here, and the embedding is a 768-dimensional vector. So now you can do the same with the answers: take each answer, tokenize it, use the answer encoder in this case to get the answer embedding, compute the similarity, and figure out what the best answer is. And drum roll. Yay! This time answer zero, the one that simply repeats the question, did not get the highest score, and the best answer we got is exactly what we wished for: "The tallest mountain in the world is Mount Everest."
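To make that comparison concrete, here is a minimal sketch of what the notebook does, assuming the publicly available facebook/dpr-question_encoder-single-nq-base and facebook/dpr-ctx_encoder-single-nq-base checkpoints stand in for the pre-trained DPR dual encoder (the notebook's exact checkpoints may differ, and the three distractor answers below are placeholders, not the notebook's list):

```python
# Sketch: similarity model vs. DPR dual encoder on the "tallest mountain" example.
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

question = "What is the tallest mountain in the world?"
answers = [
    "What is the tallest mountain in the world?",           # echoes the question
    "The tallest mountain in the world is Mount Everest.",  # the answer we want
    "Mount Everest is in the Himalayas.",                   # placeholder distractor
    "I enjoy hiking in the mountains.",                     # placeholder distractor
    "I am going to a yoga class.",                          # placeholder distractor
]

# 1) Pure similarity model: the question is most similar to itself (score 1.0).
sim_model = SentenceTransformer("all-MiniLM-L6-v2")
q_vec = sim_model.encode(question, convert_to_tensor=True)
a_vecs = sim_model.encode(answers, convert_to_tensor=True)
print(util.cos_sim(q_vec, a_vecs))  # answer 0, the echoed question, scores 1.0

# 2) DPR dual encoder: separate question and answer (context) encoders.
q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
a_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
a_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

with torch.no_grad():
    q_emb = q_enc(**q_tok(question, return_tensors="pt")).pooler_output  # shape (1, 768)
    a_emb = torch.cat([
        a_enc(**a_tok(a, return_tensors="pt")).pooler_output for a in answers
    ])                                                                    # shape (5, 768)

scores = torch.nn.functional.cosine_similarity(q_emb, a_emb)
print(scores, "-> best answer:", answers[int(scores.argmax())])
```

As in the lesson's run, the pure similarity model ranks the echoed question first, while the dual encoder ranks the actual answer, "The tallest mountain in the world is Mount Everest," highest.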
Okay. So now, taking a step back, let's look at the full RAG pipeline. During ingest (the blue lines), we take the input documents or text, chunk them, and use the answer encoder to encode these chunks into embedding vectors, which are then stored in the vector database. Upon receiving a user query (the green lines), we use the question encoder to get the question embedding and then use ANN to retrieve the most relevant chunks of text. Those are then included as context in the prompt and sent to the LLM, which generates the desired response. (A minimal code sketch of this retrieval flow appears below.)

In practice, there are a few ways you can go about building a RAG pipeline. You can code it yourself from scratch, you can use one of the do-it-yourself frameworks like LangChain or LlamaIndex, or you can use a RAG-as-a-service platform like Vectara, which does most of the heavy lifting for you.

So in this lesson you saw how to use the question encoder and answer encoder in a production RAG pipeline, and the importance of ANN algorithms in keeping retrieval latency acceptable. We will conclude the course in the next lesson and explain how a two-stage retrieval pipeline works. See you there.
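Here is the promised sketch of the retrieval portion of the pipeline, assuming the same DPR checkpoints as above and FAISS's HNSW index as the in-memory ANN structure; the chunks, the prompt template, and the LLM call are placeholders, not the notebook's code:

```python
# Sketch: ingest chunks with the context (answer) encoder, index them with HNSW,
# then answer queries with the question encoder + ANN search.
import faiss
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

ctx_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")

# --- Ingest (blue lines): encode each text chunk and add it to the ANN index.
chunks = [  # placeholder chunks; in practice these come from chunking your documents
    "The tallest mountain in the world is Mount Everest.",
    "Mount Everest is 8,849 meters above sea level.",
    "Annapurna is a peak in the Himalayas.",
]
with torch.no_grad():
    chunk_vecs = torch.cat([
        ctx_enc(**ctx_tok(c, return_tensors="pt")).pooler_output for c in chunks
    ]).numpy().astype("float32")

# 768-dim vectors, 32 neighbors per HNSW node, dot-product scoring as used by DPR.
index = faiss.IndexHNSWFlat(chunk_vecs.shape[1], 32, faiss.METRIC_INNER_PRODUCT)
index.add(chunk_vecs)

# --- Query (green lines): encode the question, retrieve top-k chunks, build the prompt.
def retrieve(question: str, k: int = 2) -> list[str]:
    with torch.no_grad():
        q_vec = q_enc(**q_tok(question, return_tensors="pt")).pooler_output.numpy().astype("float32")
    _, idx = index.search(q_vec, k)
    return [chunks[i] for i in idx[0]]

context = retrieve("What is the tallest mountain in the world?")
prompt = "Answer using only this context:\n" + "\n".join(context)
print(prompt)  # this prompt would be sent to the LLM to generate the response
```

Because the HNSW index lives in memory, a production deployment over a very large corpus would back it with a persistent vector store on disk, or delegate the whole flow to a framework like LangChain or LlamaIndex, or a platform like Vectara.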