This lesson covers vector search and expands on RAG implementation. You will explore MongoDB and Pydantic, a Python library crucial for data integrity. Understanding these tools will elevate the quality of your AI projects. Let's dive in.

You understand that there is a vast amount of data on the internet right now, and there are a few ways to compare how similar, or how close, one data point is to another. A common method is text search, where you match a query keyword against parts of the content of a data point to compute a match. This is information retrieval in its most basic sense: you input a keyword or search term, match it against several data points, and detect whether the keyword is in the content. Now you are about to learn how to retrieve data based on its context or meaning.

The first step is to gather data. This can be structured data, like data organized in tables or spreadsheets with defined columns, or unstructured data such as audio and images. The next step involves passing the data as input to an embedding model. The output of an embedding model is a vector. At this point, you can say that the initial data has been vectorized, and you are left with a numerical representation of the data that captures its context and semantics. This is referred to as a vector embedding. In a high-dimensional space, referred to as a vector space, you can compare the distance between two or more embedding vectors to get an indication of how closely they are related in semantics or context.

So far, you understand vector search: an information retrieval technique that uses numerical representations of data, known as vector embeddings, to search and retrieve information. You also understand that traditionally, information retrieval relies on keyword matching, which searches for direct matches between the query text and the text within the dataset. Vector search, however, uses embeddings to enable advanced functionality such as semantic search, which understands the context of the query; recommendation systems, which predict user preferences; and retrieval-augmented generation, or RAG, which provides additional context for LLM inputs. These capabilities make vector search a powerful tool in various AI applications.

Once data, both structured and unstructured, has been collected and encoded into vector embeddings, there is a requirement to store the vectorized data in a specialized data store referred to as a vector database. Within a vector database, to ensure efficient retrieval of vector data based on vector search queries, it is best practice to index the vector data. A vector search index is a specialized structure that optimizes the storage and retrieval of vector embeddings, allowing for efficient similarity searches. When a vector search operation is performed, the index facilitates efficient matching of the query vector against the dataset, reducing the time needed to find the most similar vectors.

That takes you down the road of search, specifically vector search in a retrieval-augmented generation system. Retrieval-augmented generation, or RAG, is a system design pattern that leverages information retrieval techniques, including vector search, together with foundation models to provide accurate and relevant responses to user queries. RAG achieves this by retrieving semantically similar data to supplement the user query with additional context, and then combining the retrieved information with the original query as input to a large language model.
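To make the idea of comparing embedding vectors concrete, here is a minimal sketch, using NumPy, of how cosine similarity scores closeness in a vector space. The vectors and values are made up for illustration; in practice the embeddings would come from an embedding model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Return the cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings: real embeddings would have hundreds or thousands of dimensions.
query_embedding = np.array([0.12, 0.85, -0.33])
doc_embedding_a = np.array([0.10, 0.80, -0.30])   # semantically close to the query
doc_embedding_b = np.array([-0.70, 0.05, 0.60])   # semantically distant

print(cosine_similarity(query_embedding, doc_embedding_a))  # close to 1.0
print(cosine_similarity(query_embedding, doc_embedding_b))  # much lower
```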
For example, a typical process using a chat interface would be: you enter your message and then you get a response from the LLM. This is not the ideal process, as it doesn't use any relevant data. The ideal process would be that, along with the input to the LLM, you add relevant domain-specific data, and the large language model can then provide a relevant and context-aware response to your query.

Now that you have an understanding of RAG, let's get an overview of the key benefits of the RAG design pattern for LLM applications. Building AI applications that leverage the RAG system design pattern provides a number of benefits, such as grounding the LLM response in relevant and up-to-date information, which reduces the chances of hallucinations, where the LLM essentially provides wrong or irrelevant information. With retrieval-augmented generation, you also have the benefit of reducing the amount of information that is passed as input to the LLM, which can reduce how much of the context window you use. With RAG, you can also remove the need for fine-tuning LLMs in some scenarios; more specifically, with retrieval-augmented generation you can use your own private or domain-specific data to ensure that LLM responses meet your specific requirements and needs.

Now that you know that LLMs give better answers when supplemented with relevant context, you may wonder where and how to store this data. You may also ask, "How do I implement vector search for information retrieval in the first place?" That's where MongoDB comes in. MongoDB is a developer data platform that offers a NoSQL database with vector search functionality. In your AI applications, MongoDB can act as a storage solution for vector data, acting as a vector database. MongoDB offers even more functionality as a data store for operational and transactional data, making it a robust solution as a memory provider for LLM and AI applications, including RAG and agentic systems.

You're likely familiar with traditional relational databases. Let's use storing data on a house to illustrate how a relational database works. In a typical relational database, you might have the information about the house, such as the number of rooms and bathrooms, in one table and the address information of the house in another. With the document model, you model data based on the interactions that happen in the application, and not the other way around. So what is a document in MongoDB? A document is a basic unit of data that is similar to JSON. Each document is a set of key-value pairs and is the MongoDB equivalent of a row in a relational database. Let's see this with the house example we talked about earlier, where we had the house details and its address attributes. In this example, we have all the attributes of a house in one document, including its address. This is an example of a document in MongoDB (a sketch of such a document follows below). Documents are dynamic, meaning they can contain varied fields and structures within the same collection, and a collection in a non-relational database is similar to a table in a relational database.

The document model uses a JSON-like schema, which is a core data model across layers of the tech stack. For example, JSON helps transfer data between website components and REST APIs in the application layer, and it is used for function calling and tool definitions in the model layer when implementing an agent. MongoDB enables flexible storage of fields and values, with the ability to store different data types.
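As an illustration of the document model described above, here is a minimal sketch of what the house example might look like as a single MongoDB document, inserted with PyMongo. The field names, values, and connection string are hypothetical, chosen only for this example.

```python
from pymongo import MongoClient

# Hypothetical house document: all attributes, including the address, live in one document.
house_document = {
    "name": "Sunny Two-Bedroom House",
    "rooms": 2,
    "bathrooms": 1,
    "address": {                         # nested sub-document instead of a separate table
        "street": "123 Example Street",
        "city": "Springfield",
        "country": "US",
    },
    "amenities": ["garden", "parking"],  # arrays are valid field values too
}

# Assumes a locally reachable MongoDB instance; adjust the URI to your environment.
client = MongoClient("mongodb://localhost:27017")
collection = client["real_estate"]["houses"]
collection.insert_one(house_document)
```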
To ensure your documents are structured properly, you should consider data modeling. Data modeling involves designing the structure of documents and collections to effectively represent and organize data. It's about planning how to store and link data across documents to optimize for performance, scalability, and the specific data access patterns of your application. At times, the layout of components of your application is dictated by the structure and format of your data in a database. In the diagram here, this is represented by the directional arrow going from the data to the application layer; it represents implementing the application layer based on the information in the data layer. But ideally, you want to start with the needs of your application first, not the data itself. You have to ask, "How will I access my data?" and that should determine how you model the structure of your data.

MongoDB lets you take a familiar understanding of pipelines, which are present in data processing and machine learning, and apply the concept to operations within the database layer. When conducting queries with MongoDB, you construct an aggregation pipeline. You can think of an aggregation pipeline as a sequence of data processing stages, where each stage transforms the data as it passes through. This allows for complex query composition within MongoDB, as various stages of data transformation occur within the pipeline.

Here's an example of an aggregation pipeline query; by the way, a query is just a fancy way of describing how you tell the database to produce the specific information you're looking for. Let's say you're managing data from a social media application with a collection of user posts. You want to find the most popular posts, defined by the number of likes, in January 2021, and perhaps you're interested in summarizing the average number of comments and likes per post by category. This aggregation pipeline filters the posts from January 2021, groups them by category, calculates the average likes and comments, and sorts the results by average likes in descending order (a sketch of this pipeline follows below). By using the aggregation pipeline, you can leverage your understanding of sequential operations from machine learning and AI pipelines and apply similar logic to managing and analyzing data in MongoDB, making complex queries quite understandable and manageable.

In AI applications, there is a need for data validation and for ensuring that data conforms to a defined model. This reduces the likelihood of errors in production systems. Pydantic is a Python library used for data validation, modeling, and management. Pydantic offers features that enable the creation of data schemas, including a definition of an object and its properties. Pydantic also ensures that data conforms to the defined schemas, data types, formats, and constraints. If a piece of data doesn't meet the validation criteria, Pydantic handles the error by raising an exception that details the specific validation issues.

Before we dive into coding, let's review the dataset. It consists of 5,000 Airbnb listings hosted on Hugging Face, featuring details like address, description, transportation, reviews, and comments. For this course, you will use it to build an Airbnb listing recommendation system using RAG techniques. Each record, or data point, includes image embeddings of the listing photos and text embeddings generated from the content of the space attribute.
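Here is a minimal sketch of the aggregation pipeline described above, assuming a hypothetical posts collection with date, category, likes, and comments fields; the field names and connection details are assumptions for illustration, not taken from the lesson.

```python
from datetime import datetime
from pymongo import MongoClient

# Assumed connection and collection; adjust to your own environment.
client = MongoClient("mongodb://localhost:27017")
posts = client["social_media"]["posts"]

pipeline = [
    # Stage 1: keep only posts from January 2021.
    {"$match": {"date": {"$gte": datetime(2021, 1, 1), "$lt": datetime(2021, 2, 1)}}},
    # Stage 2: group by category and compute average likes and comments.
    {"$group": {
        "_id": "$category",
        "avg_likes": {"$avg": "$likes"},
        "avg_comments": {"$avg": "$comments"},
    }},
    # Stage 3: sort categories by average likes, most popular first.
    {"$sort": {"avg_likes": -1}},
]

for doc in posts.aggregate(pipeline):
    print(doc)
```

Each stage hands its output to the next, which is what makes the pipeline read like a sequence of data transformations.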
The information in the space attribute has been processed by the OpenAI text-embedding-ada-002 model. Here are the steps we're going to take in the coding section for this lesson. You are going to load the data from Hugging Face. Then you will set up a database connection to access the database and the collection, into which you will then insert, or ingest, the data. After that, you will conduct a vector search query using a query embedding and the embeddings within the collection. The last step will handle the user query and visualize the responses. Let's dive in.

Before we get to the steps outlined in the slides, let's see what you will build in this lesson. You will be building a RAG recommendation system that uses vector search to pull relevant results from a vector database and add them as additional context to an LLM, as you can see on the screen. You will also observe the execution time of the vector search query, the user question, and the system response. The system response is the response from the LLM, and it will include a recommended listing from the dataset that was provided as additional context, along with the reason for choosing this recommendation. You will also observe a table of the attributes of the data that was used as additional context, including the name, accommodates, and address, shown for all information retrieved by the vector search query. Let's get started.

These are the libraries you will use for this notebook, which are pre-installed and available for you on the learning platform. Here, you import the os module and load the environment variables set up within your development environment. We will load the Mongo URI and the OpenAI API key; these have previously been configured for you in the development environment.

The first step is to load the dataset. Here, we import the load_dataset function from the datasets library from Hugging Face, which allows us to access a dataset from the Hugging Face platform by specifying its path. You will also import the pandas library, aliased as pd, which lets you conduct data modification and analysis. You call the load_dataset function, passing it the path to the dataset; in this case, this is the Airbnb embeddings dataset we spoke about earlier that contains the text embeddings. You set streaming to True and use the training partition of the dataset. The output of this operation is assigned to the variable dataset. By calling take on the dataset object and specifying the number of data points you want to extract, you can load a specified number of data points into your environment. The next line converts the dataset into a pandas DataFrame, which allows for analysis and data modification. The final step is to view the first data points in the dataset.

As you can see on the screen, we can visualize the first five data points and their attributes, including the values. Pause the video here and take some time to familiarize yourself with the values of each data point. To continue with the visualization of our dataset and its data points, we will list the attributes of each data point. Here, you can see the various attributes captured in each data point within the dataset, including the text embeddings. The next step is to conduct document modeling using Pydantic. First, we import several modules from Pydantic and also the datetime module from Python.
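Here is a minimal sketch of the dataset loading step described above; the dataset path and the number of data points taken are assumptions for illustration, so adjust them to the values used in your notebook.

```python
import pandas as pd
from datasets import load_dataset

# Stream the dataset so only the records you take are pulled into memory.
# The path below is an assumed placeholder; use the Airbnb embeddings dataset path from the notebook.
dataset = load_dataset("MongoDB/airbnb_embeddings", streaming=True, split="train")

# Take a fixed number of data points (100 here is illustrative).
dataset_segment = dataset.take(100)

# Convert to a pandas DataFrame for inspection and modification.
dataset_df = pd.DataFrame(dataset_segment)
print(dataset_df.head())    # first five data points and their values
print(dataset_df.columns)   # attributes captured in each data point
```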
In this lesson, you explore the code in its full extent, but in later lessons you will shorten it with a new tools module where all of this extended code is placed, so you can simply call it from within the notebooks. In the modeling step, the first task is to create a Host class, which essentially represents or defines the creator of a listing. It has attributes such as the host ID, name, location, and response time. The next models we create are the Location and the Address. These are used to model the location and address data in our dataset and to ensure they conform to the expected types and that the required data is present. You then create another model for the Review. This model holds the date of a review, the listing ID the review belongs to, the reviewer ID and name, and any comments. The final model you create is the parent model: the Listing model, which assigns all the previously created models to attributes within it. This model also contains its own attributes, such as name, summary, description, transit, and others. This is the key model that holds the information of an Airbnb listing.

Now that you have created the models that each data point in the dataset must conform to, you convert the data points into the appropriate data types. This line converts each data point into a Python dictionary and assigns the result to a variable called records; records now holds all your listings from the dataset. To ensure there are no null values, you conduct a sanity check and replace any null values with None. For the final step in the data modeling process, you convert each listing data point into a validated dictionary and assign the result to a listings variable. You also print out the first instance, or element, of the listings to observe the attributes of a listing. As you can see on the screen, each listing has a name, summary, space, and other attributes. Pause the video here and take some time to familiarize yourself with the attributes.

The next step is to create your database and connect to your database cluster. This is a crucial step. For the database creation and connection step, you first import the libraries: MongoClient from pymongo and SearchIndexModel from pymongo's operations module. MongoClient allows us to create a client instance, and SearchIndexModel allows us to define a vector search index in the appropriate format. Next, you assign the database and collection names. The database will be called airbnb_dataset, which is assigned to the variable database_name, and the collection will be called listings_reviews, which is assigned to the variable collection_name.

Now, you define a function called get_mongo_client, which takes in the Mongo URI string; this is a string that represents a connection to your cluster. The get_mongo_client function uses the MongoClient constructor, taking the Mongo URI as its argument along with an app name, to create an object that represents a connection to the database cluster. Once a successful connection is made, this function returns the client object. Once you've created the get_mongo_client function, in the next cell you use it, but first you conduct a sanity check to ensure you have the Mongo URI within your development environment. You then pass the Mongo URI into the get_mongo_client function.
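Here is a minimal sketch of the Pydantic modeling and connection steps described above. The model fields shown are a reduced subset with assumed names; the notebook's models define many more attributes, and the app name string is illustrative.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel
from pymongo import MongoClient


class Address(BaseModel):
    street: Optional[str] = None
    country: Optional[str] = None
    # ... the notebook's Address model defines further fields.


class Review(BaseModel):
    date: Optional[datetime] = None
    listing_id: Optional[str] = None
    reviewer_id: Optional[str] = None
    reviewer_name: Optional[str] = None
    comments: Optional[str] = None


class Listing(BaseModel):
    # Parent model: holds the key information of an Airbnb listing.
    name: str
    summary: Optional[str] = None
    space: Optional[str] = None
    address: Optional[Address] = None
    reviews: List[Review] = []
    text_embeddings: List[float] = []


def get_mongo_client(mongo_uri: str) -> MongoClient:
    """Create and return a client object representing a connection to the cluster."""
    client = MongoClient(mongo_uri, appname="deeplearningai.lesson")  # app name is illustrative
    print("Connection to MongoDB successful")
    return client


# Validate each raw record against the Listing model, then convert back to dictionaries.
# `records` is assumed to be the list of dictionaries produced from the DataFrame earlier.
# listings = [Listing(**record).dict() for record in records]   # .model_dump() in Pydantic v2
```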
The result from get_mongo_client is assigned to a variable called mongo_client. The mongo_client object provides the get_database method, which returns a database object, and you can then access the collection by calling get_collection on that database object. Running the cell shows a successful MongoDB connection. The last step in the database creation and connection stage is to clean out any existing documents in the collection. The first time you run this, the result will be zero, because the collection has just been created; in future lessons, you will need to clean the collection and will see records being deleted.

The next step is the data ingestion step. For data ingestion, MongoDB provides a function that makes ingesting data into a MongoDB collection a trivial process: simply call the insert_many function on the collection object and pass in the list of listings. Once this cell completes, you should get an indicator that the data ingestion has completed successfully.

The next step is to create the vector search index. This is a crucial step; remember, the index allows for efficient information retrieval from the vector database. First, assign the name "text_embeddings" to the variable text_embedding_field_name. text_embeddings is the field that holds the vector embedding of the space attribute within each document in the collection. Next, assign the string "vector_index_text" to the variable vector_search_index_name_text. vector_index_text is the name of your vector search index, and it will be referenced every time you make a vector search query.

Now, you can use SearchIndexModel to create an appropriate definition of the vector search index. In this cell, you create that definition and assign the result to the variable vector_search_index_model. The SearchIndexModel constructor takes as its argument a definition of your vector search index. The mappings specify how the fields are going to be indexed within the database. The dynamic field tells the database to index new fields that appear in documents. The fields attribute indicates which field in a document holds the vector embedding; text_embedding_field_name is the variable that holds the name of that field, text_embeddings. The dimensions value indicates the size of a single vector embedding within our documents. The similarity field indicates the distance function used to compute the similarity between two vectors. The type knnVector tells the database that the data stored in this field is a vector. The last argument passed into the constructor is the name; this allows the database to identify the vector search index by the given name, vector_index_text.

In the next cell, you conduct a check to ensure that a vector search index with the chosen name doesn't already exist; this is good practice before creating any vector search index definitions. Then you call the create_search_index function on the collection object to create the vector search index, which is done only if the index doesn't already exist. You will observe on the screen an indication that the index was created successfully. Before moving on to the next cell, you can wait a minute to allow the vector index to finish initializing. The final step in this process is to define a function called get_embedding.
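Here is a minimal sketch of the ingestion and index creation steps described above. The index definition follows the Atlas Search knnVector mapping format; the 1536 dimensions assume the text-embedding-ada-002 model mentioned earlier, and cosine similarity is an assumed choice of distance function.

```python
import os
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

# Connect using the Mongo URI from the environment (as in the earlier connection step).
mongo_client = MongoClient(os.environ["MONGO_URI"])
database = mongo_client.get_database("airbnb_dataset")
collection = database.get_collection("listings_reviews")

# Clean any existing documents, then ingest the validated listings.
collection.delete_many({})
# `listings` is the list of validated listing dictionaries from the data modeling step.
collection.insert_many(listings)
print("Data ingestion into MongoDB completed")

text_embedding_field_name = "text_embeddings"
vector_search_index_name_text = "vector_index_text"

vector_search_index_model = SearchIndexModel(
    definition={
        "mappings": {
            "dynamic": True,  # also index new fields that appear in documents
            "fields": {
                text_embedding_field_name: {
                    "dimensions": 1536,      # size of a text-embedding-ada-002 vector
                    "similarity": "cosine",  # distance function used to compare vectors
                    "type": "knnVector",     # tells the database this field holds a vector
                },
            },
        },
    },
    name=vector_search_index_name_text,
)

# Create the index only if one with this name doesn't already exist.
existing_index_names = [idx["name"] for idx in collection.list_search_indexes()]
if vector_search_index_name_text not in existing_index_names:
    collection.create_search_index(model=vector_search_index_model)
    print(f"Index '{vector_search_index_name_text}' created successfully")
```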
The get_embedding function takes in a text string, which is the user query entered into the recommendation engine. You conduct a sanity check to ensure the text passed into get_embedding is a string, and then you call the embeddings.create function from the OpenAI client to generate an embedding for a single data point. The get_embedding function returns a numerical vector representation of the text that was passed into it.

The next step is to compose a vector search query. You start by defining a function called vector_search. This function takes in the user query, the database object, and the collection object, and has a vector_index argument with a default value, which corresponds to the name of the vector index created earlier. The first process inside the vector_search function is to transform the user query into a numerical vector representation and assign it to the query_embedding variable. You conduct a sanity check to ensure the query embedding is not empty before moving on to the other steps within this function.

The next step is to define the vector search stage. This is the stage responsible for conducting the vector search operation that compares vector embeddings and computes the distances between them. You assign to a variable called vector_search_stage a JSON document that represents the vector search query you are constructing. In MongoDB, operators are prefixed with a dollar sign, and the operator here is the $vectorSearch stage, so this document represents a query for a vector search operation. The index field points to the name of the vector index to use for the query. The queryVector field takes in the query embedding, that is, the embedded user query, which is used to compute the distance against candidate vectors from the database. The path field specifies the field where the vector embeddings are held within the documents. numCandidates is the number of documents you want the vector search operation to consider, and the limit field constrains the vector search output to just 20 results.

The next step is to define the pipeline. In MongoDB, a pipeline can be constructed as a Python list containing the stages defined earlier; to create the pipeline for the vector_search function, you have the variable pipeline, which takes a list that includes the vector search stage. The next step is to execute the aggregation pipeline. To do this, call the aggregate function on the collection object and pass in the pipeline created previously. The result of this pipeline is assigned to the variable results.

For the final part of the vector_search function, you compute how long the vector search operation takes to complete, in milliseconds. This is done by calling the command method on the database object, passing in an explain of the aggregate command with the collection name, the pipeline, and a verbosity indicator for explaining the execution. This provides you with an object that includes the execution stats of the vector search operation. The next lines extract the key information and print it in the notebook. The final step in the vector_search function is to return the list of results.

This is the last step of this lesson, where you handle the user query and bring together all the functions you've defined earlier. To ensure the search results, that is, the documents returned from the database, meet a specific format, you use Pydantic to define a SearchResultItem model.
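Here is a minimal sketch of the get_embedding and vector_search functions described above, using the OpenAI Python client (v1-style API) and the $vectorSearch stage; the numCandidates value is an assumed setting, and the handling of the explain output is simplified.

```python
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment


def get_embedding(text: str) -> list:
    """Return a numerical vector representation of the given text."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Text to embed must be a non-empty string")
    response = openai_client.embeddings.create(input=text, model="text-embedding-ada-002")
    return response.data[0].embedding


def vector_search(user_query, db, collection, vector_index="vector_index_text"):
    """Retrieve the documents most semantically similar to the user query."""
    query_embedding = get_embedding(user_query)
    if not query_embedding:
        return []

    # $vectorSearch stage: compares the query vector against candidate document vectors.
    vector_search_stage = {
        "$vectorSearch": {
            "index": vector_index,           # name of the vector search index
            "queryVector": query_embedding,  # embedded user query
            "path": "text_embeddings",       # field holding the document embeddings
            "numCandidates": 150,            # candidates considered (assumed value)
            "limit": 20,                     # number of results returned
        }
    }

    pipeline = [vector_search_stage]
    results = collection.aggregate(pipeline)

    # Ask the database to explain the aggregate command; the returned document contains
    # execution statistics (its exact structure can vary, so inspect it in the notebook).
    explain = db.command(
        "explain",
        {"aggregate": collection.name, "pipeline": pipeline, "cursor": {}},
        verbosity="executionStats",
    )
    print("Execution stats available under the $vectorSearch stage of:", type(explain))

    return list(results)
```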
Each result will take on the name, accommodates, address, summary, and other specified attributes. To handle user queries, you define a function called handle_user_query. The handle_user_query function takes the user query, the database object, and the collection object as its arguments. Now you get to use the vector_search function: call vector_search, pass in the query, the database object, and the collection object, and assign the results to a variable called get_knowledge. get_knowledge holds the list of documents retrieved through the vector search operation. You conduct a sanity check to ensure get_knowledge is not empty. Once you've obtained the search results from the vector search operation, held in the get_knowledge variable, you convert them and ensure they meet the model defined by SearchResultItem. The next step is to convert the search results into a DataFrame, which allows for efficient modification of the search results.

In this step, you pass the query and the search results to the LLM. In this course, you are using GPT-3.5 Turbo as the LLM for the RAG system. Here, you specify to the system that it is an Airbnb listing recommendation system and pass in the query along with the additional context held in the search results DataFrame variable. The following step extracts the response from the LLM and assigns it to a variable called system_response. Next, you print out the user query and the system response so you can observe the process as it happens. The final step here is to display the search results as a table, which holds the additional context passed into the LLM as input. The handle_user_query function returns the system response (a minimal sketch of this flow appears at the end of this lesson).

This is the final step for this lesson, where you assign a string representing a query to a variable called query and pass the query into the handle_user_query function, along with the database and collection objects. The query you are using for this lesson, and throughout the course, specifically asks the system to recommend an Airbnb listing that is warm and friendly and not too far from restaurants. Now, you run handle_user_query. Here, you can see that the vector search operation took 0.02 milliseconds, which is very fast. From the print statements, you can identify the query that was passed into the vector search operation, embedded, and then used for the vector search. The system response can also be observed: it recommended a cozy listing in the heart of the Plateau, in Canada, and provided a reason why. Pause the video here to observe the reason.

In this lesson, you learned how to load your data into a development environment, model your data using Pydantic, ingest data into a connected MongoDB database, and perform a vector search operation. You essentially built a RAG pipeline. In the next lesson, you will explore adding filtering to your vector search operation, including pre- and post-filtering. See you in the next lesson!
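Before you go, here is the promised minimal sketch of the handle_user_query flow, assuming the vector_search function, the openai_client, and the database and collection objects from the earlier sketches; the system prompt wording and display details are illustrative simplifications.

```python
import pandas as pd


def handle_user_query(query, db, collection):
    """Run vector search, pass the retrieved context to the LLM, and return its response."""
    get_knowledge = vector_search(query, db, collection)
    if not get_knowledge:
        return "No results found."

    # Convert the retrieved documents into a DataFrame used as additional context.
    search_results_df = pd.DataFrame(get_knowledge)

    completion = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are an Airbnb listing recommendation system."},
            {
                "role": "user",
                "content": f"Answer this user query: {query} "
                           f"with the following context:\n{search_results_df}",
            },
        ],
    )
    system_response = completion.choices[0].message.content

    print(f"- User Question:\n{query}\n")
    print(f"- System Response:\n{system_response}\n")
    # Show the additional context that was passed to the LLM (columns assumed present).
    print(search_results_df[["name", "accommodates", "address"]])
    return system_response


# Example usage with a query similar to the one in the lesson.
query = "Recommend a warm and friendly Airbnb listing not too far from restaurants."
handle_user_query(query, database, collection)
```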