In this lesson, you'll learn how to streamline the outputs of database operations by incorporating a projection stage into the MongoDB aggregation pipeline. This effectively reduces the amount of data returned, optimizing performance and data handling. All right, let's get on with it.

In the previous lesson, you modified the fields within the documents returned by vector search and other stage operations by using Pydantic and specifying the attributes you wanted in the Pydantic model. In that scenario, you're not using all of the fields in the documents returned by the aggregation pipeline, but you are leaving it to the application layer to remove the unwanted attributes or fields. This has disadvantages, such as increased network traffic and processing time, since the unwanted data must still be transmitted and then filtered out at the application layer.

With a MongoDB database, the inclusion or exclusion of specific fields can be handled as another stage added to the aggregation pipeline. This is done through a technique known as projection, which outputs the same number of documents as the stage before it, but reduces the fields returned in each document. Projection in MongoDB works by specifying fields to include in, or exclude from, the final documents. For example, the document representation of the Mona Lisa painting used in the previous lesson can be reduced to a select few fields using the $project operator in MongoDB, which you will get to implement in the code section soon.

There are several advantages to projection. With a projection stage in place, overall memory usage at the application layer drops, as less data is passed back as the result of database operations. This can also reduce query execution time. And there is the case of security and privacy. Take, for example, a finance application where personal information and sensitive data are stored in documents. It can be useful to have the database handle the logic of removing sensitive information before it is sent to downstream processes. This improves the overall security of the application.

In the coding section, you will go through the familiar steps of implementing a RAG system, add a filter stage, then add an additional projection stage, and finally handle the user query. Let's code.

You will start by importing custom_utils, as you did in the previous lesson, then move on to downloading the dataset from Hugging Face, as you also did in the previous lesson. You also load the listings dataset by conforming to the Pydantic model defined in custom_utils, just like in the previous lesson, then connect to your database, delete the records in the collection, and observe the number of documents deleted. Just like in the previous lesson, you will insert a new batch of records. Here, you're using a vector search index with a filter, similar to the one you created in the last lesson. This code has been moved into the custom_utils module, and you will load it by calling the setup vector search index with filter function, passing in the collection object. This creates a vector search index that is optimized to retrieve data filtered on the accommodates and bedrooms attributes.

You will then define a search result model similar to the one from the previous lesson, this time with some new attributes such as score and notes.
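As a rough sketch, the model could look something like the following. The exact field names here are assumptions based on the Airbnb-style listings dataset rather than the course's exact code, so adjust them to match your own projection:

```python
from typing import Optional
from pydantic import BaseModel


class SearchResultItem(BaseModel):
    # Fields assumed from the listings dataset; keep these in sync with
    # the fields your projection stage actually returns.
    name: str
    accommodates: Optional[int] = None
    bedrooms: Optional[int] = None
    address: Optional[dict] = None
    summary: Optional[str] = None
    space: Optional[str] = None
    neighborhood_overview: Optional[str] = None
    notes: Optional[str] = None
    # New in this lesson: the vector search similarity score surfaced
    # by the projection stage.
    score: Optional[float] = None
```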
In the next cell, you implement the handle_user_query function. This is similar to the function you created in the previous lesson. The main difference is that this version prints out the list of fields in the first document. You do this by accessing the first element returned from get_knowledge and iterating through its keys. The point is to observe which fields of the documents made it through the projection stage before the documents are passed into the SearchResultItem model.

Now you're going to implement a projection stage and add it to the additional stages that will be passed into the vector search query function. You define a variable called projection_stage and assign it a projection document. A projection stage is executed as a database operation and is indicated by the dollar operator and the word project: $project. This stage takes in a document that represents the fields to be projected.

One thing to note: every document returned by the aggregation pipeline includes an _id field, which is returned automatically. You can exclude it by setting the value of that field to 0. This is an exclusion. To include a field in a projection, you name the field, such as accommodates, and assign it the value 1. This is an inclusion pattern. As you can observe, we follow the same pattern for all the fields we want to project. By including fields in the projection, you automatically exclude the fields that are not mentioned. That is all you need for the projection stage itself, but also notice that you are adding a score field and assigning it the value of the vector search score. This is a way to get the similarity score of the vector search operation into the documents returned from the database operation.

Next, place the projection stage into a Python list and assign it to the additional stages variable. One more thing: these are all the fields and attributes we want in our Pydantic model, so they should all be included in your projection document.

The next cell will look familiar, as you've used it in a previous lesson. Here, you have a user query looking for places that are warm and friendly. And here are the main changes: in handle_user_query, you pass in the additional stages list, which includes the projection stage. handle_user_query will also print the fields of the first document before the documents are processed by the Pydantic model. You're also using the vector index with the filter for this vector search operation.
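Putting those pieces together, the cell looks roughly like the sketch below. The projection document and the $meta score expression follow MongoDB's documented syntax; the handle_user_query signature, the index name, and the query string are assumptions standing in for the course's helpers:

```python
# Inclusion-style projection: every field we keep gets the value 1.
# _id is the one exception: it may be excluded (0) even inside an
# inclusion projection.
projection_stage = {
    "$project": {
        "_id": 0,
        "name": 1,
        "accommodates": 1,
        "bedrooms": 1,
        "address": 1,
        "summary": 1,
        "space": 1,
        "neighborhood_overview": 1,
        "notes": 1,
        # Surface the vector search similarity score as a new field.
        "score": {"$meta": "vectorSearchScore"},
    }
}

additional_stages = [projection_stage]

query = "I want a warm, friendly place to stay"  # illustrative query
# Hypothetical call mirroring the lesson's helper; the real signature
# in your custom_utils module may differ.
result = handle_user_query(
    query,
    db,
    collection,
    additional_stages,
    vector_index="vector_index_with_filter",
)
```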
After running this cell, you'll observe that the database vector search operation executed in a fraction of a millisecond. You'll also observe that the fields included in the documents are exactly the ones we included in the projection: the name, summary, space, and the other fields we wanted projected into the documents. The results are still the same, and we're still getting the same documents back.

I want to show you one thing. When writing a projection, it's important to maintain the same pattern throughout for every field you list. Meaning, if you're writing an inclusion projection, use the 1 pattern throughout; if you're writing an exclusion projection, use the 0 pattern throughout. The only exception to this rule is the _id field. Let me show you an example. You can change one of these values to 0 to represent an exclusion. What happens is a database operation failure. As you can see, you got an operation failure, and scrolling further down, you will see the reason: an invalid $project document, caused by the fact that you can't exclude a field in an inclusion projection. To fix the operation failure, simply set the value back to 1 to follow the inclusion pattern. Now your results are back.

One thing to note: because we've included a score field that carries the vector similarity search score, we can see it in the results as well. Let's have a look. Here, you can see the score field and the attributed vector search similarity score. Vector search similarity scores range between 0 and 1, with 1 being a very close similarity match. You can pause the video here to observe the scores of the documents returned from the database operation.

That concludes this lesson. In it, you went through the typical pattern of setting up a RAG pipeline with vector search indexes, and you ingested data into a database collection. The new thing you learned is how to create and add a projection stage to the aggregation pipeline to limit the fields returned from the aggregation pipeline query. In the next lesson, you will see how you can add boosting and improve the relevance of the vector search operation by looking at qualitative and quantitative data and using it to affect the ranking of documents returned from the aggregation pipeline. See you in the next lesson.