This lesson covers vector search and expands on RAG implementation. You will explore MongoDB and Pydantic, a Python library crucial for data integrity. Understanding these tools will elevate the quality of your AI projects. Let's dive in.

You understand that there is a vast amount of data on the internet right now, and there are a few ways to compare how similar, or how close, one data point is to another. A common method is text search, where you match a query keyword against parts of the content of a data point to compute a match. This is information retrieval in its most basic sense: you input a keyword or search term, match it against several data points, and detect whether the keyword is in the content. Now you are about to learn how to retrieve data based on its context or meaning.

The first step is to gather data. This can be structured data, like data organized in tables or spreadsheets with defined columns, or unstructured data such as audio and images. The next step involves passing the data as input to an embedding model. The output of an embedding model is a vector. At this point, you can say that the initial data has been vectorized, and you are left with a numerical representation of the data that captures its context and semantics. This is referred to as a vector embedding. In a high-dimensional space, referred to as a vector space, you can compare the distance between two or more embedding vectors to get an indication of how closely they are related in semantics or context.

So far, you understand vector search: an information retrieval technique that uses numerical representations of data, known as vector embeddings, to search and retrieve information. You also understand that traditionally, information retrieval relies on keyword matching, which searches for direct matches between the query text and the text within the dataset. Vector search, however, uses embeddings to enable advanced functionality such as semantic search, which understands the context of the query; recommendation systems, which predict user preferences; and retrieval-augmented generation, or RAG, which provides additional context for LLM inputs. These capabilities make vector search a powerful tool in various AI applications.

Once data, both structured and unstructured, has been collected and encoded into vector embeddings, there is a requirement to store the vectorized data in a specialized data store referred to as a vector database. Within a vector database, to ensure efficient retrieval of vector data based on vector search queries, it is best practice to index the vector data. A vector search index is a specialized structure that optimizes the storage and retrieval of vector embeddings, allowing for efficient similarity searches. When a vector search operation is performed, the index facilitates efficient matching of the query vector against the dataset, reducing the time needed to find the most similar vectors.

That takes you down the road of search, specifically vector search in a retrieval-augmented generation system. Retrieval-augmented generation, or RAG, is a system design pattern that leverages information retrieval techniques, including vector search, together with foundation models to provide accurate and relevant responses to user queries. RAG achieves this by retrieving semantically similar data to supplement the user query with additional context, and then combining the retrieved information with the original query as input to a large language model.
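To make the idea of comparing embedding vectors concrete, here is a minimal sketch, using NumPy, of how cosine similarity scores closeness in a vector space. The vectors and values are made up for illustration; in practice the embeddings would come from an embedding model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Return the cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings: real embeddings would have hundreds or thousands of dimensions.
query_embedding = np.array([0.12, 0.85, -0.33])
doc_embedding_a = np.array([0.10, 0.80, -0.30])   # semantically close to the query
doc_embedding_b = np.array([-0.70, 0.05, 0.60])   # semantically distant

print(cosine_similarity(query_embedding, doc_embedding_a))  # close to 1.0
print(cosine_similarity(query_embedding, doc_embedding_b))  # much lower
```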
For example, a typical process using a chat interface would be: you enter your message and then you get a response from the LLM. This is not the ideal process, as it doesn't use any relevant data. The ideal process would be that, along with the input to the LLM, you add relevant domain-specific data, and the large language model can then provide a relevant and context-aware response to your query.

Now that you have an understanding of RAG, let's get an overview of the key benefits of the RAG design pattern for LLM applications. Building AI applications that leverage the RAG system design pattern provides a number of benefits, such as grounding the LLM response in relevant and up-to-date information, which reduces the chances of hallucinations, where the LLM essentially provides wrong or irrelevant information. With retrieval-augmented generation, you also have the benefit of reducing the amount of information that is passed as input to the LLM, which can reduce how much of the context window you use. With RAG, you can also remove the need for fine-tuning LLMs in some scenarios; more specifically, with retrieval-augmented generation you can use your own private or domain-specific data to ensure that LLM responses meet your specific requirements and needs.

Now that you know that LLMs give better answers when supplemented with relevant context, you may wonder where and how to store this data. You may also ask, "How do I implement vector search for information retrieval in the first place?" That's where MongoDB comes in. MongoDB is a developer data platform that offers a NoSQL database with vector search functionality. In your AI applications, MongoDB can act as a storage solution for vector data, acting as a vector database. MongoDB offers even more functionality as a data store for operational and transactional data, making it a robust solution as a memory provider for LLM and AI applications, including RAG and agentic systems.

You're likely familiar with traditional relational databases. Let's use storing data on a house to illustrate how a relational database works. In a typical relational database, you might have the information about the house, such as the number of rooms and bathrooms, in one table and the address information of the house in another. With the document model, you model data based on the interactions that happen in the application, and not the other way around. So what is a document in MongoDB? A document is a basic unit of data that is similar to JSON. Each document is a set of key-value pairs and is the MongoDB equivalent of a row in a relational database. Let's see this with the house example we talked about earlier, where we had the house details and its address attributes. In this example, we have all the attributes of a house in one document, including its address. This is an example of a document in MongoDB (a sketch of such a document follows below). Documents are dynamic, meaning they can contain varied fields and structures within the same collection, and a collection in a non-relational database is similar to a table in a relational database.

The document model uses a JSON-like schema, which is a core data model across layers of the tech stack. For example, JSON helps transfer data between website components and REST APIs in the application layer, and it is used for function calling and tool definitions in the model layer when implementing an agent. MongoDB enables flexible storage of fields and values, with the ability to store different data types.
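As an illustration of the document model described above, here is a minimal sketch of what the house example might look like as a single MongoDB document, inserted with PyMongo. The field names, values, and connection string are hypothetical, chosen only for this example.

```python
from pymongo import MongoClient

# Hypothetical house document: all attributes, including the address, live in one document.
house_document = {
    "name": "Sunny Two-Bedroom House",
    "rooms": 2,
    "bathrooms": 1,
    "address": {                         # nested sub-document instead of a separate table
        "street": "123 Example Street",
        "city": "Springfield",
        "country": "US",
    },
    "amenities": ["garden", "parking"],  # arrays are valid field values too
}

# Assumes a locally reachable MongoDB instance; adjust the URI to your environment.
client = MongoClient("mongodb://localhost:27017")
collection = client["real_estate"]["houses"]
collection.insert_one(house_document)
```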
To ensure your documents are structured properly, you should consider data modeling. Data modeling involves designing the structure of documents and collections to effectively represent and organize data. It's about planning how to store and link data across documents to optimize for performance, scalability, and the specific data access patterns of your application. At times, the layout of components of your application is dictated by the structure and format of your data in a database. In the diagram here, this is represented by the directional arrow going from the data to the application layer; it represents implementing the application layer based on the information in the data layer. But ideally, you want to start with the needs of your application first, not the data itself. You have to ask, "How will I access my data?" and that should determine how you model the structure of your data.

MongoDB lets you take a familiar understanding of pipelines, which are present in data processing and machine learning, and apply the concept to operations within the database layer. When conducting queries with MongoDB, you construct an aggregation pipeline. You can think of an aggregation pipeline as a sequence of data processing stages, where each stage transforms the data as it passes through. This allows for complex query composition within MongoDB, as various stages of data transformation occur within the pipeline.

Here's an example of an aggregation pipeline query; by the way, a query is just a fancy way of describing how you tell the database to produce the specific information you're looking for. Let's say you're managing data from a social media application with a collection of user posts. You want to find the most popular posts, defined by the number of likes, in January 2021, and perhaps you're interested in summarizing the average number of comments and likes per post by category. This aggregation pipeline filters the posts from January 2021, groups them by category, calculates the average likes and comments, and sorts the results by average likes in descending order (a sketch of this pipeline follows below). By using the aggregation pipeline, you can leverage your understanding of sequential operations from machine learning and AI pipelines and apply similar logic to managing and analyzing data in MongoDB, making complex queries quite understandable and manageable.

In AI applications, there is a need for data validation and for ensuring that data conforms to a defined model. This reduces the likelihood of errors in production systems. Pydantic is a Python library used for data validation, modeling, and management. Pydantic offers features that enable the creation of data schemas, including a definition of an object and its properties. Pydantic also ensures that data conforms to the defined schemas, data types, formats, and constraints. If a piece of data doesn't meet the validation criteria, Pydantic handles the error by raising an exception that details the specific validation issues.

Before we dive into coding, let's review the dataset. It consists of 5,000 Airbnb listings hosted on Hugging Face, featuring details like address, description, transportation, reviews, and comments. For this course, you will use it to build an Airbnb listing recommendation system using RAG techniques. Each record, or data point, includes image embeddings of the listing photos and text embeddings generated from the content of the space attribute.
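Here is a minimal sketch of the aggregation pipeline described above, assuming a hypothetical posts collection with date, category, likes, and comments fields; the field names and connection details are assumptions for illustration, not taken from the lesson.

```python
from datetime import datetime
from pymongo import MongoClient

# Assumed connection and collection; adjust to your own environment.
client = MongoClient("mongodb://localhost:27017")
posts = client["social_media"]["posts"]

pipeline = [
    # Stage 1: keep only posts from January 2021.
    {"$match": {"date": {"$gte": datetime(2021, 1, 1), "$lt": datetime(2021, 2, 1)}}},
    # Stage 2: group by category and compute average likes and comments.
    {"$group": {
        "_id": "$category",
        "avg_likes": {"$avg": "$likes"},
        "avg_comments": {"$avg": "$comments"},
    }},
    # Stage 3: sort categories by average likes, most popular first.
    {"$sort": {"avg_likes": -1}},
]

for doc in posts.aggregate(pipeline):
    print(doc)
```

Each stage hands its output to the next, which is what makes the pipeline read like a sequence of data transformations.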
The information in the space attribute has been processed by the OpenAI text-embedding-ada-002 model. Here are the steps we're going to take in the coding section for this lesson. You are going to load the data from Hugging Face. Then you will set up a database connection to access the database and the collection, into which you will then insert, or ingest, the data. After that, you will conduct a vector search query using a query embedding and the embeddings within the collection. The last step will handle the user query and visualize the responses. Let's dive in.

Before we get to the steps outlined in the slides, let's see what you will build in this lesson. You will be building a RAG recommendation system that uses vector search to pull relevant results from a vector database and add them as additional context to an LLM, as you can see on the screen. You will also observe the execution time of the vector search query, the user question, and the system response. The system response is the response from the LLM, and it will include a recommended listing from the dataset that was provided as additional context, along with the reason for choosing this recommendation. You will also observe a table of the attributes of the data that was used as additional context, including the name, accommodates, and address, shown for all information retrieved by the vector search query. Let's get started.

These are the libraries you will use for this notebook, which are pre-installed and available for you on the learning platform. Here, you import the os module and load the environment variables set up within your development environment. We will load the Mongo URI and the OpenAI API key; these have previously been configured for you in the development environment.

The first step is to load the dataset. Here, we import the load_dataset function from the datasets library from Hugging Face, which allows us to access a dataset from the Hugging Face platform by specifying its path. You will also import the pandas library, aliased as pd, which lets you conduct data modification and analysis. You call the load_dataset function, passing it the path to the dataset; in this case, this is the Airbnb embeddings dataset we spoke about earlier that contains the text embeddings. You set streaming to True and use the training partition of the dataset. The output of this operation is assigned to the variable dataset. By calling take on the dataset object and specifying the number of data points you want to extract, you can load a specified number of data points into your environment. The next line converts the dataset into a pandas DataFrame, which allows for analysis and data modification. The final step is to view the first data points in the dataset.

As you can see on the screen, we can visualize the first five data points and their attributes, including the values. Pause the video here and take some time to familiarize yourself with the values of each data point. To continue with the visualization of our dataset and its data points, we will list the attributes of each data point. Here, you can see the various attributes captured in each data point within the dataset, including the text embeddings. The next step is to conduct document modeling using Pydantic. First, we import several modules from Pydantic and also the datetime module from Python.
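Here is a minimal sketch of the dataset loading step described above; the dataset path and the number of data points taken are assumptions for illustration, so adjust them to the values used in your notebook.

```python
import pandas as pd
from datasets import load_dataset

# Stream the dataset so only the records you take are pulled into memory.
# The path below is an assumed placeholder; use the Airbnb embeddings dataset path from the notebook.
dataset = load_dataset("MongoDB/airbnb_embeddings", streaming=True, split="train")

# Take a fixed number of data points (100 here is illustrative).
dataset_segment = dataset.take(100)

# Convert to a pandas DataFrame for inspection and modification.
dataset_df = pd.DataFrame(dataset_segment)
print(dataset_df.head())    # first five data points and their values
print(dataset_df.columns)   # attributes captured in each data point
```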
In this lesson, you explore the code in its full extent, but in later lessons you will shorten it with a new tools module where all of this extended code is placed, so you can simply call it from within the notebooks. In the modeling step, the first task is to create a Host class, which essentially represents or defines the creator of a listing. It has attributes such as the host ID, name, location, and response time. The next models we create are the Location and the Address. These are used to model the location and address data in our dataset and to ensure they conform to the expected types and that the required data is present. You then create another model for the Review. This model holds the date of a review, the listing ID the review belongs to, the reviewer ID and name, and any comments. The final model you create is the parent model: the Listing model, which assigns all the previously created models to attributes within it. This model also contains its own attributes, such as name, summary, description, transit, and others. This is the key model that holds the information of an Airbnb listing.

Now that you have created the models that each data point in the dataset must conform to, you convert the data points into the appropriate data types. This line converts each data point into a Python dictionary and assigns the result to a variable called records; records now holds all your listings from the dataset. To ensure there are no null values, you conduct a sanity check and replace any null values with None. For the final step in the data modeling process, you convert each listing data point into a validated dictionary and assign the result to a listings variable. You also print out the first instance, or element, of the listings to observe the attributes of a listing. As you can see on the screen, each listing has a name, summary, space, and other attributes. Pause the video here and take some time to familiarize yourself with the attributes.

The next step is to create your database and connect to your database cluster. This is a crucial step. For the database creation and connection step, you first import the libraries: MongoClient from pymongo and SearchIndexModel from pymongo's operations module. MongoClient allows us to create a client instance, and SearchIndexModel allows us to define a vector search index in the appropriate format. Next, you assign the database and collection names. The database will be called airbnb_dataset, which is assigned to the variable database_name, and the collection will be called listings_reviews, which is assigned to the variable collection_name.

Now, you define a function called get_mongo_client, which takes in the Mongo URI string; this is a string that represents a connection to your cluster. The get_mongo_client function uses the MongoClient constructor, taking the Mongo URI as its argument along with an app name, to create an object that represents a connection to the database cluster. Once a successful connection is made, this function returns the client object. Once you've created the get_mongo_client function, in the next cell you use it, but first you conduct a sanity check to ensure you have the Mongo URI within your development environment. You then pass the Mongo URI into the get_mongo_client function.
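Here is a minimal sketch of the Pydantic modeling and connection steps described above. The model fields shown are a reduced subset with assumed names; the notebook's models define many more attributes, and the app name string is illustrative.

```python
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel
from pymongo import MongoClient


class Address(BaseModel):
    street: Optional[str] = None
    country: Optional[str] = None
    # ... the notebook's Address model defines further fields.


class Review(BaseModel):
    date: Optional[datetime] = None
    listing_id: Optional[str] = None
    reviewer_id: Optional[str] = None
    reviewer_name: Optional[str] = None
    comments: Optional[str] = None


class Listing(BaseModel):
    # Parent model: holds the key information of an Airbnb listing.
    name: str
    summary: Optional[str] = None
    space: Optional[str] = None
    address: Optional[Address] = None
    reviews: List[Review] = []
    text_embeddings: List[float] = []


def get_mongo_client(mongo_uri: str) -> MongoClient:
    """Create and return a client object representing a connection to the cluster."""
    client = MongoClient(mongo_uri, appname="deeplearningai.lesson")  # app name is illustrative
    print("Connection to MongoDB successful")
    return client


# Validate each raw record against the Listing model, then convert back to dictionaries.
# `records` is assumed to be the list of dictionaries produced from the DataFrame earlier.
# listings = [Listing(**record).dict() for record in records]   # .model_dump() in Pydantic v2
```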
The result from get_mongo_client is assigned to a variable called mongo_client. The mongo_client object provides the get_database method, which returns a database object, and you can then access the collection by calling get_collection on that database object. Running the cell shows a successful MongoDB connection. The last step in the database creation and connection stage is to clean out any existing documents in the collection. The first time you run this, the result will be zero, because the collection has just been created; in future lessons, you will need to clean the collection and will see records being deleted.

The next step is the data ingestion step. For data ingestion, MongoDB provides a function that makes ingesting data into a MongoDB collection a trivial process: simply call the insert_many function on the collection object and pass in the list of listings. Once this cell completes, you should get an indicator that the data ingestion has completed successfully.

The next step is to create the vector search index. This is a crucial step; remember, the index allows for efficient information retrieval from the vector database. First, assign the name "text_embeddings" to the variable text_embedding_field_name. text_embeddings is the field that holds the vector embedding of the space attribute within each document in the collection. Next, assign the string "vector_index_text" to the variable vector_search_index_name_text. vector_index_text is the name of your vector search index, and it will be referenced every time you make a vector search query.

Now, you can use SearchIndexModel to create an appropriate definition of the vector search index. In this cell, you create that definition and assign the result to the variable vector_search_index_model. The SearchIndexModel constructor takes as its argument a definition of your vector search index. The mappings specify how the fields are going to be indexed within the database. The dynamic field tells the database to index new fields that appear in documents. The fields attribute indicates which field in a document holds the vector embedding; text_embedding_field_name is the variable that holds the name of that field, text_embeddings. The dimensions value indicates the size of a single vector embedding within our documents. The similarity field indicates the distance function used to compute the similarity between two vectors. The type knnVector tells the database that the data stored in this field is a vector. The last argument passed into the constructor is the name; this allows the database to identify the vector search index by the given name, vector_index_text.

In the next cell, you conduct a check to ensure that a vector search index with the chosen name doesn't already exist; this is good practice before creating any vector search index definitions. Then you call the create_search_index function on the collection object to create the vector search index, which is done only if the index doesn't already exist. You will observe on the screen an indication that the index was created successfully. Before moving on to the next cell, you can wait a minute to allow the vector index to finish initializing. The final step in this process is to define a function called get_embedding.
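Here is a minimal sketch of the ingestion and index creation steps described above. The index definition follows the Atlas Search knnVector mapping format; the 1536 dimensions assume the text-embedding-ada-002 model mentioned earlier, and cosine similarity is an assumed choice of distance function.

```python
import os
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

# Connect using the Mongo URI from the environment (as in the earlier connection step).
mongo_client = MongoClient(os.environ["MONGO_URI"])
database = mongo_client.get_database("airbnb_dataset")
collection = database.get_collection("listings_reviews")

# Clean any existing documents, then ingest the validated listings.
collection.delete_many({})
# `listings` is the list of validated listing dictionaries from the data modeling step.
collection.insert_many(listings)
print("Data ingestion into MongoDB completed")

text_embedding_field_name = "text_embeddings"
vector_search_index_name_text = "vector_index_text"

vector_search_index_model = SearchIndexModel(
    definition={
        "mappings": {
            "dynamic": True,  # also index new fields that appear in documents
            "fields": {
                text_embedding_field_name: {
                    "dimensions": 1536,      # size of a text-embedding-ada-002 vector
                    "similarity": "cosine",  # distance function used to compare vectors
                    "type": "knnVector",     # tells the database this field holds a vector
                },
            },
        },
    },
    name=vector_search_index_name_text,
)

# Create the index only if one with this name doesn't already exist.
existing_index_names = [idx["name"] for idx in collection.list_search_indexes()]
if vector_search_index_name_text not in existing_index_names:
    collection.create_search_index(model=vector_search_index_model)
    print(f"Index '{vector_search_index_name_text}' created successfully")
```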
The get_embedding function takes in a text string, which is the user query entered into the recommendation engine. You conduct a sanity check to ensure the text passed into get_embedding is a string, and then you call the embeddings.create function from the OpenAI client to generate an embedding for a single data point. The get_embedding function returns a numerical vector representation of the text that was passed into it.

The next step is to compose a vector search query. You start by defining a function called vector_search. This function takes in the user query, the database object, and the collection object, and has a vector_index argument with a default value, which corresponds to the name of the vector index created earlier. The first process inside the vector_search function is to transform the user query into a numerical vector representation and assign it to the query_embedding variable. You conduct a sanity check to ensure the query embedding is not empty before moving on to the other steps within this function.

The next step is to define the vector search stage. This is the stage responsible for conducting the vector search operation that compares vector embeddings and computes the distances between them. You assign to a variable called vector_search_stage a JSON document that represents the vector search query you are constructing. In MongoDB, operators are prefixed with a dollar sign, and the operator here is the $vectorSearch stage, so this document represents a query for a vector search operation. The index field points to the name of the vector index to use for the query. The queryVector field takes in the query embedding, that is, the embedded user query, which is used to compute the distance against candidate vectors from the database. The path field specifies the field where the vector embeddings are held within the documents. numCandidates is the number of documents you want the vector search operation to consider, and the limit field constrains the vector search output to just 20 results.

The next step is to define the pipeline. In MongoDB, a pipeline can be constructed as a Python list containing the stages defined earlier; to create the pipeline for the vector_search function, you have the variable pipeline, which takes a list that includes the vector search stage. The next step is to execute the aggregation pipeline. To do this, call the aggregate function on the collection object and pass in the pipeline created previously. The result of this pipeline is assigned to the variable results.

For the final part of the vector_search function, you compute how long the vector search operation takes to complete, in milliseconds. This is done by calling the command method on the database object, passing in an explain of the aggregate command with the collection name, the pipeline, and a verbosity indicator for explaining the execution. This provides you with an object that includes the execution stats of the vector search operation. The next lines extract the key information and print it in the notebook. The final step in the vector_search function is to return the list of results.

This is the last step of this lesson, where you handle the user query and bring together all the functions you've defined earlier. To ensure the search results, that is, the documents returned from the database, meet a specific format, you use Pydantic to define a SearchResultItem model.
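Here is a minimal sketch of the get_embedding and vector_search functions described above, using the OpenAI Python client (v1-style API) and the $vectorSearch stage; the numCandidates value is an assumed setting, and the handling of the explain output is simplified.

```python
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment


def get_embedding(text: str) -> list:
    """Return a numerical vector representation of the given text."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Text to embed must be a non-empty string")
    response = openai_client.embeddings.create(input=text, model="text-embedding-ada-002")
    return response.data[0].embedding


def vector_search(user_query, db, collection, vector_index="vector_index_text"):
    """Retrieve the documents most semantically similar to the user query."""
    query_embedding = get_embedding(user_query)
    if not query_embedding:
        return []

    # $vectorSearch stage: compares the query vector against candidate document vectors.
    vector_search_stage = {
        "$vectorSearch": {
            "index": vector_index,           # name of the vector search index
            "queryVector": query_embedding,  # embedded user query
            "path": "text_embeddings",       # field holding the document embeddings
            "numCandidates": 150,            # candidates considered (assumed value)
            "limit": 20,                     # number of results returned
        }
    }

    pipeline = [vector_search_stage]
    results = collection.aggregate(pipeline)

    # Ask the database to explain the aggregate command; the returned document contains
    # execution statistics (its exact structure can vary, so inspect it in the notebook).
    explain = db.command(
        "explain",
        {"aggregate": collection.name, "pipeline": pipeline, "cursor": {}},
        verbosity="executionStats",
    )
    print("Execution stats available under the $vectorSearch stage of:", type(explain))

    return list(results)
```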
Each result will take on the name, accommodates, address, summary, and other specified attributes. To handle user queries, you define a function called handle_user_query. The handle_user_query function takes the user query, the database object, and the collection object as its arguments. Now you get to use the vector_search function: call vector_search, pass in the query, the database object, and the collection object, and assign the results to a variable called get_knowledge. get_knowledge holds the list of documents retrieved through the vector search operation. You conduct a sanity check to ensure get_knowledge is not empty. Once you've obtained the search results from the vector search operation, held in the get_knowledge variable, you convert them and ensure they meet the model defined by SearchResultItem. The next step is to convert the search results into a DataFrame, which allows for efficient modification of the search results.

In this step, you pass the query and the search results to the LLM. In this course, you are using GPT-3.5 Turbo as the LLM for the RAG system. Here, you specify to the system that it is an Airbnb listing recommendation system and pass in the query along with the additional context held in the search results DataFrame variable. The following step extracts the response from the LLM and assigns it to a variable called system_response. Next, you print out the user query and the system response so you can observe the process as it happens. The final step here is to display the search results as a table, which holds the additional context passed into the LLM as input. The handle_user_query function returns the system response (a minimal sketch of this flow appears at the end of this lesson).

This is the final step for this lesson, where you assign a string representing a query to a variable called query and pass the query into the handle_user_query function, along with the database and collection objects. The query you are using for this lesson, and throughout the course, specifically asks the system to recommend an Airbnb listing that is warm and friendly and not too far from restaurants. Now, you run handle_user_query. Here, you can see that the vector search operation took 0.02 milliseconds, which is very fast. From the print statements, you can identify the query that was passed into the vector search operation, embedded, and then used for the vector search. The system response can also be observed: it recommended a cozy listing in the heart of the Plateau, in Canada, and provided a reason why. Pause the video here to observe the reason.

In this lesson, you learned how to load your data into a development environment, model your data using Pydantic, ingest data into a connected MongoDB database, and perform a vector search operation. You essentially built a RAG pipeline. In the next lesson, you will explore adding filtering to your vector search operation, including pre- and post-filtering. See you in the next lesson!
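Before you go, here is the promised minimal sketch of the handle_user_query flow, assuming the vector_search function, the openai_client, and the database and collection objects from the earlier sketches; the system prompt wording and display details are illustrative simplifications.

```python
import pandas as pd


def handle_user_query(query, db, collection):
    """Run vector search, pass the retrieved context to the LLM, and return its response."""
    get_knowledge = vector_search(query, db, collection)
    if not get_knowledge:
        return "No results found."

    # Convert the retrieved documents into a DataFrame used as additional context.
    search_results_df = pd.DataFrame(get_knowledge)

    completion = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are an Airbnb listing recommendation system."},
            {
                "role": "user",
                "content": f"Answer this user query: {query} "
                           f"with the following context:\n{search_results_df}",
            },
        ],
    )
    system_response = completion.choices[0].message.content

    print(f"- User Question:\n{query}\n")
    print(f"- System Response:\n{system_response}\n")
    # Show the additional context that was passed to the LLM (columns assumed present).
    print(search_results_df[["name", "accommodates", "address"]])
    return system_response


# Example usage with a query similar to the one in the lesson.
query = "Recommend a warm and friendly Airbnb listing not too far from restaurants."
handle_user_query(query, database, collection)
```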