In this lesson, you'll tackle a real scaling challenge: what happens when your agent has access to hundreds of tools? Instead of stuffing every tool definition into the prompt, you'll treat tools as procedural memory, storing them in a memory-backed store and retrieving only the relevant ones at inference time using semantic search. All right, let's go. What we're going to cover in this lesson is how large language models become tool-aware for task completion, that is, how they perceive the tools they have at their disposal. We'll also talk about the Toolbox pattern, which lets a large language model become aware of all the tools it has while treating them as retrievable resources rather than putting all of them into the context, and how we can use embeddings and semantic search so the model can select its most suitable tools at any time. Tool calling is a pattern where the large language model doesn't directly execute code. Instead, it outputs a structured request, the environment executes the code, and the result is returned so the model has the information it needs to answer the user. In a typical scenario, you have a set of tool definitions, each with a name, a description, and some parameters describing how the tool has to be called, plus a user query, for instance, "What is the weather like in London tomorrow?" The large language model outputs a structured request to be executed within the environment, the system executes the tool and returns the response to the model, and from that response the model generates the correct answer for the user. But this has some limitations: we typically have a lot of tools that we want to use in our environment.
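The tool-calling loop described above can be sketched in a few lines. The transcript doesn't show this code, so the tool schema, the `get_weather` stub, and the simulated model request below are all hypothetical; the point is that the model only emits a structured request, and the environment dispatches and executes it.

```python
import json

# Hypothetical tool definition: name, description, and parameters,
# in the JSON-schema style most chat-completion APIs expect.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the weather forecast for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "day": {"type": "string", "description": "e.g. 'tomorrow'"},
        },
        "required": ["city"],
    },
}

def get_weather(city, day="today"):
    # Stub implementation; a real tool would call a weather API.
    return {"city": city, "day": day, "forecast": "light rain, 14C"}

TOOLS = {"get_weather": get_weather}

def execute_tool_call(call_json):
    """The environment, not the model, runs the code: it parses the
    model's structured request and dispatches to the right function."""
    call = json.loads(call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# A simulated structured request the model might emit for
# "What is the weather like in London tomorrow?"
model_request = json.dumps(
    {"name": "get_weather", "arguments": {"city": "London", "day": "tomorrow"}}
)
result = execute_tool_call(model_request)
print(result["forecast"])  # the model would use this to answer the user
```

The tool's result is then handed back to the model as context so it can phrase the final answer.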
More tools are of course beneficial for the large language model: the more information and capabilities it has at its disposal, the better its responses can be. But having too many tools brings some obvious disadvantages. The context size of a large language model is limited, so we cannot stuff all of the tools, with their names, parameters, and descriptions, into the context without running into problems. Putting all of them into the context window can cause agents to fail in a number of ways. First, there is context confusion and context bloat: when a large language model sees many tools at once, the context gets overwhelmed by tool information, leaving less room for the actual input and output. There is also tool selection degradation, meaning the model's responses degrade as a consequence. Latency and token usage increase as well, so you pay for more tokens, and the model takes longer to produce a response because of all the extra information it must process beforehand, which in general causes a performance degradation. A way to scale this system is the following. We take the name, description, and parameters of each tool, encode that information, and pass it through an embedding model, creating an embedding of the tool's name, description, and parameters, which places each tool in a vector space.
We then load this embedding, together with the original tool information, into the database, so we can perform semantic search and similarity search on it. After the large language model receives a user query, for instance our original query, "What is the weather like in London?", we take that input and perform similarity search against the embeddings we just created. Semantic search selects the best tools for resolving the user query, and after the right tool is selected, the large language model can call it, execute it, and receive the result. On the bottom left you can see a representation of our embedding space: all the tools live in this space, and by performing similarity search we find the tools closest to the user query. Now, this is a great idea, but it might not be enough. So we propose another enhancement, which we call Memory Unit Augmentation, a slightly different strategy: we take the tool definition, its name, and its description, run them through a large language model, and enhance that name and description. We then encode the enhanced name and description, pass them through an embedding model, and create an embedding out of that, giving us an augmented tool definition which is loaded into the database. We basically let the large language model enhance our tool definitions. This has some clear advantages: in the embedding space, when we now perform semantic search, there is better separability between the tools.
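The embed-then-retrieve flow above can be sketched end to end. The lesson uses a real sentence-transformer (paraphrase-mpnet-base-v2) and an Oracle vector store; the bag-of-words embedding, the two toy tools, and the `select_tools` helper below are stand-ins I'm assuming purely for illustration.

```python
import math
from collections import Counter

# Toy embedding: a bag-of-words count vector over a shared vocabulary.
# A real system would use a sentence-transformer instead.
def embed(text, vocab):
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Tool "memory": embed name + description + parameter names together.
tools = [
    {"name": "get_weather", "description": "weather forecast for a city", "params": "city day"},
    {"name": "search_papers", "description": "search arxiv papers by topic", "params": "query max_results"},
]
corpus = [f"{t['name']} {t['description']} {t['params']}" for t in tools]
vocab = sorted({w for doc in corpus for w in doc.lower().split()})
index = [(t, embed(doc, vocab)) for t, doc in zip(tools, corpus)]

def select_tools(query, k=1):
    """Retrieve only the k most relevant tools instead of putting
    every tool definition into the model's context."""
    qvec = embed(query, vocab)
    scored = sorted(((cosine(qvec, vec), t) for t, vec in index),
                    key=lambda s: s[0], reverse=True)
    return [t["name"] for _, t in scored[:k]]

print(select_tools("what is the weather like in London"))  # → ['get_weather']
```

Only the selected tool definitions are then injected into the prompt, which is what keeps the context small as the toolbox grows.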
There is also higher recall and higher-signal embedding text, meaning everything we put into the embedding has better characteristics and is more discernible by the model. Now we're going to dive deep into agent tool use with semantic tool memory. As we did before, we quickly connect to our Oracle database to get access to the system, and as you'll see, we're connected again and the connection is working fine. Then we reload our embedding model, the same paraphrase-mpnet-base-v2 we were using before, to create our embeddings. And since we're going to start using large language models, we instantiate our OpenAI client so we can interact with the model programmatically. Then, as before, we set up our conversational knowledge, workflow, and all the tables we were using for the store manager, and we also provision our tool log history and conversation history tables so we can access them as well. Since we were already running this, you'll see that they already exist; if you hadn't, you'd get a notification that they were properly created in this step. Here we also instantiate our StoreManager as before, and after creating the StoreManager instance we get the objects for the individual stores via the manager functions we created. After running this, you'll see that all the stores are properly loaded via the StoreManager. Now we begin with the actual code for this chapter. First, we instantiate the MemoryManager as before, and then we initialize our Toolbox pattern object. The purpose of the Toolbox is to load and store all the different tools that the LLM will have access to.
We pass it the memory_manager, the OpenAI client instance we created, so that it can perform the augmentations we'll see shortly, and the embedding_model, so that the Toolbox can perform vector similarity search over its contents. Once this cell executes, you can see that the memory manager and the toolbox have been correctly initialized. What you're going to do now is register a new tool into the Toolbox so that the large language model can confidently select it when it needs to. Here we register a new tool with a parameter that lets us augment it via the LLM enhancement we saw earlier in this chapter. We also prepared a function called read_toolbox, which takes a query string as the user's input and a number representing how many tools the similarity search should return when querying the toolbox; it performs similarity search on our toolbox table in the database. After you run this, you'll see that the tool registration executed properly. Now that we have the Toolbox pattern in place, a function to register tools, and a function to read from the toolbox, we'll create a new tool using Tavily. Tavily is a service that lets us search the web and store the results in the knowledge base behind our toolbox. For that, we first instantiate the TavilyClient through their library, and then use our register_tool helper to create the new tool, so that every time we want to use Tavily we are routed to a function called search_tavily, which, as before, returns a maximum number of results, five by default.
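The register/read flow can be sketched with an in-memory stand-in. The course's real Toolbox persists augmented definitions to Oracle and searches them with vector similarity, and the augmentation step calls OpenAI; the dict-based `TOOLBOX`, the decorator signature, and the word-overlap scoring below are all simplifications I'm assuming for illustration.

```python
# In-memory stand-in for the course's Toolbox pattern.
TOOLBOX = {}

def register_tool(augment=False):
    """Decorator that registers a function under its name. With
    augment=True, the real system would ask an LLM to rewrite the
    docstring; here we simulate that with a placeholder enrichment."""
    def wrapper(fn):
        description = (fn.__doc__ or "").strip()
        if augment:
            # Placeholder for the LLM enhancement step.
            description = (f"{fn.__name__}: {description} "
                           "(parameters and return value described step by step)")
        TOOLBOX[fn.__name__] = {"fn": fn, "description": description}
        return fn
    return wrapper

def read_toolbox(query, k=1):
    """Return the names of the k tools whose descriptions best overlap
    with the query (a crude stand-in for semantic similarity search)."""
    q = set(query.lower().split())
    scored = sorted(
        TOOLBOX.items(),
        key=lambda item: len(q & set(item[1]["description"].lower().split())),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

@register_tool(augment=True)
def search_web(query, max_results=5):
    """Search the web for pages matching the query."""
    return []  # a real version would call an API such as Tavily

print(read_toolbox("search the web for news", k=1))  # → ['search_web']
```

The decorator shape matters: registration happens at definition time, so simply writing the tool function makes it discoverable by retrieval later.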
We then perform the query against the Tavily client through their API: we search with the Tavily client on the query we want, limiting the results to max_results. After we get the results object, we write the results into our knowledge base. For that, we create some text content to embed, consisting of the title, the content, and the URL, and attach additional metadata: the title, the URL from which we extracted the information, a score for the result, the original query, and a timestamp. All of this goes into the knowledge base. In the next cell, you'll see the live difference between using the original docstring, written by the creator of the function, versus letting the tool name and definition be enhanced by the large language model. We take the tool by name, the one we just registered, get its source, and compare the original docstring with the augmented docstring generated by the OpenAI model. As you'll see, rather than the original docstring, you get an augmented, LLM-enhanced docstring that not only expands the description but also spells out, step by step, each of the parameters and the return values of the function. This richer definition is stored in the database, and the results are much more separable, which makes these functions much more distinguishable for our similarity search and for the large language model to use. Next, you'll create a local tool that uses local Python code, such as the datetime library, rather than going through an external third-party API service like Tavily.
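A local tool like the one just described needs nothing beyond the standard library. The function name and format parameter below are hypothetical; the transcript only says the tool uses datetime locally.

```python
from datetime import datetime, timezone

# A purely local tool: no third-party API, just the standard library.
def get_current_datetime(fmt="%Y-%m-%d %H:%M:%S"):
    """Return the current UTC date and time as a formatted string."""
    return datetime.now(timezone.utc).strftime(fmt)

print(get_current_datetime())                # e.g. "2025-01-15 09:30:00"
print(get_current_datetime(fmt="%Y-%m-%d"))  # date only
```

Registered in the toolbox, a local tool is retrieved and called exactly like an API-backed one; the model never sees the difference.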
Each tool can be implemented either locally or externally. If you implement it locally like this, you can create as many tools as you want; if you use a third-party service, you'll have to integrate with whichever services are available. In this cell, we look at the ArxivRetriever, an integration from the langchain_community package that allows us to retrieve arXiv papers with a maximum number of papers as well as a maximum number of characters per document. We set up an ArxivRetriever object that will let us retrieve arXiv papers later on. In the next cell, we register an arXiv discovery tool. First we create a function to process arXiv URLs and extract the paper URI, the identifier of the paper. Then, in the search-candidates function, we search arXiv and return a list of JSON candidates with their IDs and metadata, so that we can use this information to augment the large language model with external third-party information from arXiv. In this case, we get a list of documents, and for each document we append its metadata and its entry ID, the identifier of the paper, building a list of candidates closest to the user's query.
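The URL-processing step mentioned above is pure string handling. The transcript doesn't show the actual helper, so the function name and the exact URL shapes it accepts are assumptions; the sketch handles the common `abs/` and `pdf/` forms of modern arXiv IDs.

```python
import re

# Hypothetical helper in the spirit of the lesson's URL-processing step:
# pull the arXiv identifier out of common arXiv URL shapes.
def extract_arxiv_id(url):
    """Return the arXiv ID (e.g. '2106.09685') from an abs/ or pdf/ URL,
    dropping any version suffix like 'v2'."""
    match = re.search(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", url)
    if not match:
        raise ValueError(f"Not a recognized arXiv URL: {url}")
    return match.group(1)

print(extract_arxiv_id("https://arxiv.org/abs/2106.09685v2"))   # 2106.09685
print(extract_arxiv_id("https://arxiv.org/pdf/1706.03762.pdf")) # 1706.03762
```

With the bare ID in hand, the retriever can fetch the paper's metadata and content regardless of which URL form the user pasted.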
And now we'll see how to perform deep ingestion. This not only uses the arXiv loader from the previous cell, but also normalizes the information: we load the full paper content along with the metadata we extracted, chunk it with the RecursiveCharacterTextSplitter, which performs chunking with a little overlap to improve the quality of each chunk and reduce confusion in the context, and store the chunks in our Oracle vector store. We then write this information into our memory manager to grow the information and vectors available through the toolbox. And now for the moment of truth: we test the toolbox functionality we integrated in this chapter. We ask the memory_manager to read the toolbox object in our database, requesting more details on AI papers, and we limit the response to a maximum of one tool, so the best available tool for retrieving the information for this user query is printed. As we can see, the best tool found by similarity search against the toolbox table in our database is fetch_and_save_paper_to_kb_db, the name we associated with getting information from arXiv. Feel free to customize this and test with different values of k to see the best available tools for different queries.
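The chunking-with-overlap idea can be illustrated without the langchain dependency. This is a minimal stand-in I'm assuming for the RecursiveCharacterTextSplitter: fixed-size character windows with overlap, whereas the real splitter also prefers to break on separators such as paragraphs and sentences.

```python
# Minimal fixed-size chunker with overlap, so context carries across
# chunk boundaries (a simplified stand-in for the lesson's splitter).
def chunk_text(text, chunk_size=100, chunk_overlap=20):
    assert 0 <= chunk_overlap < chunk_size
    chunks = []
    step = chunk_size - chunk_overlap  # advance less than a full chunk
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

text = "".join(str(i % 10) for i in range(250))
chunks = chunk_text(text, chunk_size=100, chunk_overlap=20)
print([len(c) for c in chunks])  # → [100, 100, 90]
```

Each chunk repeats the last 20 characters of its predecessor, so an embedding of any chunk still carries a little of the surrounding context; each chunk is then embedded and written to the vector store.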