In this lesson, you will have everything you need to assemble a fully stateful memory-aware agent. You will implement a startup routine that loads prior context from long-term memory, build a recursive reasoning loop with intermediate memory checkpoints, and create an agent that persists across sessions and improves over time. Let's have some fun.

In this lesson we're going to touch on memory-aware agents. Specifically, we'll cover what the agent loop is, look at memory operations inside and outside of the agent loop, get an overview of the agent harness, and then implement a memory-aware agent.

The agent loop is a cyclical, iterative environment in which an LLM executes for a certain amount of time. It starts with assembling the context, which we've been talking about throughout this course. We then pass the assembled context into an LLM so it can reason and decide on the information it has received. The LLM then takes an action: it could respond, call tools, or ask the user for further input. This is what we refer to as the agent loop: we assemble context, invoke the LLM, and act on the LLM's response. It's cyclical and iterative, which means the loop can execute for a specific amount of time. Termination of the loop is usually determined by a stop condition: a final answer from the LLM when the goal is completed, or an error or timeout on the loop execution.

The agent loop starts when the user provides context or input, which kicks off the context assembly process. What you see on the screen is a full agent loop with a start condition, the agent loop itself, and a stop condition. To put it more definitively, the agent loop is a cyclical environment that executes for a specific amount of time, in which an LLM is invoked, its response is observed, and an action is taken.
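The loop just described can be sketched in a few lines of Python. This is a minimal, self-contained illustration rather than the lesson's actual code; the `llm` argument is any callable, and the stub below stands in for a real model call.

```python
# Minimal sketch of the agent loop: assemble context, invoke the LLM,
# act on the response, and repeat until a stop condition is met.
MAX_STEPS = 5

def agent_loop(user_input, llm):
    # 1. Assemble context (start condition: user input arrives).
    context = [{"role": "user", "content": user_input}]
    for _ in range(MAX_STEPS):
        # 2. Invoke the LLM on the assembled context.
        response = llm(context)
        # 3. Act: a final answer is a stop condition...
        if response["type"] == "final":
            return response["content"]
        # ...otherwise execute the requested action (e.g. a tool call)
        # and feed the result back into the context for the next turn.
        context.append({"role": "tool", "content": response["content"]})
    # Stop condition: max steps reached without a final answer.
    return "Unable to complete the request."

def stub_llm(context):
    # Toy model: request one tool call, then give a final answer.
    if any(m["role"] == "tool" for m in context):
        return {"type": "final", "content": "done"}
    return {"type": "tool_call", "content": "search results"}
```

Calling `agent_loop("find the MemGPT paper", stub_llm)` walks through one tool call and one final answer, exercising both stop conditions' code paths.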
It assembles, it invokes, and it acts until a stop condition is met. In pseudocode, the agent loop looks as shown on screen: we specify a MAX_STEPS, an LLM receives context, an action can be taken, and finally a result is extracted.

Now that you understand what an agent loop is, we can go into memory operations inside and outside of it. Let's start with memory operations outside of the agent loop. In a previous lesson, we had an overview of the different forms of agent memory, each with a corresponding memory store, which is a table held in a database. A memory manager exposed read and write operations on these tables that we can call upon. These operations are utilized both inside and outside of the agent loop.

We introduce the term harness because the memory operations and other operations in and out of the agent loop make up the agent harness: the programmatic scaffolding that enables the reliable execution of an AI agent. At the start condition, when we are about to enter the agent loop, we typically have memory operations that read from memory, because we're assembling context for our LLM. But we can also have memory operations that live and execute within the agent loop itself, because our agents can call tools, and because some tools are executed programmatically outside of the agent's control. Examples of tools the agent can execute within the agent loop are searching the internet, expanding a summary, or summarizing the conversation. But summarize_conversation can also be executed within the agent loop programmatically, without the agent having control over it, for instance when we set a condition such as the context window exceeding 80% of its capacity. On the screen, you can see the different operations that exist outside and inside of the agent loop.
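As a stand-in for the lesson's StoreManager and MemoryManager (which live in the course helper file), the shape of these memory tables and their read/write operations can be sketched with `sqlite3`. The table and function names here are illustrative, not the lesson's actual schema.

```python
import sqlite3

# In-memory database standing in for the lesson's memory stores.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE conversational_memory (
    id INTEGER PRIMARY KEY, thread_id TEXT, role TEXT, content TEXT)""")
conn.execute("""CREATE TABLE tool_log (
    id INTEGER PRIMARY KEY, thread_id TEXT, tool_name TEXT, result TEXT)""")

def write_conversational_memory(thread_id, role, content):
    # Write operation exposed by the memory manager.
    conn.execute(
        "INSERT INTO conversational_memory (thread_id, role, content) "
        "VALUES (?, ?, ?)", (thread_id, role, content))

def read_conversational_memory(thread_id):
    # Read operation used when assembling context at the start condition.
    return conn.execute(
        "SELECT role, content FROM conversational_memory WHERE thread_id = ?",
        (thread_id,)).fetchall()
```

The thread_id column is what makes memory session-scoped: reads at the start condition only pull rows for the current thread.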
And as covered before, this makes up part of the harness of a reliable memory-aware AI agent.

In this lesson, we're going to put together everything we've been building and construct a memory-aware agent. We'll start by setting up our database and initializing a connection to it, represented by the database_connection variable. When you run the cell, you'll see a successful database setup, as you've seen before. Next, we'll set up our OpenAI client and our embedding model, as we've done before.

Now we'll set up our memory stores, StoreManager, and MemoryManager. We'll start by specifying the names for all the memory tables, then create the conversational table and the tool log table just as we've done in previous lessons. Following on, we'll initialize our StoreManager by creating an instance of it, then create data objects representing our memory tables from the database. Finally, we'll initialize our MemoryManager.

Additionally, we'll load up our toolbox and register some of the common tools we're going to use, such as the arXiv search and the tool that searches for papers and saves them into the database. When you execute the cell, you should see the number of summary tools and common tools registered. These are the agent-triggered tools available to our memory-aware agent.

In this part, we load up all the context engineering techniques we explored in the last lesson; we've moved them into the helper file. Now that we have the tools and the context engineering techniques loaded in this development environment, we can start specifying our agent system prompt. This is the system prompt, or system instruction, that will be passed into the LLM using the system role, which carries a higher weight when the LLM weighs the instructions it's given.
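The lesson's actual AGENT_SYSTEM_PROMPT lives in the notebook; as a purely illustrative sketch, a memory-aware system prompt of the kind being described might look like this (the wording and section names below are assumptions, not the course's prompt):

```python
# Illustrative memory-aware system prompt: it tells the model which
# memory types exist and how the context window is partitioned.
AGENT_SYSTEM_PROMPT = """You are a research assistant with memory.
Your context window is partitioned into sections, each marked by a
markdown heading:
## Conversation Memory   - recent messages in this thread
## Knowledge Base Memory - papers retrieved semantically for the query
## Workflow Memory       - steps and outcomes from previous tasks
## Entity Memory         - people, places, and organizations seen so far
## Summary Memory        - compressed summaries of older conversation
Consult these sections first; call tools only when the information you
need is not already in memory."""
```

Telling the model explicitly what each partition contains, and when to use it, is what makes it "memory aware" rather than just a model with extra text in context.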
As you can see, this makes the LLM even more memory aware: we tell it what types of memory it has, how the context window is partitioned, and which memory stores it has access to.

Next, we have the function execute_tool, which lets us call resources in the form of Python functions that exist within our development environment, executing them with the arguments the LLM provides during a tool call. And lastly, we have call_openai_chat, which simply sends the context to an OpenAI model.

Now we can actually go into the agent loop. In the main agent loop, we start by defining a function called call_agent that takes in the user query, which is the start condition that triggers the agent loop. We also specify a thread_id, as we want this to be session scoped, and the max_iterations we want the agent loop to run for.

On this slide, we've identified memory operations that occur outside of the agent loop and inside of it. We'll start with the operations outside of the agent loop, which are used to build the initial context. The steps we'll take are to build the conversational memory, the knowledge base, and the workflow, entity, and summary memory. The next step is a deterministic memory operation: we check whether context usage has reached 80%. If the usage of the context window is greater than 80%, we trigger a summarization programmatically. Finally, the entire assembled context is assigned to a variable called context.

The next step is to retrieve tools that are relevant for the query. Again, this read operation is a memory operation that occurs outside of the agent loop.
This reduces the number of tools we need to provide to the agent, since we only include tools that are relevant to the query being passed into the agent loop. Next, we write the query to conversational memory, along with the role and the thread_id. The last step before we enter the agent loop is to create the messages, the context we'll pass to the LLM within the loop. This context contains the AGENT_SYSTEM_PROMPT, which makes the LLM even more memory aware, plus the role and the context we start with.

In the agent loop, we call the call_openai_chat function we explored earlier, passing in the messages and the tools available for the agent to call upon. We assign the output generated by the LLM to a variable called response. By analyzing the message within the response, we decide whether to make a tool call or move to a final response. Remember, in the agent loop the agent can either decide to act or respond to the user, meeting a stop condition. If the agent decides to make a tool call, we append it to the messages. Within the agent loop, we're able to execute tools, passing in the arguments and the name the agent specified for execution.

Still within the agent loop, one of the memory operations we conduct is to move some of the tool logs out of the context and into the database. This is context offloading. Once the tool result is received, the iteration counter of the agent loop is incremented, and the LLM is given the tool result. If the LLM decides to break out and produce a final answer, we display it and assign it to the final_answer variable. There's also a situation where we step out of the agent loop because the maximum number of iterations has been reached without a final answer.
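The tool-execution step inside the loop can be sketched as a simple dispatcher: execute_tool looks up the Python function the LLM named and calls it with the JSON arguments from the tool call. The registry and the stand-in tool below are hypothetical; the lesson's real tools, such as arxiv_search_candidates, do network and database work.

```python
import json

# Registry mapping tool names to Python functions in the environment.
TOOL_REGISTRY = {}

def register_tool(fn):
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@register_tool
def arxiv_search(query: str) -> str:
    # Hypothetical stand-in for the lesson's arXiv search tool.
    return f"results for {query}"

def execute_tool(name, arguments_json):
    # Call the function the LLM named, with the JSON arguments it
    # provided in the tool call.
    fn = TOOL_REGISTRY[name]
    return fn(**json.loads(arguments_json))
```

Because the LLM only ever emits a name and a JSON argument string, this single dispatch point is also a convenient place to log tool calls for context offloading.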
When the maximum number of iterations is reached without a final answer, we give final_answer a pre-templated response specifying that the agent was unable to complete the request.

Outside of the agent loop, we still have some memory operations, such as write_workflow, write_entity, and write_conversational_memory. These capture artifacts from the agent's execution, such as the number of steps taken to execute a query or achieve an objective, which we store as workflow memory along with the outcome of each step. We also write any entities detected in the final_answer, and finally we call write_conversational_memory with the final answer and the assistant role.

That's an overview of the agent loop and the operations that happen inside and outside of it. Specifically, we now have a memory-aware agent that we can call on, and it can execute and perform operations with memory and continuity. Let's see this in action.

To see our agent loop in action, we simply call the function call_agent, passing in the query "can you get me the paper on MemGPT" with a unique thread ID. Let's go through the output. As you can see, it builds the context, and we can see the context window utilization. The LLM is aware of the question being asked, which is "can you get me the paper MemGPT". We can also see different allocations of the context window to specific memory types: Conversation Memory, Knowledge Base Memory, Workflow Memory, Entity Memory, and so on.

For Conversation Memory, we can identify the section by its markdown subheading. We're using markdown headings so the LLM understands the hierarchical structure of information within the context window. LLMs are able to understand this because similarly structured information is part of their training data, so they have a latent ability to understand hierarchical structure in markdown.
Within conversational memory, we specify how the agent should use this memory and when it should leverage it. This is making our LLM memory aware. We do this for every memory included in the context window, from knowledge base to entity memory and so on.

For this specific run, because it's the first iteration of the agent loop, there are no retrieved messages. But because we ingested some papers into our research agent earlier on, it has knowledge of research papers, which can be seen in its Knowledge Base Memory. The papers retrieved are the ones semantically similar to the query passed in.

In the first iteration of the agent loop, because the agent doesn't have any of the papers required to answer the question within its memory, it uses the tool arxiv_search_candidates, passing in a specific query, "MemGPT arXiv papers", and it gets a few responses back. In the second iteration of the agent loop, the response from the tool call is returned to the agent, or to the LLM, which then provides the final answer and exits the agent loop. So in the second iteration, we see our answer: it gives us the MemGPT paper with an overview of the information, and it also specifies the next actions it recommends.

To showcase the continuity of our agent, we'll continue the conversation by calling the call_agent function again and asking, "can you save the content of the paper?" This showcases Conversation Memory, because the agent can carry on the conversation with an awareness of the paper we're referring to, which should be MemGPT. As we can see, the context window is being built, and in Conversation Memory we can see the previous interaction we had: the first question we asked alongside the response. We've asked the agent to save the paper, and we expect it to conduct tool calls to do this.
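The partitioning described here can be sketched as a small context builder that renders each memory type under its own markdown heading. The section titles mirror the ones shown on screen; the "No entries." placeholder is an illustrative choice.

```python
# Sketch: assemble the context window with one markdown subheading per
# memory type, so the LLM can recover the hierarchical structure.
def build_context(conversation, knowledge, workflow, entities, summaries):
    sections = [
        ("Conversation Memory", conversation),
        ("Knowledge Base Memory", knowledge),
        ("Workflow Memory", workflow),
        ("Entity Memory", entities),
        ("Summary Memory", summaries),
    ]
    parts = []
    for title, content in sections:
        # Empty partitions still get a heading, so the model knows the
        # partition exists but currently has nothing in it.
        parts.append(f"## {title}\n{content or 'No entries.'}")
    return "\n\n".join(parts)
```

On a first run, only Knowledge Base Memory would have content; the other sections render with their headings and the empty-state placeholder, which matches what the walkthrough shows on screen.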
Before that, though, we can see that the Workflow Memory is being filled with information from the previous step: the query "can you get me the paper MemGPT" and the steps taken, recorded with their outcomes. In this step, the agent has access to tools and uses the fetch-and-save-paper-to-knowledge-base tool we specified earlier. It's able to get the right paper from the previous conversation because it has conversational memory, and in the second iteration it provides an answer saying it has saved the paper for us.

To showcase further continuity, we'll ask the agent to pull out some main takeaways from the paper. Again, we don't have to specify which paper we're referring to, because all that information is in the conversational memory. Now let's look at the agent's response with the main takeaways. One thing to note is that this response is produced within a single iteration of the agent loop: we don't have to go look for the paper again, we can just look into the conversational history and the other memory segments of the context window to provide an answer. And the answer is provided in a structured format. Here you can see we have a memory-aware agent that is able to perform different tasks, is aware of conversational memory, and can utilize other memory partitions, such as workflow memory, to store previous steps.

To see summary memory in action, we're going to ask the agent to summarize the conversation using its tool. The ability to summarize the conversation is both a deterministic and an agent-triggered memory operation. That means the agent, at its discretion, can summarize the conversation if it judges the context window is reaching capacity, by calling the summarization tool, which creates an entry in our summary memory.
First, we'll see the tool being used; then, in another iteration of call_agent, we'll see that the summary memory has an entry in it and the Conversation Memory has been reduced. Looking at the tool calls, we can see that the agent has summarized the memory using the summarize_and_store tool. It marked seven messages as summarized and created a summary ID in our summary memory store. In the next iteration, we should see that the conversational history is shorter and that we have an entry in our summary memory, where previously there were no available summaries. Let's observe this.

We'll ask the agent what our first question was, which should no longer exist in the conversational memory, since we've summarized the conversation. As we can see, the retrieved messages in Conversation Memory only contain a response from the assistant saying it stored a summary; the previous interactions are gone. In the summary memory segment of the context window, we can see a placeholder with a summary ID and a description of what the summary contains: a MemGPT paper that was stored in SEMANTIC_MEMORY and a key insight summarized.

Another thing to point out here is Entity Memory in action. In Entity Memory, we capture entity information such as people, places, and organizations, and you can see from our interaction that we've captured those in several instances, such as the author of the paper, among other information.

But remember, we asked the agent what our first question was. To answer that, it has to unpack a summary. And we can see it does exactly this: within an iteration of the agent loop, it expands the summary with the ID it has in its summary memory. Now it has the actual summary, the uncompressed form of what was generated.
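The summarize-and-expand pair in this walkthrough can be sketched as follows. In the lesson, an LLM writes the summary and the store is a database table; here both are trivial stand-ins, and the function names mirror the tools described above.

```python
import uuid

SUMMARY_STORE = {}  # stand-in for the summary memory table

def summarize_and_store(messages):
    # Compress older messages into a summary entry and return its ID;
    # in the lesson an LLM produces the summary text.
    summary_id = uuid.uuid4().hex[:8]
    SUMMARY_STORE[summary_id] = " | ".join(m["content"] for m in messages)
    return summary_id

def expand_summary(summary_id):
    # Retrieve the uncompressed form when the agent needs the details
    # behind a summary placeholder in its context window.
    return SUMMARY_STORE[summary_id]
```

The ID returned by summarize_and_store is what appears as the placeholder in summary memory; expand_summary is the tool the agent calls when, as here, a question can only be answered from the summarized history.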
Finally, it can answer: our first question was "can you get me the paper MemGPT", along with the time it was sent. You've now seen summary memory in action, plus context offloading and context retrieval, all implemented with memory engineering techniques within an agent loop, giving us a memory-aware agent. You've come to the end of Lesson 5. In this lesson, we built a memory-aware agent. Well done.