In this lesson, we'll start by defining what an AI agent actually is. Then we'll look at what happens when you interact with a stateless agent versus one augmented with memory. You'll see why conversational history alone isn't enough and what agent memory really means. We'll wrap up by introducing the agent stack and the Agent Memory Core, the architectural foundation that the rest of this course builds on. Let's dive in.

What is an AI agent? An AI agent is a computational entity that can perceive its environment through inputs, take action through tool use, and reason, with that reasoning enabled by an LLM. More importantly, an AI agent has some form of augmented memory, typically to allow it to store, retrieve, and apply knowledge across sessions and interactions. This is what we define as an AI agent. Generally, AI agents should be able to operate independently, acting on behalf of a human with little to no feedback, and ideally they are goal- and objective-bound.

Now that we have an understanding of what an AI agent is, it's time to get a very clear understanding of the importance of memory for AI agents. We'll do this by looking at how an AI agent operates without memory. Typically, you would have an AI agent embedded within an application, and a user interacting with the agent through that application. Imagine the user asking for restaurant recommendations around the area; we'll refer to this as turn 1. This message is then sent to the agent.
The agent would generate a response: ideally detecting the user's intent, reasoning about the message, and using a set of tools to achieve the objective, such as searching for a location and finding restaurants around it, before finally providing an output to the user. We can refer to these as interaction turns. In subsequent turns, the user interacts again with the same agent to continue the conversation; we can refer to this as turn 3. Having received the recommendations, the user might ask to book the first recommendation on the list. But an agent without memory would respond as shown on the screen: it has no recollection of the conversation the user is referring to, and would ask the user to please specify. We can refer to this as turn 4. Here we see that, due to the lack of memory, the agent is unable to complete the task the user specified. We can refer to this whole exchange as a multi-turn interaction, and clearly this one played out without memory.

Now we've seen how an AI agent with only perception, action, and reasoning capabilities operates in a real-life scenario. We refer to this type of AI agent as a stateless agent. Let's talk more about what a stateless agent is. A stateless agent can still perceive its environment through inputs, reason over those inputs, and produce outputs back to the user, all powered by an LLM, and this feedback is crucial. But with no memory, as we've seen, the agent doesn't retain or recall information beyond a single turn. There are significant disadvantages to this. For example, the agent will not be able to complete long-horizon tasks: tasks where the agent is expected to run for several minutes, hours, or even days.
Without information about previous interactions or steps taken, it would be very difficult to complete such a task. The agent also has no context awareness across sessions: if a user interacts with the agent, leaves, and then comes back or starts a new session, the information about the user is lost rather than carried across sessions. More importantly, the agent lacks adaptation capabilities, meaning any new information provided during an interaction is not retained or used in subsequent interactions. Stateless agents can also have high operational costs, because to keep (or at least approximate) continuity, you have to stuff a lot of information into the context window at every single turn. There are more disadvantages, but these are the significant ones.

Now that we've had an overview of stateless agents, agents without memory, let's see what an AI agent with memory looks like. We'll use a scenario similar to the previous one, but start from turn 3. In a memory-augmented agent, turns 1 and 2 would have been stored in an external memory source such as a database. So when the user interacts with the application, and indirectly with the agent, turns 1 and 2 are already within the agent's memory. What this means is that when the user asks for recommendations and then asks to book the first one on the list, the agent responds appropriately: it identifies the first restaurant on the list and provides a reasonable output, such as asking the user what date and time the booking should be made for. We can refer to turns 1 and 2 together with turns 3 and 4 as the interaction history, and all of it will be stored in a database for all subsequent turns. Essentially, we have a memory-augmented agent. Let's take a deeper dive into memory-augmented agents and their advantages.
Just like stateless agents, they can perceive inputs, reason over inputs, and produce outputs, all powered by the LLM's reasoning capabilities. But the important addition is a database where information is stored and retrieved. There are many benefits to this: the agent can complete long-horizon tasks, has sustained context awareness, and can actually adapt. Let's go over these advantages in more detail. One key advantage is the ability of memory-augmented agents to complete long-horizon tasks, mainly because they can reference previous interactions and context held in previous sessions. This leads to sustained context awareness, which feels to the user like one continuous interaction with the agent. We also get improved efficiency and reduced operational cost, because we reduce the amount of information we have to pass into the context window, passing in only what is relevant to the interaction from the external memory store. Finally, we get greater reliability in multi-step workflows, mainly because we can reference the previous steps taken and their context, which makes subsequent steps more likely to produce successful outcomes. We'll explore all of these advantages in future lessons.

Memory-augmented agents come in all shapes and sizes, and what we've seen so far might be a naive form of memory-augmented agent. Here's what we mean: going from a stateless agent to one that can remember means storing interaction history in an external store. We've touched on the key benefits, but the most important one is continuity of interaction. Essentially, we bring continuity to an agent by storing conversational history, or interactions.
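As a rough sketch of this continuity idea, here is a toy comparison of a stateless turn versus a memory-augmented turn. Everything here is illustrative: `call_llm` is a hypothetical stand-in for a real inference call, and the in-memory `history` list stands in for an external store such as a database.

```python
def call_llm(messages):
    # Hypothetical stand-in for a real LLM call; it just reports how
    # much context the model actually received on this turn.
    return f"(model saw {len(messages)} message(s))"

def stateless_turn(user_message):
    # A stateless agent sends only the current message, so the model
    # has no recollection of earlier turns: "book the first one on
    # the list" cannot be resolved.
    return call_llm([{"role": "user", "content": user_message}])

history = []  # stands in for an external memory store such as a database

def memory_augmented_turn(user_message):
    # A memory-augmented agent retrieves stored turns and prepends them,
    # giving the model the context needed to continue the conversation.
    messages = history + [{"role": "user", "content": user_message}]
    reply = call_llm(messages)
    # Persist both sides of the turn for all subsequent interactions.
    history.append({"role": "user", "content": user_message})
    history.append({"role": "assistant", "content": reply})
    return reply

stateless_turn("Recommend restaurants nearby")           # model sees 1 message
memory_augmented_turn("Recommend restaurants nearby")    # model sees 1 message
memory_augmented_turn("Book the first one on the list")  # model sees 3 messages
```

The key difference is only where the turn's messages come from: the stateless path builds its message list from scratch every time, while the memory-augmented path reads from and writes back to the store.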
These interactions are typically between a user (or users) and the agent or assistant. Interaction history, when stored in an external store, can be referred to as conversational memory. We'll look at more forms of memory later, but conversational memory is one of the simplest to understand, and the one that interaction history naturally lends itself to. An interaction is typically the back-and-forth of information between a user and an agent. Conversational memory has very specific attributes: a timestamp recording when the interaction took place, plus the user and assistant messages, all stored in a database, in external memory. Ideally, conversational memory is time-ordered, which means that when we retrieve conversational memory data, we retrieve it ordered by time, so we can see the sequence of actions and interactions taken.

Let's see what this looks like in the context window of an LLM. An LLM has a context window that can hold a certain number of tokens. Into the context window we place the system prompt and instructions. Interaction history is then retrieved as conversational memory, where we put all the multi-turn past interactions from the external store, and finally we place the user prompt. This is a depiction of what the context window of an LLM with conversational memory looks like.

But we actually need to go beyond this, and there are several reasons why we need to move beyond conversational memory, or just using interaction history, to create memory-augmented agents. The first is simple: conversation windows are finite, but user relationships are not. We can capture more of the relationship between users and the assistant by looking at conversations, or other data associated with conversations, for example, information about entities mentioned during an interaction: places, people, and relationships between people. Not all valuable information is stored in a single conversation, so we have to move beyond conversational memory to extract useful information we can use in cross-session interactions. Agents need structured, queryable knowledge, not just chat logs. Data stored in conversational memory is just interaction history, but agents can operate within workflows where the steps taken, and their outcomes, are themselves useful information. That is neither conversation history nor interaction history, so we need to expand beyond conversational memory, which we will see in future lessons.

Now that we've seen conversational memory, it's time to explore the distinct forms of agent memory you'll come across. A simple way to view agent memory is in two distinct forms: short-term and long-term. Let's look into short-term memory first. Two common forms of short-term memory are the semantic cache and working memory. A semantic cache is a caching mechanism that leverages vector search over previously received responses from an inference provider, so they can be reused as responses for similar queries in subsequent interactions. Working memory can be seen as the LLM context window and any session-based memory; it's essentially a scratch pad the LLM can operate within, but it's lost after an interaction or session. Those are the main types of short-term memory. For long-term memory, we have three main forms: procedural, semantic, and episodic. Let's look at examples of each. For procedural memory, a common memory type we'll use in agents is workflow memory: we store the steps and interactions an agent has taken to achieve its objective, and these steps can include tool calls and other processes.
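To make workflow memory a bit more concrete, here is a minimal sketch of recording and recalling the steps an agent takes. The record schema (task name, step number, action, outcome) is an assumption for illustration, not a fixed standard, and the in-memory list stands in for an external store.

```python
workflow_memory = []  # stands in for an external store of past steps

def record_step(task, step_number, action, outcome):
    # Each step the agent takes, including tool calls, is stored so
    # later runs can reference it as experience.
    workflow_memory.append({
        "task": task,
        "step": step_number,
        "action": action,
        "outcome": outcome,
    })

def recall_steps(task):
    # Retrieve the recorded steps for a task in order, so a future
    # interaction can reuse them instead of rediscovering the workflow.
    steps = [s for s in workflow_memory if s["task"] == task]
    return sorted(steps, key=lambda s: s["step"])

# Hypothetical tool calls an agent might have made while booking:
record_step("book_restaurant", 1,
            "search_location(city='Berlin')", "found 12 restaurants")
record_step("book_restaurant", 2,
            "create_booking(name='Trattoria Uno')", "booking confirmed")
```

On the next "book a restaurant" request, `recall_steps("book_restaurant")` hands the agent a proven sequence of actions to follow.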
It's ideal to record these steps in a form of external memory that can be referenced and pulled in during subsequent interactions, as a form of experience for the agent to refer to. We call this workflow memory. A good example of semantic memory is simply a knowledge base: any external knowledge the agent needs in order to complete a task, such as domain-specific knowledge for the domain the agent operates within. And the conversational memory we explored previously is a type of episodic memory, because each piece of captured information carries a timestamp attribute, so we can reference specific data using time.

Now that we have a good understanding of the different types of agent memory, it's time to pin down what, specifically, agent memory is. Agent memory can be defined as the composition of system components, along with some architectural components, that enables an agent to adapt and learn. The system components you'll typically find in agent memory are embedding models, a database, and a large language model. The combination of these system components, along with some control mechanisms and a software harness (which is code), enables an agent to store, retrieve, and recall information, allowing it to adapt to interactions and learn.

To understand agent memory, we can draw on your previous knowledge of retrieval-augmented generation, or RAG. We'll quickly review how RAG works and then connect it to agent memory. Let's go over a typical RAG pipeline. First, you have a data source, which you pass through a data processing pipeline that breaks each data object down into chunks. These chunks are then passed into an embedding model, which creates a numerical representation of each one. This numerical representation captures the semantics and context of the data object that was passed into the embedding model.
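The chunking and embedding steps just described can be sketched as follows. The toy `embed` function is purely illustrative (it just counts words into buckets); a real pipeline would call an actual embedding model to produce a dense semantic vector.

```python
def chunk(text, size=40):
    # Break a data object into fixed-size chunks; real pipelines often
    # split on sentences or tokens rather than raw characters.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk_text, dims=8):
    # Toy numerical representation: hash each word into a small vector.
    # A real embedding model captures semantics and context instead.
    vec = [0.0] * dims
    for word in chunk_text.lower().split():
        vec[sum(ord(c) for c in word) % dims] += 1.0
    return vec

document = ("Our agent operates in the restaurant domain "
            "and needs opening hours and menus.")
records = [{"chunk": c, "embedding": embed(c)} for c in chunk(document)]
# Each record (chunk text plus its vector) is now ready to be stored
# in the database along with any other metadata.
```

The important structural point is that ingestion turns one data object into many (chunk, vector, metadata) records, which is exactly what gets written to the database in the next step of the pipeline.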
Along with the embeddings and other metadata, we can store all of these different data types in a single database. Then a user interacts with an application by sending a user query, which is vectorized by passing it through the embedding model to generate its numerical representation. Typically, the embedding model used to embed the user query is the same one used during the ingestion pipeline. Once we retrieve the rows semantically similar to the user query, we pass them into a reranking model. The reranked rows are then concatenated with the user query and passed into the LLM, grounding the LLM's response in domain-specific data. That is a typical RAG pipeline.

To connect RAG to agent memory, we take the following steps. We keep the typical ingestion process from the earlier portion of the RAG pipeline, and we bring in our knowledge of an AI agent and its main characteristics: the perception, memory, action, and reasoning capabilities we covered previously. The connection is this: the abstractions, the memory types we covered in the previous slides, take computational form within the database, ideally as tables. So you'd have your semantic memory, your procedural memory, and your episodic memory (which can be conversational memory) represented as tables in a database. A memory manager then abstracts away the methods and programs used to read, write, update, and delete data in these tables. Our agent gets access to all of these capabilities by connecting to the memory manager through tools, providing them to the agent as memory.
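The memory-types-as-tables idea just described might be sketched like this, using an in-memory SQLite database. The table names and schemas are assumptions for illustration, not a prescribed layout.

```python
import sqlite3

# Memory types represented as tables within a database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE semantic_memory (fact TEXT)")
conn.execute("CREATE TABLE procedural_memory (step_order INTEGER, step TEXT)")
conn.execute("CREATE TABLE episodic_memory (ts TEXT, user_msg TEXT, assistant_msg TEXT)")

class MemoryManager:
    """Abstracts away reading and writing the memory tables; an agent
    would be handed these methods as tools."""

    def __init__(self, conn):
        self.conn = conn

    def write(self, table, row):
        # Parameterized insert; one placeholder per column value.
        placeholders = ", ".join("?" * len(row))
        self.conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)

    def read(self, table):
        return self.conn.execute(f"SELECT * FROM {table}").fetchall()

manager = MemoryManager(conn)
manager.write("episodic_memory",
              ("2024-01-01T10:00:00Z", "Recommend restaurants",
               "Here are three options..."))

# Exposing the manager to the agent as tool functions:
agent_tools = {"read_memory": manager.read, "write_memory": manager.write}
```

The agent never touches SQL directly; it only sees the `read_memory` and `write_memory` tools, which is exactly the abstraction the memory manager provides.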
And this is how we bring our previous knowledge of RAG, the characteristics of an agent, and the different forms of memory together into agent memory. Now let's cover the term Agent Memory Core. In an agentic system, there are three main components where memory is, or could be said to be, located. There's your large language model, which has parametric memory of all the data it was trained on; your embedding model, which draws on semantic and contextual information when generating an embedding; and your database. The database is where you'll see the most data traffic within your agentic system: it's where data is stored, retrieved, and optimized. These are the main system components associated with agent memory, but the Agent Memory Core is your database. It's where the most information and data are stored and retrieved in the entirety of your agentic system; most of the data traffic we see in AI agents flows through an external memory backed by a database. An Agent Memory Core can be defined as the primary infrastructure that sees the most data traffic within your agentic system. It should handle the storage, retrieval, and optimization of information within the store itself. This is why we refer to the database as the Agent Memory Core, and it's an important concept to carry with you through this course.