This lesson will dive deeper into how core memory is designed and implemented. We'll also go through an example of how you can customize core memory with custom memory blocks and tools. Let's get to it.

Core memory is defined by memory blocks, which are basically subsections of memory, together with memory tools. The core memory within the context window is actually divided into multiple blocks, where each block has a character limit, which defines how much of the context window can be used up by that block. The block also has a label, such as human or persona, which can be used to reference the block. And finally, of course, the block has a value: the actual data that's placed into the context window, for example "My name is Sarah."

In addition to these blocks, core memory also has tools associated with that memory. For example, you might have core_memory_replace, where you specify the block label, in this case human, as well as the old content, Sarah, which is getting replaced by the new content, Bob.

Block data is compiled into the context window at inference time to make up the core memory context. So for the human block, you might have a tag that marks the human section. This tag will also include the number of characters used so far out of the total budget, in addition to the value. Memory blocks are synced to a database and have unique IDs, so they can actually be shared across multiple agents by syncing the block value into multiple agents' context windows.

To start, let's first once again initialize our Letta client and connect it to our locally running server. We're also going to once again paste in our custom helper function for printing out Letta messages. We actually modified it this time, because in this lesson we're going to use streaming responses, since we'll have more complex agent interactions that are many steps long. We basically want each new step streamed back to us, as opposed to waiting for them all to complete. In the streaming scenario, the usage statistics actually also get returned as a special message type, with message type usage_statistics. So we'll also print these out, and print a dividing line between different steps so that we can read them more easily.

Like we went over last time, core memory consists of multiple different memory blocks, and each block represents a section of the LLM's context window that's reserved for storing memories related to that block. We can create an agent that has a human and a persona block, and then list out the blocks that were created for this agent. We can do this by calling the client's block list function and specifying the agent ID. We can see that this returns a list of Block objects that have the value of the human block and the value of the persona block. We can also see things like the character limit that's assigned to each block: the agent is not allowed to make a memory block longer than this number of characters, so we can make sure that the context window doesn't overflow or become overly focused on just the in-context memory blocks. And then, like we mentioned in the last lesson, there are also these unique IDs, which we can use to access the blocks directly from the client. So let's go through an example of directly accessing the blocks. We can store a block ID by copy-pasting one of these block ID strings.
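As a rough sketch, the setup described so far might look something like the following. The exact client methods (such as client.agents.blocks.list) and the model names are assumptions based on the narration, not a definitive API reference:

```python
from letta_client import Letta

# Connect to the locally running Letta server
client = Letta(base_url="http://localhost:8283")

# Create an agent with a human and a persona core memory block
agent_state = client.agents.create(
    memory_blocks=[
        {"label": "human", "value": "My name is Sarah."},
        {"label": "persona", "value": "You are a helpful assistant."},
    ],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
)

# List the blocks created for this agent: each Block has a label,
# a value, a character limit, and a unique ID
blocks = client.agents.blocks.list(agent_id=agent_state.id)
for block in blocks:
    print(block.label, block.limit, block.id)
    print(block.value)

# Save one of the block IDs so we can access that block directly
block_id = blocks[0].id
```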
Now that we have this block ID saved, we can query the client with it to get back the block itself. So if you want to retrieve or modify this block, you can do that by passing the block ID. You can also access a block by specifying a combination of the agent ID and the block label. The block label will always be unique for a specific agent, so an agent can't have multiple blocks with the label human.

Blocks are represented inside of the context window, in a special section dedicated to core memory. We can actually see, for a specific agent, the template for how the blocks are shown. This is the prompt template for how the blocks are represented: it has a tag with the block label, it communicates how much of the total limit is being used, and inside of the tag it contains the block value. You can actually customize this if you want to change the way that the blocks are rendered inside of the context window.

Like we went over in the last lesson, Letta agents have access to tools like core_memory_replace and core_memory_append to edit their in-context memory. The way this implementation actually works is that we're able to define tools, attached to the agent, that can modify the agent's internal state or blocks.

Before we get into how to customize in-context memory management, we're first going to quickly go over how you can define stateful tools inside of Letta: tools that are able to modify the actual agent state. A really simple example is this tool here, which gets the agent ID. We pass a special argument into the tool definition: a field that has the type AgentState. We'll put the type in quotes just so that we don't get import errors. All this tool does is get the ID of the agent by accessing the agent state.

We can create this tool by calling upsert_from_function. You can also do create_from_function; there are a couple of different ways to create tools with Letta, but this is perhaps the simplest. You'll just need to make sure that your docstring documents all the fields, except you don't need to write any documentation for agent_state, since the Letta system will know to inject the agent's state when this tool is called. So we'll create this get ID tool.

Now that we've created this tool, we can create an agent that actually uses it. We'll make this agent really simple: it'll have no in-context memory, and we'll attach the extra get ID tool by passing its ID into tool_ids. Then we can message this agent to ask it what its agent ID is. One thing to note is that we're now calling create_stream instead of just create. This tells Letta that we want to get back the responses as a stream, as opposed to the entirety of the response, which is what we were doing in the last lesson. So we can see the responses coming back one step at a time: first, the agent reasons that the user wants to know the agent ID, so it calls the get agent ID tool; it gets back the response, which is its own agent ID; and then it sends a message to the user telling us what its agent ID is.
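A sketch of this stateful tool and the streaming call might look like the following. Again, treat the exact method names (upsert_from_function, create_stream) as assumptions from the narration, and print_message as the lesson's custom helper:

```python
def get_agent_id(agent_state: "AgentState"):
    """
    Query your agent ID field
    """
    # agent_state is injected by Letta at call time; the LLM never sees it
    return agent_state.id

# Register (or update) the function as a Letta tool
get_id_tool = client.tools.upsert_from_function(func=get_agent_id)

# A minimal agent with no in-context memory and the extra tool attached
# (client is the instance from the setup sketch above)
simple_agent = client.agents.create(
    memory_blocks=[],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    tool_ids=[get_id_tool.id],
)

# Stream the response back step by step instead of waiting for completion
stream = client.agents.messages.create_stream(
    agent_id=simple_agent.id,
    messages=[{"role": "user", "content": "What is your agent id?"}],
)
for chunk in stream:
    print_message(chunk)  # the lesson's helper for printing streamed messages
```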
And you can see at the end that we also print out this usage statistics message, because we're getting a streaming response that gives us usage information like the token usage and the step count.

So now that we've gone over an example of how to access agent state through tools, we can try defining a more customized form of in-context memory management by customizing both the memory format and the tools. In this last section, we're going to create a custom task queue memory. This is going to look a little different than the memory we had before, where we had a human and a persona block edited by the default core_memory_replace and core_memory_append tools. We're going to create custom memory management tools, and also customize the format of the actual in-context memory.

The first memory management tool we're going to define is task_queue_push. Instead of having human and persona sections, we're going to have the in-context memory be a list. This tool takes in the special agent_state argument, and also a task_description string. In order to make sure the tool gets created properly, we have to provide a properly formatted docstring. It will have the description, "Push to a task queue stored in core memory," and then we also have to define the arguments. Like I mentioned before, we don't need to document agent_state; the system will know to pass that in, and the LLM doesn't need to know about agent_state. So we just tell the LLM in this docstring about task_description, which is a description of the next task to accomplish. The return is just None, because we won't return back a response from this function.

What actually happens during the execution of task_queue_push is that we add a new task into the core memory, which we're representing as a list. The way we do this is that we can actually connect to the Letta client from inside of the tool, and then do all the same things that we would normally do with the client. It's a little confusing: the agent can itself access the client through its tools. By creating this client, we can retrieve the current in-context memory block that has the block label tasks (we'll create the agent that has this block later). We can load in the tasks, because the value of this tasks block will be a JSON-encoded list, append the new task, and then save the new list of tasks back by calling blocks modify. We have access to the agent state, so we can access the agent ID like we did before to modify this agent's in-context memory. So that's the first tool.

The other function we're going to define is task_queue_pop. This also has access to the agent state. All it does is pop out the next task from the task queue, removing it from the queue, and then return the remaining tasks left in the queue. Very similarly, we're again going to instantiate an instance of the Letta client inside the tool, get the tasks block from the in-context memory, and load in the tasks by parsing the JSON string. If there's nothing inside of the queue, we just return None. Otherwise, we grab the first task out of the queue and update the remaining tasks: we remove the first task, get the JSON string for what's left, and write that as the new value of the in-context tasks block. And then finally, we return these remaining tasks.
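Here's a rough sketch of what these two stateful tools might look like, assuming the block-access methods shown earlier (the retrieve/modify method names are assumptions; the docstrings follow the format the narration describes):

```python
def task_queue_push(agent_state: "AgentState", task_description: str):
    """
    Push to a task queue stored in core memory.

    Args:
        task_description (str): A description of the next task you must accomplish.

    Returns:
        Optional[str]: None is always returned as this function does not produce a response.
    """
    import json
    from letta_client import Letta

    # Tools can themselves connect to the Letta server
    client = Letta(base_url="http://localhost:8283")

    # The tasks block holds a JSON-encoded list of task descriptions
    block = client.agents.blocks.retrieve(
        agent_id=agent_state.id, block_label="tasks"
    )
    tasks = json.loads(block.value)
    tasks.append(task_description)

    # Write the updated list back into the agent's in-context memory
    client.blocks.modify(block_id=block.id, value=json.dumps(tasks))
    return None


def task_queue_pop(agent_state: "AgentState"):
    """
    Get the next task from the task queue.

    Returns:
        Optional[str]: The remaining tasks in the queue.
    """
    import json
    from letta_client import Letta

    client = Letta(base_url="http://localhost:8283")

    block = client.agents.blocks.retrieve(
        agent_id=agent_state.id, block_label="tasks"
    )
    tasks = json.loads(block.value)
    if len(tasks) == 0:
        return None

    # Grab the first task and persist the remainder back to the block
    tasks.pop(0)
    client.blocks.modify(block_id=block.id, value=json.dumps(tasks))
    return f"Remaining tasks: {json.dumps(tasks)}"
```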
Now that we've defined both of these functions, we can upsert both of them into Letta, creating the task queue pop tool and the task queue push tool as tools that can be called by Letta agents.

With these tools created, we can create an agent that has them attached. We're going to create something called a task agent, which only has access to the task_queue_pop and task_queue_push memory management tools. We're also going to override the default system prompt: since we changed the way that memory is managed, we have a custom system prompt that describes to the agent how that's done. You can read through it if you're interested. For the memory blocks, instead of having human and persona sections, we're just going to have a tasks block, and we're going to initialize its value to a JSON dumps of an empty list. This is the list that we're going to add to or remove things from with our tools. Then we define the LLM model and the embedding model, and attach the tool IDs.

We're also going to disable the default tools by passing in False for include_base_tools, so we won't add all the default tools like the core memory, archival memory, and conversation search tools. We'll only have the task-queue-related tools, but we will also add the send_message tool; we do want that one included, because otherwise the agent won't be able to return back assistant messages. A sketch of this setup follows below.

So now we've created the agent, and we can see what tools it has: this agent only has access to these three tools. We can also see what's inside of its memory by reading the value of its current tasks block, and we can see it's just an empty list.

Now we can message the agent to start adding some tasks into its task memory. We can send a streaming message to this task agent and give it two separate tasks: first, start calling me Charles, and then tell me a haiku about my name. We can see the responses start streaming back. The agent reasons that the user wants to create two tasks. It calls task_queue_push on the first task, "start calling me Charles," and then it calls task_queue_push on the second task, "tell me a haiku about my name." Then it realizes that it now has tasks it needs to accomplish; it's actually told in its instructions that it should be clearing its task queue whenever there are tasks in it. So it starts to pop items from its task queue: it pops the first task, leaving one remaining task in the queue, and then pops again until the remaining tasks are empty. Once it's finally cleared all of its tasks, it returns a message: "Got it, Charles. From now on, I'll call you that," along with the haiku about your name: "Charles, strong and wise. Gentle winds whisper your name. Stars light up the night." So there's quite a lot being done in this sequence of events.
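Putting the pieces of this walkthrough together, the task agent setup and the streaming message might look roughly like this. The system-prompt file name is hypothetical, and the include_base_tools and tools parameters are assumptions based on the narration:

```python
import json

# Register both task-queue functions as Letta tools
# (client is the instance from the setup sketch above)
push_tool = client.tools.upsert_from_function(func=task_queue_push)
pop_tool = client.tools.upsert_from_function(func=task_queue_pop)

# Custom system prompt describing the task-queue memory scheme
# (hypothetical file name, for illustration)
task_agent_system = open("task_queue_system_prompt.txt", "r").read()

task_agent = client.agents.create(
    system=task_agent_system,
    # A single "tasks" block initialized to an empty JSON list
    memory_blocks=[{"label": "tasks", "value": json.dumps([])}],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    tool_ids=[push_tool.id, pop_tool.id],
    include_base_tools=False,   # drop core/archival memory and search tools
    tools=["send_message"],     # keep send_message so it can reply to us
)

# Ask the agent to queue up (and then clear) two tasks
stream = client.agents.messages.create_stream(
    agent_id=task_agent.id,
    messages=[{
        "role": "user",
        "content": "Add two tasks: start calling me Charles, "
                   "and tell me a haiku about my name.",
    }],
)
for chunk in stream:
    print_message(chunk)  # the lesson's helper for printing streamed messages
```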
We can actually see that we had a step count of five and a lot of tokens used up, which is why it's often useful to have streaming for these more complex tasks: we can see the progress of the agent over time. The LLM is a little probabilistic, so you might have seen your agent not actually complete all of its tasks, in which case you can message it again to finish them. You might need to do that if your task queue still has items in it. In our case there was nothing left; the agent did actually get everything done, so you can see it returned back that all tasks are complete, and it didn't actually run anything. And finally, we can also retrieve the tasks block to confirm that it's empty and the agent has run through all of its tasks.

So we've now implemented a custom form of in-context memory by customizing the memory blocks and writing pretty advanced stateful tools that can modify the memory of the agent and even access a Letta client from inside of the tool. This is a really powerful framework for building stateful agents: you can create very advanced examples that control the way that in-context memory is managed and modified over time. Hopefully this simple task-queue example gives you inspiration for how you could create advanced ways of managing memory for your own applications.