Now that you've seen the agent in action more deeply, let's learn about a few mental models that we can use to dissect and understand how the agent works under the hood. Let's dive in. The main purpose of this lesson is to understand the components that make Cascade, this collaborative agent, really powerful, independent of all of the UX and pretty features that come with it.

Many people have likely seen some version of this classic agent loop. Essentially, the agentic system will get some input prompt from the developer. That prompt goes to a large language model that acts like a brain, reasoning: out of all of the different tools that I, as an agent, have, which tool should I use? This could be a search tool such as grep or embedding search. It could be a tool to edit a file. It could be a tool to suggest a terminal command. You saw all of these tools in the demo. After the tool takes some action, the large language model reasons again: given the inputs and the tool that was just called, what should my next action be? Should it be calling another tool? In that case it continues this loop until, finally, the large language model that acts as the brain decides, hey, we're done calling tools; it's time to go to the end state, and my response can go back to the user. This is the classic agent loop, where the large language model is used as a reasoning engine for tool calling.

It's really clear, then, that there are two important components of an agentic system. The first is the tools: what are all the actions that can be taken, and how powerful are those tools at each of those steps? The second half is the reasoning model. Today, the state of the art for deciding which tools to call and how to call them, meaning the inputs to those tools, given all the relevant information the large language model has, are the generally available foundation models from providers such as OpenAI and Anthropic. But these two components don't actually capture the full set of axes you can use to improve an agentic experience like Cascade.

The first additional axis I want to mention is context awareness. We touched upon this a little earlier when we talked about how improving access to knowledge can improve and ground the results from AI systems, but there are a few different ways to think about context awareness. The first is sources: what are all the relevant sources of knowledge for the task at hand? For something like writing new code, that of course includes existing code in private repositories, but it might also include things like documentation, tickets, or Slack. These are all sources of information that can help ground the agent in its decision-making process. The other half is parsing. It's great to have access to data, but how are you actually reasoning over that data? Is there structure or implicit information in that knowledge that can improve the ability to retrieve relevant information from large corpora? We will talk about this in a little more detail in a couple of slides. And finally, there is access. This is less about improving quality, but it is an axis of context awareness to be aware of, especially when we're thinking about large organizations that might have access controls over certain pieces of knowledge. AI should not give a user access to knowledge that would otherwise be unavailable to them.
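To make the loop described above concrete, here is a minimal sketch of an LLM-driven tool-calling loop in Python. The `llm.decide_next_action` interface and the placeholder tools are assumptions made for illustration only; this is not Cascade's actual implementation.

```python
# Minimal sketch of the classic agent loop: the LLM picks a tool, we run it,
# feed the observation back, and repeat until the LLM decides it is done.
import json

def grep_search(pattern: str) -> str:
    """Placeholder search tool: scan the repository for a pattern."""
    return f"(matches for {pattern!r} would be returned here)"

def edit_file(path: str, new_content: str) -> str:
    """Placeholder edit tool: apply a change to a file."""
    return f"edited {path}"

def run_command(command: str) -> str:
    """Placeholder terminal tool: suggest or run a shell command."""
    return f"ran {command!r}"

TOOLS = {"grep_search": grep_search, "edit_file": edit_file, "run_command": run_command}

def agent_loop(llm, user_prompt: str, max_steps: int = 20) -> str:
    """LLM-as-brain loop: reason, pick a tool, observe, repeat until done."""
    history = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        # Hypothetical interface: the model decides which tool to call next, or to finish.
        decision = llm.decide_next_action(history, tool_names=list(TOOLS))
        if decision["action"] == "finish":
            return decision["response"]  # hand the final result back to the user
        tool = TOOLS[decision["tool"]]
        observation = tool(**decision["arguments"])
        # Feed the tool's output back so the next reasoning step can use it.
        history.append({
            "role": "tool",
            "content": json.dumps({"tool": decision["tool"], "observation": observation}),
        })
    return "Stopped after reaching the step limit."
```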
The other component of an agentic system that isn't fully captured by that agent loop is the idea of human actions. Understanding the actions that the developer is taking implicitly allows the agentic system to understand what needs to be done. This is really what allows for that collaborative agent experience, that flow-like experience. If context awareness is about pulling in all of the relevant explicit knowledge, human action tracking is about the implicit knowledge: whether a developer opens a file, performs some navigation in the IDE, or makes an edit in a file. All of the implicit information we get from those actions can also be used as input for the LLM to reason about which tool needs to be called next, or whether we're done calling tools.

We can actually see the value of combining context awareness and human action tracking by looking at other AI modalities beyond an agentic system. For example, consider the classic autocomplete functionality, where the AI suggests a few lines of code at the cursor position so that you, as a developer, don't have to type them out from scratch. This helps a lot in writing out boilerplate. This graph shows a series of experiments where we change the levels of context awareness and human action tracking that the model has access to. In all of these examples the model is exactly the same; all we're changing is the kind of inputs we provide to it.

We'll take the baseline performance of autocomplete to be using just the currently open file as context to the LLM for the autocomplete suggestion. If you start incorporating intent, which may include what other files and tabs are open in the IDE, the quality of the autocomplete results actually increases by 11%. That very clearly shows that by incorporating intent and human action tracking, you can get better results even with the same model.

Now try another experiment: take embeddings of the entire code base, naively chunk it, and then do simple embedding-based retrieval across the whole repository. That's also better than the baseline. But one thing to notice, which might be a little counterintuitive, is that it actually performs worse than only using the intent from the open files and tabs. This points to how important the parsing capability of the context awareness system is. If you change the parsing from naive chunking and simple embedding-based retrieval to a system that uses abstract syntax tree (AST) parsing, custom code parsers, smarter chunking, and more advanced retrieval, where we're not just looking at embeddings but also at heuristics and other structure of the code base, such as imports or nearby files, you can massively increase the performance of your system over the baseline.

So this really highlights how important not just the tools and the model are to an agentic system, but also context awareness and understanding the developer's intent. In total, context awareness, tools, and human actions become the axes along which different agentic experiences differentiate from each other, especially since the reasoning models are relatively the same from tool to tool. Of course, it is also important to talk about the tools a little bit, because the tools are the actions that the agent can take in between the reasoning steps.
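As an illustration of how this implicit signal might be fed to the model, here is a small sketch that folds a developer's recent actions into the prompt context alongside explicitly retrieved code. The trajectory fields and prompt layout are assumptions made for this example, not the actual schema any product uses.

```python
# Combine implicit intent (what the developer has been doing) with explicit
# knowledge (retrieved code) into a single context string for the LLM.
from dataclasses import dataclass, field

@dataclass
class DeveloperTrajectory:
    open_files: list[str] = field(default_factory=list)    # tabs open in the IDE
    recent_edits: list[str] = field(default_factory=list)  # diffs the developer just made
    cursor_file: str | None = None                          # where the developer is working now

def build_context(trajectory: DeveloperTrajectory, retrieved_chunks: list[str]) -> str:
    """Assemble the model input from implicit action signals plus retrieved code."""
    sections = []
    if trajectory.cursor_file:
        sections.append(f"Developer is currently editing: {trajectory.cursor_file}")
    if trajectory.open_files:
        sections.append("Open tabs: " + ", ".join(trajectory.open_files))
    if trajectory.recent_edits:
        sections.append("Recent edits:\n" + "\n".join(trajectory.recent_edits))
    if retrieved_chunks:
        sections.append("Relevant code from the repository:\n" + "\n\n".join(retrieved_chunks))
    return "\n\n".join(sections)
```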
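And to give a flavor of what smarter parsing can mean in practice, here is a simplified comparison of naive fixed-size chunking versus structure-aware chunking using Python's built-in ast module. Real systems use language-specific parsers and richer heuristics (imports, nearby files, and so on); this sketch only illustrates why respecting code structure yields more coherent, retrievable chunks.

```python
# Naive chunking splits by line count; AST-aware chunking keeps each top-level
# function or class intact, so every chunk is a semantically coherent unit.
import ast

def naive_chunks(source: str, size: int = 40) -> list[str]:
    """Split purely by line count, ignoring code structure."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]

def ast_chunks(source: str) -> list[str]:
    """One chunk per top-level function or class definition."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(ast.get_source_segment(source, node))
    return chunks
```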
And if you have higher quality tools, you'll be able to take higher quality actions and therefore get to better results quicker. There are three main categories of tools, and this is another mental model for dissecting what goes into an agentic system. The first category is search and discovery tools: you first need to gather all the relevant information to even make changes in the first place. The second is tools that change the state of the world. And the final one is verification, to check that any changes to the state actually improve the overall system and bring us closer to the task at hand.

If you really think about it, this is how humans do work too, right? When we start a task, we look for all the relevant pieces of information, either in our code base or online or elsewhere. We make some changes, and then we compile our code, run it, and look at the results to see whether it is exactly what we wanted. And if not, we start the process all over again. In a very similar way, we can think about an agent's tools within these categories.

So, putting it all back together, it's really this combination that makes this kind of collaborative agentic experience feel very, very natural. Say you made a modification to a class, and now you want to make that same modification to other classes in the same directory. This is a relatively boilerplate task, but if there were a human peer programmer right next to me who had been observing the work I've been doing, I could just tell them, hey, do the same thing in similar places in this directory. Thinking about the collaborative agent as a peer programmer, you can see how these different components make this possible. First, the fact that the agent has been observing what the developer has been doing lets it understand that "same thing" refers to the recent edit I just made. Access to tools allows the agent to find the relevant files in the directory, and then later use edit-like capabilities to make the changes. And context awareness allows the agent to reason about whether classes are actually entities in a "similar place" as the class I just modified. So it requires all three of these components coming together to create an experience of working with an agent that feels like working with a human peer programmer.

So, some final takeaways. Context awareness brings in explicit knowledge. Human actions bring in implicit intent. The tools then take actions, combining this explicit knowledge and implicit intent, across search and discovery, making changes to state, and verification. And at the end, the LLM is used to combine all of these and choose the right tool calls to make at the appropriate times. This structure is common across different agentic systems today.
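As a quick sketch of the three tool categories mentioned earlier (search and discovery, changing state, and verification), here is how a tool registry might tag each tool with its category. The tool names here are hypothetical placeholders, not the agent's real tool set.

```python
# Tag each tool with one of the three categories from this lesson's mental model.
from enum import Enum

class ToolCategory(Enum):
    SEARCH_AND_DISCOVERY = "search_and_discovery"  # gather relevant information
    CHANGE_STATE = "change_state"                  # edit files, run commands
    VERIFICATION = "verification"                  # check the change moved us forward

TOOL_REGISTRY = {
    "grep_search":      ToolCategory.SEARCH_AND_DISCOVERY,
    "embedding_search": ToolCategory.SEARCH_AND_DISCOVERY,
    "edit_file":        ToolCategory.CHANGE_STATE,
    "run_command":      ToolCategory.CHANGE_STATE,
    "run_tests":        ToolCategory.VERIFICATION,
    "build_project":    ToolCategory.VERIFICATION,
}

def tools_in(category: ToolCategory) -> list[str]:
    """List the tool names registered under a given category."""
    return [name for name, cat in TOOL_REGISTRY.items() if cat == category]
```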