In this last lesson, we'll tie together all the concepts you've learned so far. You'll learn about basic agent architectures, and then we'll demonstrate Claude's computer use capabilities, which you can run yourself on your own computer. Let's go.

So in this video we're going to tie everything together, almost everything we've learned, into the single use case we've been building up to: a computer-using agent. The first thing you should know before we go any further is that running this agent requires a few steps. It's relatively straightforward, but it has to be done on your local machine, and it is not considered part of this course. This is purely a demonstration for those of you who are interested in pursuing this on your own.

Anthropic has a quickstarts repository on GitHub that includes a computer use demo. This demo is just one implementation that gets you up and running with a computer-using agent relatively quickly and pretty much painlessly. All you need is an API key: you clone the repo and run a few basic commands. There are other quickstarts in the repository, but this is the one that's most relevant to computer use.

I'll begin by cloning the repository, which includes multiple quickstarts. I'll cd into anthropic-quickstarts and then into the computer use demo directory. Next, I'll go back to the README and run this line. It's a little bit of a long line, so I'll copy it. It assumes we have an Anthropic API key set as an environment variable, which I've already done. I'll paste this in, hit Enter, and wait a little bit for the quickstart to do its magic.

Now it's up and running, and we'll visit localhost:8080 in the browser. This is the quickstart. Again, there are many different implementations; you can write your own computer-using agent, but this is the quickest way to get up and running, and of course you can modify it. What you'll see is, on the left, a chat interface where you can type a message to Claude, and on the right, a containerized computer that Claude will be able to interact with. At the moment I can't interact with it. You can see it's a simple Linux machine with different icons down here in the dock, but if I click Toggle Screen Control and switch it on, I can now interact with it and open up Firefox, in case there are certain things I want to do as a human before handing things over to Claude. I'll go ahead and switch it off just so it's clear that I'm not controlling this, and we can try something very simple.

Before I do that, note there are various parameters you can set here. You can set a maximum number of images, you can decide whether you're using the Anthropic first-party API, AWS Bedrock, or Google Vertex, and you can also change the custom system prompt suffix. I'll just minimize this and we'll do a very simple example. How about: "Find Anthropic's recent research paper on alignment faking and summarize it for me." Again, very, very simple. We send this off to the model, and it gets to work.

What you'll notice is a lot of tool use. On the left, we can see the tools the model wants to use. We've covered a little bit of this and introduced some of the concepts: a computer use tool, a screenshot tool, tools to move the mouse, to left-click, to right-click, and so on. And on the right you can see the various moves or choices the model is making on the display. So it's searched for Anthropic's alignment faking research paper, and now it's clicked on the paper.
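To connect this back to the API calls we've been writing all course, here's a rough sketch, not the quickstart's actual code, of the kind of request the demo makes under the hood: a single Messages API call that offers Claude an Anthropic-defined computer tool along with that prompt. The model name, tool version string, beta flag, and display size shown are assumptions based on the computer use beta, so treat them as placeholders.

```python
# A minimal sketch (not the quickstart's actual code) of the kind of request
# the demo makes: one Messages API call that offers Claude a computer tool.
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",   # assumed model
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",  # Anthropic-defined computer tool (assumed version)
            "name": "computer",
            "display_width_px": 1024,     # assumed display size
            "display_height_px": 768,
            "display_number": 1,
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Find Anthropic's recent research paper on alignment faking and summarize it for me.",
        }
    ],
    betas=["computer-use-2024-10-22"],    # assumed beta flag for computer use
)

# The response typically contains text plus tool_use blocks (screenshot,
# mouse_move, left_click, ...) that our own code is responsible for executing.
print(response.stop_reason)
for block in response.content:
    print(block.type)
```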
It's opened up the PDF, and now it's going to download it. You can see it's using curl on the left side here, in this log that shows us the various tools it's using. Okay, looks like it downloaded. Now it's checking it using a bash tool. All right, so this is some of the content that it downloaded. Now it's hopefully going to summarize that content for me. And we get a nice summary here.

Okay, again, a very, very simple example. But what I want to highlight is that from the initial prompt all the way to the end goal, a summary, there are maybe 10 or 15 back-and-forth messages where this model is acting agentically in a loop. It's a very, very simple agent that is simply attempting to accomplish a goal: in this case, find Anthropic's recent research paper on alignment faking and summarize it. To do that, the model had access to a few different tools.

This is the main agentic loop that calls the Anthropic API and provides it with our computer use tools. It's a little bit more complicated than the demos you've seen so far. There's a long prompt in here that explains to the model that it's using an Ubuntu virtual machine, that it can open Firefox and install applications, and that it has access to various tools. It also tells it the current date. And then if you keep scrolling down, what do we see? A pretty straightforward collection of tools. These tools are defined elsewhere, but they're the exact same type of tools we've talked about, with the same structure; they just happen to be a little more interesting, especially the computer tool. There's some prompt caching involved here. And then further down we have messages being sent off to the model in a loop, over and over and over, until the model essentially decides, "I'm done." If you look closely, there's logic in here that decides what to do when the model calls a tool: we execute the tool and respond back in the correct tool result format, all things we've covered previously.

In the repository there's a folder called tools. It includes a handful of tools; we won't cover all of them, but let's look at computer. The computer tool does things like typing keys (typing the letter S or hitting the Enter key), moving the mouse, left-clicking, right-clicking, middle-clicking, double-clicking, and, importantly, taking screenshots. The underpinning of this whole thing is screenshots: the model requests a screenshot to get the current state of the screen, decides where to move the mouse, where to type, where to click, then might get another screenshot, and it keeps going. It's all based on screenshots.

There's more logic in here. As I mentioned, some of it is a little bit complicated in the sense that screenshots need to be scaled down to the sort of ideal resolution that works for our Claude models. But at the end of the day, this is just a function that takes the model's request, whether it wants to left-click, take a screenshot, double-click, or move the mouse, and actually implements that functionality. It moves the mouse, it clicks; the code in this file just does those operations. Remember, the model itself is not executing tools, just like with the simple chatbot example. There, the model output a little block that said, "Hey, I'd like to call this tool." It's the exact same thing here.
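To make that loop concrete, here's a minimal sketch of the kind of agentic loop being described, not the quickstart's exact implementation. The run_tool helper is hypothetical and stands in for the code in the tools folder that actually clicks, types, and takes screenshots; the model name and beta flag are again assumptions.

```python
# A rough sketch of the agentic loop described above, not the quickstart's
# exact implementation. run_tool() is a hypothetical helper that performs the
# requested action (screenshot, click, type, ...) and returns result content.
def agent_loop(client, messages, tools):
    while True:
        response = client.beta.messages.create(
            model="claude-3-5-sonnet-20241022",   # assumed model
            max_tokens=1024,
            tools=tools,
            messages=messages,
            betas=["computer-use-2024-10-22"],    # assumed beta flag
        )
        # Keep the assistant's turn (text + any tool_use blocks) in the history.
        messages.append({"role": "assistant", "content": response.content})

        # If the model didn't ask for a tool, it has decided it's done.
        if response.stop_reason != "tool_use":
            return response

        # Otherwise execute each requested tool and send back matching
        # tool_result blocks, keyed by the tool_use id.
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_tool(block.name, block.input),  # hypothetical helper
                })
        messages.append({"role": "user", "content": tool_results})
```

The loop runs until the model stops asking for tools, which is exactly the "over and over until the model decides it's done" behavior you see in the demo's log.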
We, the engineers and developers (or, if you're using this quickstart repo, the code that's already been written for you), have to actually implement the clicking, the dragging, the screenshotting. The model is simply telling us it would like to take actions.

So we'll zoom in a little bit on this log here. The model starts by outputting some text saying, "I'll help you do that. Let me use Firefox," and then it asks for a screenshot: it outputs a tool use block saying it would like to use the screenshot tool. We provide a screenshot back of the current state of the screen; this is what it looks like at that point in time. Based on that screenshot, the model sees where the Firefox icon is and decides to move the mouse to that location, so you can see the mouse is now there in the next screenshot. Then it outputs a left-click tool use block; it wants to left-click, which is one of the tools it can use. It left-clicks, Firefox opens, and the process repeats. It gets a screenshot, decides it needs to type into the nav bar, so it wants to move the mouse to the nav bar, and on and on and on until eventually it finds the research paper, downloads it, summarizes it, and gives us this nice summary at the end.

Now, if you don't believe me that this is built on the exact same fundamental underpinnings as everything you've seen so far in this course, click on this HTTP Exchange Logs tab. I'll scroll down to one of the entries at the bottom. This is a full log of the entire conversation, and if I scroll up quite a bit, you can see every turn in it, including our initial turn, which has a role of user: "I'd like you to find Anthropic's research paper." We can see the assistant's response: it has some text, and then, what do you know, a tool block, with type equal to tool_use. Then we respond as the engineer with a tool result. Of course, the tool_use ID has to match, as we learned when we covered tool use. We also covered multimodal prompting, providing a screenshot, and here it is: a content block with a type of image, a source type of base64, a media type of image/png, and here's the data. This should all look relatively familiar, obviously in a slightly different context. And that's a user message. And we go over and over and over: the model outputs a mouse move tool use block, here's the corresponding tool result, and the process repeats and repeats and repeats.

So it's fancier and far more complicated than a simple chatbot, but underpinning it all is sending messages with the correct roles and the correct types of content: images and text. There's tool use, of course, providing the model with tools and responding back with the correct tool result blocks to tell the model, "Here's the result of the tool you issued." There's also prompt caching involved and other techniques we've covered, but really it's a nice summary of almost every single topic we've touched on in this course.

So again, this is a demonstration; it's not something I'm expecting you to go do right now. If you're curious, if you're interested, you can do this on your own machine: just go to the quickstart repository. It's fun to play around with. You can also check out a bunch of information on our documentation and blog about how to get the most out of computer use. It's a nice capstone that combines everything we've learned. So that's a simple demonstration of computer use.
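If you'd like to see concretely what a pair of those logged turns looks like as data, here's a hedged sketch: the assistant requesting a screenshot, and our tool result carrying the screenshot back as a base64-encoded PNG. The text, id, and data values are placeholders, not copied from the actual log.

```python
# A sketch of the shape of two turns from the HTTP exchange log: the assistant
# asking for a screenshot, and our tool_result reply carrying it back as a
# base64 PNG. The id and data values are placeholders.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Let me take a screenshot to see the current screen."},
        {
            "type": "tool_use",
            "id": "toolu_01ABC...",          # placeholder id
            "name": "computer",
            "input": {"action": "screenshot"},
        },
    ],
}

user_turn = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_01ABC...",  # must match the tool_use id above
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": "<base64-encoded screenshot>",  # placeholder
                    },
                }
            ],
        }
    ],
}
```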