In this last lesson, we'll tie together all the concepts you've learned so far. You'll learn about basic agent architectures, and then we'll demonstrate Claude's computer use capabilities, which you can run yourself on your own computer. Let's go.

So in this video we're going to tie everything together, almost everything we've learned, into the single use case we've been building up to: a computer-using agent. The first thing you should know before we go any further is that running this agent requires a few steps. It's relatively straightforward, but it has to be done on your local machine, and it is not considered part of this course. This is purely a demonstration for those of you who are interested in pursuing this on your own.

Anthropic has a quickstarts repository on GitHub that includes a computer use demo. This demo is just one implementation that gets you up and running with a computer-using agent relatively quickly and pretty much painlessly. All you need is an API key: you clone the repo and run a few basic commands. There are other quickstarts in the repository, but this is the one that's most relevant to computer use.

I'll begin by cloning the repository, which includes multiple quickstarts. I'll cd into anthropic-quickstarts and then into the computer use demo directory. Next, I'll go back to the README and run this line. It's a little bit of a long line, so I'll copy it. It assumes we have an Anthropic API key set as an environment variable, which I've already done. I'll paste this in, hit Enter, and wait a little bit for the quickstart to do its magic.

Now it's up and running, and we'll visit localhost:8080 in the browser. This is the quickstart. Again, there are many different implementations; you can write your own computer-using agent, but this is the quickest way to get up and running, and of course you can modify it. What you'll see is, on the left, a chat interface where you can type a message to Claude, and on the right, a containerized computer that Claude will be able to interact with. At the moment I can't interact with it. You can see it's a simple Linux machine with different icons down here in the dock, but if I click Toggle Screen Control and switch it on, I can now interact with it and open up Firefox, in case there are certain things I want to do as a human before handing things over to Claude. I'll go ahead and switch it off just so it's clear that I'm not controlling this, and we can try something very simple.

Before I do that, note there are various parameters you can set here. You can set a maximum number of images, you can decide whether you're using the Anthropic first-party API, AWS Bedrock, or Google Vertex, and you can also change the custom system prompt suffix. I'll just minimize this and we'll do a very simple example. How about: "Find Anthropic's recent research paper on alignment faking and summarize it for me." Again, very, very simple. We send this off to the model, and it gets to work.

What you'll notice is a lot of tool use. On the left, we can see the tools the model wants to use. We've covered a little bit of this and introduced some of the concepts: a computer use tool, a screenshot tool, tools to move the mouse, to left-click, to right-click, and so on. And on the right you can see the various moves or choices the model is making on the display. So it's searched for Anthropic's alignment faking research paper, and now it's clicked on the paper.
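To connect this back to the API calls we've been writing all course, here's a rough sketch, not the quickstart's actual code, of the kind of request the demo makes under the hood: a single Messages API call that offers Claude an Anthropic-defined computer tool along with that prompt. The model name, tool version string, beta flag, and display size shown are assumptions based on the computer use beta, so treat them as placeholders.

```python
# A minimal sketch (not the quickstart's actual code) of the kind of request
# the demo makes: one Messages API call that offers Claude a computer tool.
import os
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",   # assumed model
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",  # Anthropic-defined computer tool (assumed version)
            "name": "computer",
            "display_width_px": 1024,     # assumed display size
            "display_height_px": 768,
            "display_number": 1,
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Find Anthropic's recent research paper on alignment faking and summarize it for me.",
        }
    ],
    betas=["computer-use-2024-10-22"],    # assumed beta flag for computer use
)

# The response typically contains text plus tool_use blocks (screenshot,
# mouse_move, left_click, ...) that our own code is responsible for executing.
print(response.stop_reason)
for block in response.content:
    print(block.type)
```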
It's opened up the PDF, and now it's going to download it. You can see it's using curl on the left side here, in this log that shows us the various tools it's using. Okay, looks like it downloaded. Now it's checking it using a bash tool. All right, so this is some of the content that it downloaded. Now it's hopefully going to summarize that content for me. And we get a nice summary here.

Okay, again, a very, very simple example. But what I want to highlight is that from the initial prompt all the way to the end goal, a summary, there are maybe 10 or 15 back-and-forth messages where this model is acting agentically in a loop. It's a very, very simple agent that is simply attempting to accomplish a goal: in this case, find Anthropic's recent research paper on alignment faking and summarize it. To do that, the model had access to a few different tools.

This is the main agentic loop that calls the Anthropic API and provides it with our computer use tools. It's a little bit more complicated than the demos you've seen so far. There's a long prompt in here that explains to the model that it's using an Ubuntu virtual machine, that it can open Firefox and install applications, and that it has access to various tools. It also tells it the current date. And then if you keep scrolling down, what do we see? A pretty straightforward collection of tools. These tools are defined elsewhere, but they're the exact same type of tools we've talked about, with the same structure; they just happen to be a little more interesting, especially the computer tool. There's some prompt caching involved here. And then further down we have messages being sent off to the model in a loop, over and over and over, until the model essentially decides, "I'm done." If you look closely, there's logic in here that decides what to do when the model calls a tool: we execute the tool and respond back in the correct tool result format, all things we've covered previously.

In the repository there's a folder called tools. It includes a handful of tools; we won't cover all of them, but let's look at computer. The computer tool does things like typing keys (typing the letter S or hitting the Enter key), moving the mouse, left-clicking, right-clicking, middle-clicking, double-clicking, and, importantly, taking screenshots. The underpinning of this whole thing is screenshots: the model requests a screenshot to get the current state of the screen, decides where to move the mouse, where to type, where to click, then might get another screenshot, and it keeps going. It's all based on screenshots.

There's more logic in here. As I mentioned, some of it is a little bit complicated in the sense that screenshots need to be scaled down to the sort of ideal resolution that works for our Claude models. But at the end of the day, this is just a function that takes the model's request, whether it wants to left-click, take a screenshot, double-click, or move the mouse, and actually implements that functionality. It moves the mouse, it clicks; the code in this file just does those operations. Remember, the model itself is not executing tools, just like with the simple chatbot example. There, the model output a little block that said, "Hey, I'd like to call this tool." It's the exact same thing here.
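To make that loop concrete, here's a minimal sketch of the kind of agentic loop being described, not the quickstart's exact implementation. The run_tool helper is hypothetical and stands in for the code in the tools folder that actually clicks, types, and takes screenshots; the model name and beta flag are again assumptions.

```python
# A rough sketch of the agentic loop described above, not the quickstart's
# exact implementation. run_tool() is a hypothetical helper that performs the
# requested action (screenshot, click, type, ...) and returns result content.
def agent_loop(client, messages, tools):
    while True:
        response = client.beta.messages.create(
            model="claude-3-5-sonnet-20241022",   # assumed model
            max_tokens=1024,
            tools=tools,
            messages=messages,
            betas=["computer-use-2024-10-22"],    # assumed beta flag
        )
        # Keep the assistant's turn (text + any tool_use blocks) in the history.
        messages.append({"role": "assistant", "content": response.content})

        # If the model didn't ask for a tool, it has decided it's done.
        if response.stop_reason != "tool_use":
            return response

        # Otherwise execute each requested tool and send back matching
        # tool_result blocks, keyed by the tool_use id.
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_tool(block.name, block.input),  # hypothetical helper
                })
        messages.append({"role": "user", "content": tool_results})
```

The loop runs until the model stops asking for tools, which is exactly the "over and over until the model decides it's done" behavior you see in the demo's log.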
We, the engineers and developers (or, if you're using this quickstart repo, the code that's already been written for you), have to actually implement the clicking, the dragging, the screenshotting. The model is simply telling us it would like to take actions.

So we'll zoom in a little bit on this log here. The model starts by outputting some text saying, "I'll help you do that. Let me use Firefox," and then it asks for a screenshot: it outputs a tool use block saying it would like to use the screenshot tool. We provide a screenshot back of the current state of the screen; this is what it looks like at that point in time. Based on that screenshot, the model sees where the Firefox icon is and decides to move the mouse to that location, so you can see the mouse is now there in the next screenshot. Then it outputs a left-click tool use block; it wants to left-click, which is one of the tools it can use. It left-clicks, Firefox opens, and the process repeats. It gets a screenshot, decides it needs to type into the nav bar, so it wants to move the mouse to the nav bar, and on and on and on until eventually it finds the research paper, downloads it, summarizes it, and gives us this nice summary at the end.

Now, if you don't believe me that this is built on the exact same fundamental underpinnings as everything you've seen so far in this course, click on this HTTP Exchange Logs tab. I'll scroll down to one of the entries at the bottom. This is a full log of the entire conversation, and if I scroll up quite a bit, you can see every turn in it, including our initial turn, which has a role of user: "I'd like you to find Anthropic's research paper." We can see the assistant's response: it has some text, and then, what do you know, a tool block, with type equal to tool_use. Then we respond as the engineer with a tool result. Of course, the tool_use ID has to match, as we learned when we covered tool use. We also covered multimodal prompting, providing a screenshot, and here it is: a content block with a type of image, a source type of base64, a media type of image/png, and here's the data. This should all look relatively familiar, obviously in a slightly different context. And that's a user message. And we go over and over and over: the model outputs a mouse move tool use block, here's the corresponding tool result, and the process repeats and repeats and repeats.

So it's fancier and far more complicated than a simple chatbot, but underpinning it all is sending messages with the correct roles and the correct types of content: images and text. There's tool use, of course, providing the model with tools and responding back with the correct tool result blocks to tell the model, "Here's the result of the tool you issued." There's also prompt caching involved and other techniques we've covered, but really it's a nice summary of almost every single topic we've touched on in this course.

So again, this is a demonstration; it's not something I'm expecting you to go do right now. If you're curious, if you're interested, you can do this on your own machine: just go to the quickstart repository. It's fun to play around with. You can also check out a bunch of information on our documentation and blog about how to get the most out of computer use. It's a nice capstone that combines everything we've learned. So that's a simple demonstration of computer use.
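If you'd like to see concretely what a pair of those logged turns looks like as data, here's a hedged sketch: the assistant requesting a screenshot, and our tool result carrying the screenshot back as a base64-encoded PNG. The text, id, and data values are placeholders, not copied from the actual log.

```python
# A sketch of the shape of two turns from the HTTP exchange log: the assistant
# asking for a screenshot, and our tool_result reply carrying it back as a
# base64 PNG. The id and data values are placeholders.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Let me take a screenshot to see the current screen."},
        {
            "type": "tool_use",
            "id": "toolu_01ABC...",          # placeholder id
            "name": "computer",
            "input": {"action": "screenshot"},
        },
    ],
}

user_turn = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_01ABC...",  # must match the tool_use id above
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": "<base64-encoded screenshot>",  # placeholder
                    },
                }
            ],
        }
    ],
}
```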