Welcome to "Building AI Browser Agents", built in partnership with AGI Inc. AI browser agents, also called AI web agents, can log into websites, fill out forms, click through web pages, or even place an online order for you. Your AI web agent can use both visual information (that is, screenshots) and structural information, such as the HTML or the Document Object Model (DOM) representation of a web page, to reason and take actions. If you open a page on a website and take a look at the code underlying that web page, you'll see how large the action space can be for the agent at each step. Since these agents can run long sequences of actions automatically, any error can have unintended consequences, like paying for the wrong flights or ordering random products. If the agent misreads a single field, say a product name, it can head down the wrong path entirely, and these errors can compound quickly. In this course, you'll learn about these problems and several approaches to tackle them. I'm delighted to introduce the instructors, Div Garg and Naman Garg, who are the co-founders of AGI Inc. Div, Naman, and their team have built MultiOn, a web agent platform based on the approach they published in the AgentQ paper.

Thanks, Andrew. To address the challenges you mentioned, we introduced AgentQ. AgentQ combines Monte Carlo Tree Search (MCTS) with a self-critique mechanism and iterative fine-tuning using Direct Preference Optimization (DPO). During AgentQ's search process, different branches of sequential actions are explored and their outcomes are evaluated. The simulations, combined with their feedback, are used to create preference pairs at each node of the search tree. The DPO algorithm is then used to fine-tune the underlying language model policy by learning from these high-level preferences. This helps favor actions that lead to better outcomes or are ranked higher by the AI feedback. In this course, you will build several web agents.
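To make the preference-pair step described above a little more concrete, here is a minimal sketch in Python. It is not the AgentQ implementation: the function names, the action names, the single score per action (standing in for whatever mix of search value and AI feedback is used), and the `margin` threshold are all assumptions made for illustration. The loss function is the standard per-pair DPO objective, written with plain floats instead of a deep learning framework.

```python
import math
from itertools import combinations


def preference_pairs(action_scores, margin=0.1):
    """Build (preferred, rejected) pairs from one search-tree node.

    action_scores maps each candidate action to a score (hypothetical here).
    Pairs whose scores differ by less than `margin` are skipped as too close
    to call; the threshold is an illustrative assumption.
    """
    pairs = []
    for (a, score_a), (b, score_b) in combinations(action_scores.items(), 2):
        if abs(score_a - score_b) < margin:
            continue
        pairs.append((a, b) if score_a > score_b else (b, a))
    return pairs


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin))).

    Each margin is the policy log-probability minus the frozen reference
    model's log-probability for the same action.
    """
    z = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-z))  # equals -log(sigmoid(z))


# Hypothetical scores for three candidate actions at one node.
node_scores = {"click_search_box": 0.8, "open_menu": 0.3, "scroll_down": 0.75}
print(preference_pairs(node_scores))
# [('click_search_box', 'open_menu'), ('scroll_down', 'open_menu')]
```

When the policy assigns the same log-probabilities as the reference model, the loss is log 2; it drops as the policy shifts probability toward the preferred action relative to the reference.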
You will build a simple web agent that analyzes the DeepLearning.AI website and lists all the courses on a specific topic. Then you will extend it to take actions like clicking on a course, summarizing it, and even signing up for The Batch newsletter. Next, you will dive deep into MCTS, which is an integral part of our AgentQ method, and solve a grid world problem of finding the optimal path. We will then explore a variant of AgentQ plus MCTS that takes a course title, searches the web, and navigates the results until it finds the right course. You will visualize and analyze the different tree paths the agent takes until it accomplishes the goal. Many people have worked to create this course. I'd like to thank Michelle Gee and Milind Maiti from AGI Inc. From DeepLearning.AI, Esmaeil Gargari and Geoff Ladwig have also contributed to this course. The first lesson will be an introduction to AI web agents.

That sounds great! Now let's have you, rather than your web agent, get started.
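As a rough preview of the grid world exercise mentioned above, here is a minimal sketch of MCTS with UCB1 selection in Python. It is not the course's notebook code: the 4x4 grid, the wall positions, the reward shaping, the step limit, and every function name are assumptions chosen to keep the example short. Each iteration runs the four usual phases: selection, expansion, random rollout, and backpropagation.

```python
import math
import random

SIZE = 4
START, GOAL = (0, 0), (3, 3)
WALLS = {(1, 1), (2, 1), (1, 3)}  # hypothetical obstacle layout
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
MAX_STEPS = 12  # search horizon, an illustrative choice


def step(state, move):
    """Apply a move; moves into a wall or off the grid leave the agent in place."""
    r, c = state[0] + MOVES[move][0], state[1] + MOVES[move][1]
    if 0 <= r < SIZE and 0 <= c < SIZE and (r, c) not in WALLS:
        return (r, c)
    return state


class Node:
    def __init__(self, state, depth, parent=None):
        self.state, self.depth, self.parent = state, depth, parent
        self.children = {}  # move name -> Node
        self.visits, self.value = 0, 0.0

    def terminal(self):
        return self.state == GOAL or self.depth >= MAX_STEPS


def ucb(parent, child, c=1.4):
    """UCB1: average reward plus an exploration bonus for rarely tried children."""
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)


def rollout(state, depth, rng):
    """Random playout; reaching the goal in fewer steps earns a higher reward."""
    while state != GOAL and depth < MAX_STEPS:
        state = step(state, rng.choice(list(MOVES)))
        depth += 1
    return 1.0 - depth / (2 * MAX_STEPS) if state == GOAL else 0.0


def mcts(root_state, iterations, rng):
    root = Node(root_state, 0)
    for _ in range(iterations):
        node = root
        # Selection: descend while every move at this node has been tried.
        while not node.terminal() and len(node.children) == len(MOVES):
            node = max(node.children.values(), key=lambda ch: ucb(node, ch))
        # Expansion: add one untried move.
        if not node.terminal():
            move = rng.choice([m for m in MOVES if m not in node.children])
            child = Node(step(node.state, move), node.depth + 1, node)
            node.children[move] = child
            node = child
        # Simulation, then backpropagation up to the root.
        reward = rollout(node.state, node.depth, rng)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root


def plan(iterations=2000, seed=0):
    """Re-run the search from each new state and take the most-visited move."""
    rng = random.Random(seed)
    state, path = START, [START]
    while state != GOAL and len(path) <= MAX_STEPS:
        root = mcts(state, iterations, rng)
        move = max(root.children, key=lambda m: root.children[m].visits)
        state = step(state, move)
        path.append(state)
    return path


print(plan())
```

Picking the most-visited child rather than the highest average reward is a common choice because visit counts are less noisy. The search is stochastic, so the path it returns is not guaranteed to be the shortest one for a given seed or iteration budget.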

Building AI Browser Agents

Introduction (Video, 2 mins)
Intro to Web Agents (Video, 11 mins)
Building a Simple Web Agent (Video with Code Example, 7 mins)
Building an Autonomous Web Agent (Video with Code Example, 9 mins)
Agent Q (Video, 8 mins)
Deep Dive into AgentQ and MCTS (Video with Code Example, 9 mins)
Future of AI Agents (Video, 5 mins)
Conclusion (Video, 1 min)
Quiz (Graded Quiz, 10 mins)