In this lesson, we'll jump right into what web agents are and how they can be used to automate tasks online, like purchasing an item on an e-commerce website. We will also discuss common challenges such as reliability issues, compounding errors, and agents getting stuck in loops. Let's go.

First, let's see a real-world example. Imagine you want to purchase books online but don't have time to search and check out. In this demo, we have asked our web agent to order two books, "Radical Candor" and "The Cold Start Problem", and add them to our cart on Amazon. Here, we notice how the agent is navigating to Amazon just like a human would. It intentionally searches for the first book. Once it finds the book, it proceeds to add it to the cart. Next, it starts searching for the second book and then adds it to the cart as well. Here we see that the agent knows exactly where to click and how to navigate the website. Finally, the agent can go and place the order. As you can see, the agent successfully found both books and added them to the cart without any human intervention. This is just one example of what web agents can do. Imagine all the possibilities: these agents could automate tasks across virtually any website on the internet.

A web agent is an autonomous software program that performs tasks automatically on our behalf. These intelligent agents can be created for various purposes, such as automating repetitive online tasks, gathering data from multiple sources, or monitoring websites for changes or updates. Web agents can power complex applications in our daily digital lives. These range from online search that goes beyond simple queries, e-commerce assistance for finding products and completing purchases, social media management including monitoring and automated posting, data analysis across multiple platforms, customer support automation, travel booking and planning, as well as specialized applications in finance, healthcare, and many other industries.

Now let's see the key components for building web agents. Here we explore the architecture of how these agents work. A well-designed web agent consists of five essential modules. The user interface module, which allows people to communicate with the agent using natural language. The control module, which acts as the brain of the system and handles reasoning and decision-making for actions. The third is the knowledge base, where the agent stores important data, rules, and information needed to complete tasks. The fourth is the communication module, which manages interactions with websites, APIs, and other systems. And finally, the data processing module, which analyzes, processes, and transforms data before returning the results.

Within these modules, several specialized components work together. First, parsers that systematically extract website data and interpret HTML. Second, action models that handle decision-making and predict which actions to take. Third, executors that carry out specific actions on the website, such as clicks, form filling, and so on.

Here's how the entire process flows. First, a human provides a request to the agent. The agent understands the user's request and personalizes its approach based on the context. The action model takes the representation of the website and determines what actions to take on the webpage. Next, the model predicts the optimal action in this setting. This action is then executed on the website. The cycle repeats until the task is complete.
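To make this flow concrete, here is a minimal sketch of the perceive, decide, act cycle in Python. The class and method names (WebAgent, observe, predict, execute) are hypothetical stand-ins for the parser, action model, and executor components described above, not a specific framework's API.

```python
# A minimal sketch of the perceive -> decide -> act cycle described above.
# All names here are hypothetical illustrations, not a real framework's API.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str         # e.g. "click", "type", or "done" when the task is finished
    target: str = ""  # e.g. a CSS selector identifying the element to act on
    value: str = ""   # e.g. text to type into a search box

class WebAgent:
    def __init__(self, parser, action_model, executor):
        self.parser = parser              # extracts and interprets the page (HTML, screenshots)
        self.action_model = action_model  # predicts the next action from state and goal
        self.executor = executor          # performs clicks, form filling, etc.

    def run(self, request: str, max_steps: int = 20) -> bool:
        """Repeat the cycle until the task completes or a step budget runs out."""
        for _ in range(max_steps):
            state = self.parser.observe()                       # 1. parse the current page
            action = self.action_model.predict(state, request)  # 2. predict the optimal action
            if action.kind == "done":                           # task complete: stop the cycle
                return True
            self.executor.execute(action)                       # 3. execute the action on the site
        return False  # a simple guard against the looping issues discussed later in this lesson
```

A concrete agent would plug real implementations into these three components and be driven by a call such as agent.run("Add 'Radical Candor' to my Amazon cart"). The max_steps budget is one simple way to keep a stochastic agent from cycling forever.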
Let's see an example. Here is a screenshot of the DeepLearning.AI website's first page. You are going to build an agent that processes both visual information, like screenshots, as well as structural information, that is, the HTML DOM representation of the website, so that it can navigate to find courses on DeepLearning.AI and perform website interactions.

Now we explore how agents can process this complex structure to understand HTML elements. Web agents can interact with virtually any HTML element on a webpage. These range from links for navigation, text areas for entering information, checkboxes for making selections, radio buttons for choosing options, dropdown menus for selecting from lists, and buttons for submitting forms or triggering actions, as well as reset buttons and input fields.

To understand the limitations of existing frameworks, let's examine the key steps an agent follows. First, planning: determining what actions to take. Second, reasoning: making decisions based on the available information. Third, environmental actions: executing the planned actions. Finally, explanations: communicating to the user what was done and why. Many issues occur during the reasoning phase. If an agent cannot form a reasonable plan, it may make incorrect decisions. This is why improving reasoning is a critical focus in later lessons.

Let's see the main limitations of existing autonomous agent frameworks. First, reliability and trust challenges: ensuring automation systems can be trusted, with mechanisms for human oversight. Second, decision-making errors, where there are compounding mistakes and trade-offs between exploration and exploitation. And finally, plan divergence and looping, where there are risks of getting stuck in cycles or diverging from the plan during agent execution.

First, it's very hard to establish reliability and trust with current autonomous systems: they are stochastic, and things can go wrong. When it comes to decision-making errors, agents suffer from compounding mistakes, that is, a cascade of errors that can grow over time. Early errors can snowball into larger problems as they influence subsequent decisions. Lack of self-correction means agents cannot identify and fix their own mistakes; this can lead to catastrophic failures as errors accumulate and magnify. Limited context means agents may miss critical information while making decisions; without complete context, choices become increasingly flawed. Inconsistent reasoning, that is, decisions based on flawed reasoning processes, leads to erratic behavior and unreliable outcomes. Biases, which might be present in training data, can be amplified in agent decisions; over time, this creates increasingly skewed or unfair results. Together, these issues significantly undermine the decision-making capabilities of autonomous agents.

Web agents constantly face a critical decision-making challenge: should they exploit what they already know, or should they explore new possibilities? This trade-off fundamentally shapes how agents navigate websites. In the exploitation strategy, agents focus on maximizing rewards. They follow a single path deeply before considering alternatives. They explore the child elements in the DOM tree and use recursion to navigate deeper into promising paths. The advantages of exploitation are that it maximizes the current rewards by leveraging known successful strategies, allows efficient use of resources on paths with proven value, and gives more predictable performance in familiar environments. It also has disadvantages: you can miss alternative paths, agents can be locked into the current strategy, and they might be unable to adapt if the chosen path is suboptimal. A sketch contrasting the two strategies follows below.
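To make these traversal strategies concrete, here is a minimal sketch in Python. Exploitation corresponds to a recursive, depth-first descent into the DOM tree, while exploration, which we turn to next, corresponds to a breadth-first search. The toy tree, node names, and goal check are invented for illustration; they are not any real site's structure or a specific framework's API.

```python
# A minimal, self-contained sketch of the two strategies on a toy DOM-like tree.
from collections import deque

TREE = {
    "root": ["nav", "main"],
    "nav": ["link_home", "link_courses"],
    "main": ["search_box", "results"],
    "link_home": [], "link_courses": [], "search_box": [], "results": [],
}

def exploit_dfs(node, is_goal):
    """Exploitation: recurse deeply into one branch before trying siblings."""
    if is_goal(node):
        return [node]
    for child in TREE[node]:
        path = exploit_dfs(child, is_goal)
        if path:                       # commit to the first promising branch
            return [node] + path
    return None                        # dead end: the caller backtracks

def explore_bfs(start, is_goal):
    """Exploration: examine all children at each level before going deeper."""
    queue = deque([[start]])
    visited = {start}                  # the visited set also guards against loops
    while queue:
        path = queue.popleft()
        if is_goal(path[-1]):
            return path
        for child in TREE[path[-1]]:
            if child not in visited:
                visited.add(child)
                queue.append(path + [child])
    return None

goal = lambda n: n == "results"
print(exploit_dfs("root", goal))   # ['root', 'main', 'results']
print(explore_bfs("root", goal))   # ['root', 'main', 'results']
```

Notice how the depth-first version commits to the first promising branch and only backtracks at dead ends, while the breadth-first version examines every sibling before descending; its visited set is also a simple guard against the looping behavior discussed later in this lesson.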
Now we examine the alternative approach: exploration. In exploration, agents use a breadth-first search strategy. The agent tries new, untested paths to discover potentially better rewards and strategies. Rather than going deep, it explores all possible directions. As shown in the diagram, the agent examines all child elements at each level before going deeper. When a promising path is found, the agent continues exploration. If a path doesn't lead to the desired state, the agent can backtrack and try alternatives. This creates a broader search pattern across the whole website structure. Its advantages are that it can unlock new opportunities by discovering higher rewards or better paths that were not previously known, it prevents stagnation by reducing the risk of being stuck in suboptimal strategies, and it enables adaptation to changing environments or requirements. It also has a couple of disadvantages. It might have more immediate cost: exploration may not yield immediate rewards and could even lead to worse outcomes. It carries the risk of extra time: pursuing unknown paths can result in inefficient use of resources and time, and it also increases the complexity of the decision-making process.

The critical challenge for web agents is determining when to explore versus when to exploit. Choosing incorrectly between strategies is a common source of decision-making errors. The ideal approach often involves balancing both strategies proactively based on context.

Now we examine a critical challenge facing autonomous agents: when they stray from their intended path. Among the limitations in current frameworks, we'll now focus on plan divergence and looping behaviors. Plan divergence happens when an agent veers off course. In this diagram, you can see the ideal path as a straight line, but the actual path diverges significantly, like asking an assistant to summarize a topic but receiving unrelated information instead. Even advanced AI agents lack the ability to self-correct when they make mistakes. Once an agent deviates or gets stuck, recovery becomes extremely difficult due to three key factors. First, agents typically have limited domain knowledge: they are trained on general information but struggle with specialized tasks. Second, unexpected changes in the environment can confuse an agent, causing it to repeat ineffective actions rather than adapt. Finally, insufficient training data makes it difficult for agents to navigate unfamiliar situations and recognize when they should backtrack.

In this lesson, we explored what web agents are, the key components for building them, and their main limitations. In the next lessons, you will learn how to address these fundamental challenges.