Hi everyone! I'm Thomas Wolf, co-founder and Chief Science Officer of Hugging Face. Today, I would like to give a brief history of agents as a foundation for this course on Code Agents. Let's dive right in with the elephant-in-the-room question: what is an agent? This question has been debated for months, if not years, and no definition will be perfect. The most rigorous definition we have found is this one: an AI agent is a program in which the outputs of an AI model impact the execution flow. In other words, the LLM is embedded as a brain in the computer program, which acts as a vehicle. The LLM's decisions then drive the vehicle towards the user's expected goal.

But the impact of AI models on the execution flow can vary: being agentic is not a clear-cut zero-or-one concept. There are levels of agency. At the lowest level, the LLM output simply controls a decision in a program's workflow. At a higher level of agency, the LLM can call an external tool. At a yet higher level of agency, the agent takes decisions on the next step, the next iteration, or even the stop condition of the program; this is what is called a multi-step workflow. Finally, at the highest level of agency, an agentic workflow can create or start another agent's workflow: that's a multi-agent setup.

This notion of levels of agency is reflected in the evolution of agent capabilities over the years. Since the release of the first LLMs, more and more agency has been unlocked for AI agents, which went from simple routers, where the LLM output mostly decided which branch of an if/else switch to trigger, to tool-calling and multi-step agents. Taking a step back, the highest level of agency we can give an LLM is to let it write and execute code. This is what we call code agents, and it is the bread and butter of the smolagents library, which we will cover in detail in the next lesson of this short course.

But let's continue with our history of agents. The performance of agents has been strongly tied to the wave of increasing capabilities of the LLMs that power them. As the performance of generalist LLMs grew, so did their planning ability, making them increasingly suited to power multi-step agents with longer time horizons. The graph we're showing here is the evolution of the top model score on the GAIA benchmark, a benchmark co-developed by Hugging Face and Meta that measures the performance of agents on a wide range of tasks using a computer and the internet. Most of the tasks in GAIA take a human about ten minutes to solve using a set of standard software available on any computer. Predicting the future capabilities of AI systems is always a challenging exercise, but when we extrapolate this past improvement on GAIA, we see that we could reach human-level performance sometime in 2026. This would mean that in about 12 to 18 months from now, we would have agentic systems as efficient as you and I on a large number of tasks that currently take a human about ten minutes to complete on a computer, provided, of course, that we give them access to all the programs they need to solve these tasks.
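Before moving on, here is a minimal sketch, in plain Python, of the levels of agency described earlier. The `call_llm` helper and the toy tools are hypothetical placeholders, not the smolagents implementation or any specific library's API; the only point is to show how much of the control flow each level hands over to the model.

```python
# A minimal sketch of the levels of agency, in plain Python.
# `call_llm` is a hypothetical placeholder for any LLM API.

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to your favourite LLM and return its text output."""
    raise NotImplementedError("plug in a real LLM client here")

# Level 1 -- router: the LLM output only decides which branch of an if/else to take.
def router(query: str) -> str:
    branch = call_llm(f"Reply with 'refund' or 'support' only.\nQuery: {query}").strip()
    if branch == "refund":
        return "routing to the refund workflow"
    return "routing to the support workflow"

# Level 2 -- tool calling: the LLM picks an external tool and its argument.
def web_search(query: str) -> str:
    return f"(pretend search results for {query!r})"   # stand-in for a real search tool

def calculator(expression: str) -> str:
    return str(eval(expression))                        # toy only; never eval untrusted input

TOOLS = {"web_search": web_search, "calculator": calculator}

def tool_agent(query: str) -> str:
    decision = call_llm(
        f"Available tools: {list(TOOLS)}.\n"
        f"Answer as '<tool>: <argument>'.\nQuery: {query}"
    )
    name, argument = decision.split(":", 1)
    return TOOLS[name.strip()](argument.strip())

# Level 3 -- multi-step: the LLM also decides the next step and the stop condition.
def multi_step_agent(task: str, max_steps: int = 10) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        step = call_llm(
            f"Task: {task}\nObservations so far: {history}\n"
            "Reply with the next tool call as '<tool>: <argument>', "
            "or 'FINAL: <answer>' when you are done."
        )
        if step.startswith("FINAL:"):
            return step[len("FINAL:"):].strip()
        name, argument = step.split(":", 1)
        history.append(TOOLS[name.strip()](argument.strip()))
    return "Stopped after reaching the step budget."
```

The highest level, a code agent, goes one step further: instead of choosing an action from a fixed format, the model writes code that is then executed, which is what smolagents provides and what the next lesson covers.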
Another interesting trend I wanted to mention is how agents tend to rely less and less on a specific framework. As the capabilities of LLMs increase, they can more and more decide autonomously what next step to take, without any specific scaffolding or hand-engineered framework. As an example, you can now in many cases load a vision-language model, show it a capture of your screen, and let it decide where to click or what to type on the keyboard, without much guidance. The result is an agent that can operate seamlessly on many user interfaces and be integrated into a wide range of systems and agent libraries.

Another interesting emerging synergy is with the field of robotics. Given that there is not such a large difference in API between a computer tool and a command for a physical electronic device, the wide scope of capabilities of multimodal LLMs means that, just as our agent was able to decide what to do from a screenshot, an LLM controlling a robot can more and more decide what next step to take simply on the basis of a video image acquired by its camera. More and more, the models powering agents are actively trained on a mix of computer-interface data and physical-world data.

Now, let us finish with a few high-level words on how to train agents. Until recently, many teams and companies were pushing for specific fine-tuning of LLMs for agents, so-called function-calling fine-tuning. This typically consisted of training an LLM to call a set of predefined tools given at training time. While this approach trained the LLM to write tool calls in the expected formats and with proper arguments, modern LLMs like DeepSeek-R1 or Qwen QwQ are now powerful enough to write tool calls even without any specific training, just by being given a few examples in their prompts (see the sketch at the end of this lesson). The main remaining struggle for agent LLMs in early 2025 is the ability to plan and reason over many steps. This is where reinforcement learning and the recent wave of reasoning models are expected to step in and be game changers. I expect the continuous improvement of LLMs and the development of agent-specific training methods to propel agentic systems to a new level of performance and reliability, so much so that we will see more and more real-life usage of agentic systems. So following this course was a great decision. Now I'll leave you with Aymeric to tell you more.
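To make the earlier point about few-shot tool calling concrete, here is a minimal sketch. The prompt format, the `call_llm` helper, and the `get_weather` tool are illustrative assumptions rather than any specific library's API; the idea is simply that a capable model, shown a couple of examples, can emit a well-formed tool call that the program then parses and executes, with no function-calling fine-tuning.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to a sufficiently capable LLM and return its text output."""
    raise NotImplementedError("plug in a real LLM client here")

def get_weather(city: str) -> str:
    return f"(pretend forecast for {city})"   # stand-in for a real weather tool

TOOLS = {"get_weather": get_weather}

# A couple of in-context examples are often enough for a strong model to emit
# well-formed tool calls, without any function-calling fine-tuning.
FEW_SHOT_PROMPT = """You can call a tool by answering with a single JSON object.

Example:
User: What's the weather in Paris?
Assistant: {"tool": "get_weather", "arguments": {"city": "Paris"}}

Example:
User: Will I need an umbrella in Tokyo?
Assistant: {"tool": "get_weather", "arguments": {"city": "Tokyo"}}

User: %s
Assistant:"""

def answer_with_tool(question: str) -> str:
    raw = call_llm(FEW_SHOT_PROMPT % question)
    call = json.loads(raw)              # parse the JSON tool call the model produced
    tool = TOOLS[call["tool"]]          # look up the requested tool
    return tool(**call["arguments"])    # execute it with the model-chosen arguments
```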