Meta has introduced Llama Stack to standardize interfaces to commonly used tools and features. Let's sample a few examples in this lab. Llama models were always intended to work as part of an overall system. Today, applications require a multilingual safety model like Llama Guard 3, a prompt injection filter like Prompt Guard, and a cybersecurity evaluation suite like CyberSecEval. Agentic systems also require external tools like search and code execution, as well as memory. While building a reference implementation, we realized that having a clean and consistent way to interface between components could be valuable not only for us, but for anyone leveraging Llama models and other components as part of their system.

Similarly, we noticed common usage patterns in the model lifecycle. Meta releases the weights of Llama models to support several use cases. These weights can be improved, fine-tuned, and aligned using evaluations and curated datasets, and then deployed for inference to support specific applications. The curated datasets can be produced manually by humans, synthetically by other models, or by leveraging human feedback collected as usage data from the application itself. This results in a continuous improvement cycle where the model gets better over time. This is the model lifecycle. For each of the operations that need to be performed during the model lifecycle, we identified the required capabilities as toolchain APIs. Some of these capabilities are primitive operations like inference, while other features, like synthetic data generation, are composed of multiple capabilities.

To support these common usages, we released the Llama Stack APIs. Llama Stack defines two sets of APIs. The first is for agentic systems, which defines services like memory, shields, and orchestrators. The second is the toolchain APIs, for things like pre-training, inference, synthetic data generation, and evaluation.

Let's look at the code. In this lab, we'll give a really brief introduction to some of the components of Llama Stack. Let's start by loading our keys. If you were running this yourself, you would have to install Llama Stack and the Llama Stack client; in this lab, that has been done for you. Let's look at these two Llama Stack commands. The first describes some of the available distributions. This is a little hard to read, but the first column describes the distribution: local-together indicates that Together.ai and the local provider are used. In the providers column, you can see that inference is provided by a remote Together implementation, whereas memory is defined as meta-reference, which means it will use the Meta reference code locally. Similarly, safety is provided by Together, and so on. The second command lists the available APIs. This is the list that you saw on the slides. In this lab, we are going to take a brief look at the inference and agents APIs.

Let's start with inference. First, we define the distribution we'll be using, which is the Together.ai distribution, and the model, which is the Llama 3.1 8B Instruct model. Let's load some packages, define a routine, and then run that routine. First, we define the Llama Stack client by describing the distribution that's used. Then we call chat completion with a list of messages and a model, and we set stream to true. This instructs the API to stream back messages as soon as they are available, rather than waiting for the complete message. To print that, we'll use an asynchronous for loop. Let's try this out.
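For reference, the inference code just described looks roughly like the sketch below, assuming the llama_stack_client Python package. The endpoint URL and model identifier are placeholders, and the exact client class, parameter names, and streaming chunk fields may differ between client versions.

import asyncio
from llama_stack_client import AsyncLlamaStackClient
from llama_stack_client.types import UserMessage

LLAMA_STACK_URL = "http://localhost:5000"   # placeholder endpoint for the Together distribution
MODEL = "Llama3.1-8B-Instruct"              # placeholder model identifier

async def run_inference(question: str) -> None:
    client = AsyncLlamaStackClient(base_url=LLAMA_STACK_URL)
    # stream=True asks the API to return tokens as soon as they are available,
    # rather than waiting for the complete message.
    stream = await client.inference.chat_completion(
        messages=[UserMessage(role="user", content=question)],
        model=MODEL,
        stream=True,
    )
    # Consume the stream with an asynchronous for loop, printing each chunk as it arrives.
    async for chunk in stream:
        print(chunk)

asyncio.run(run_inference("Who wrote the book Charlotte's Web?"))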
Here we can see the answers to our questions. Let's add a print to see how this has been returned from the API. Here you can see the result of setting stream to true: the API returns the message one token at a time.

Now let's look at the Llama Stack agent. In Llama Stack, agents have a lot of capabilities. They can remember message history, store information in external memory, and make use of external tools such as search or code execution. In this example, we are just going to use the message history.

Let's load our packages and define an agent. We start by initializing it with the Together distribution. Then we define a routine to create an agent, which calls agents.create with an agent config. Let's look at that briefly. The agent config describes the model, provides some system instructions, and also controls session persistence; here we are setting it to false. Session persistence allows session state to be available across server restarts by saving session data to persistent storage. We'll save our agent ID and create a session. A session is the state associated with the multiple turns that the model is reasoning over. Then we'll save the session ID. Execute turn uses the agent ID and session ID to ask the remote Llama Stack server to use the specified Llama model to answer the user's question. (A condensed sketch of this agent appears at the end of this lesson.)

Let's create an agent with this config, provide it prompts, and ask it to respond. We can see it understood the first question, "Who wrote the book Charlotte's Web?": it was written by E.B. White. Then, when responding to the follow-up question asking for the three best quotes, it responded with quotes from that same book, indicating it retained its message history.

Now let's build an agent for a vision model. Let's start by displaying an image; we'll use the llama image. Then we'll define code to encode the image, as we have seen before (a sketch of this encoding step also appears at the end of this lesson). Then we'll define an agent: create a client and create an agent as before. In execute turn, we also encode the image, and in the messages we include the image in the content. Now let's define a routine to create an agent, execute the prompts, and print the result, and then run it. Let's provide the image of the llamas and a question: "How many different colors are those llamas? What are those colors?" There are three different colors of llamas in the image: white, purple, and blue. The blue color is on a llama's party hat.

This was a very quick look at Llama Stack. We took a cursory look at two of the APIs: the inference API and the agents API. What I hope you will take away from this lesson are two things. One, Llama Stack APIs have been defined, and Meta and other providers are working to implement the services behind those interfaces. And two, using these APIs can make your job simpler and allow your applications to be more resilient to changes in the underlying implementations.
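For reference, here is a condensed sketch of the text agent described in this lesson. The agent config fields (model, instructions, enable_session_persistence) come from the lab's description; the endpoint URL, session name, and the exact resource paths (agents.create, agents.session.create, agents.turn.create) are assumptions that may vary between llama_stack_client versions.

from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage
from llama_stack_client.types.agent_create_params import AgentConfig

LLAMA_STACK_URL = "http://localhost:5000"   # placeholder endpoint for the Together distribution
MODEL = "Llama3.1-8B-Instruct"              # placeholder model identifier

class Agent:
    def __init__(self):
        self.client = LlamaStackClient(base_url=LLAMA_STACK_URL)

    def create_agent(self, agent_config):
        # Register the agent, save its ID, then open a session that holds the turn history.
        agent = self.client.agents.create(agent_config=agent_config)
        self.agent_id = agent.agent_id
        session = self.client.agents.session.create(
            agent_id=self.agent_id,
            session_name="lab_session",       # illustrative session name
        )
        self.session_id = session.session_id

    def execute_turn(self, content):
        # Ask the remote Llama Stack server to answer within this session,
        # streaming the response back chunk by chunk.
        return self.client.agents.turn.create(
            agent_id=self.agent_id,
            session_id=self.session_id,
            messages=[UserMessage(role="user", content=content)],
            stream=True,
        )

agent_config = AgentConfig(
    model=MODEL,
    instructions="You are a helpful assistant.",
    enable_session_persistence=False,   # session state is not saved across server restarts
)
agent = Agent()
agent.create_agent(agent_config)
for prompt in ["Who wrote the book Charlotte's Web?",
               "What are the 3 best quotes?"]:
    for chunk in agent.execute_turn(prompt):
        print(chunk)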
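Finally, a small sketch of the image-encoding step used by the vision agent. The base64 helper is standard Python; the file name and the way the encoded image is attached to the message content are assumptions that depend on the client version.

import base64

def encode_image(path):
    # Read the image file and base64-encode it so it can be sent inline with the message.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Hypothetical message content combining the encoded image and the question;
# the exact content schema varies between llama_stack_client versions.
content = [
    {"image": {"uri": f"data:image/png;base64,{encode_image('llamas.png')}"}},
    "How many different colors are those llamas? What are those colors?",
]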