In this lesson, you'll learn about the roles supported by the Llama family and the new prompt format that supports them. Let's take a look. The Llama models use roles to help identify information in the prompt. The Llama 3.1 and 3.2 models have four roles. Information in the system role often includes rules, guidelines, or information that helps the model respond effectively, for example: "You are a helpful assistant who only speaks French." Now let's look at the user role. The user role is where user input is included, which is your prompt. The ipython role is effectively a tool role; it is not just for Python. We will cover this in detail in a later lesson. And the final role, assistant, identifies the model's response.

The prompt format has been enhanced with special tokens to identify these roles. Let's look at these in more detail. The begin-of-text token starts all prompts. The start and end header tokens bracket a role such as user or assistant, and the end-of-turn token flags the end of a section, or turn, of a conversation. Let's see these in a prompt. Here the begin-of-text token starts off the prompt. The start and end header tokens frame the role, user. Then the user input, which we see here, "Who wrote the book Charlotte's Web?", is followed by the end-of-turn token. Then start and end headers identify the assistant role. That signals the model that it is now its turn to respond.

Now, let's look at tool calling. Tool calling adds tokens to delineate tool messages and identify tool responses. We'll discuss this more in a later lesson. There is a pad token for fine-tuning, and an end-of-text token sent by a base model to indicate it won't send any more text. Now, I have been talking about tokens, but you're seeing strings like end-of-text. Well, as we will describe more in a later lesson, the input to large language models is tokens. These are integers that represent words or pieces of words. Some of these integers have been reserved for special tasks, like the ones I just described. A piece of software called a tokenizer will convert these strings into the integers that are reserved for special tokens.

Let's code some of these up. All right. Here's the prompt you saw on the slides. We'll ask the question: "Who wrote the book Charlotte's Web?" Let's look at the prompt. It has a begin-of-text token. It has a user header, which is the word user framed by the start-header and end-header tokens, then the question, then an end-of-turn token, and then an assistant header. Let's take a look at that prompt. It's a long string with all these fields concatenated.

Now we are going to want to send this to the model. We'll do that with this routine, llama31. This is in the file utils.py, which you can find by clicking File and then Open, but I've put a copy in this notebook for us to take a look at. It starts by receiving a prompt or a list of messages, which we'll cover in a few minutes. It's going to build a request to send to the provider. They have GPUs with this model loaded in memory, just waiting for your request to come in. They will receive the request, convert it to tokens, execute it, and send you the results. The request includes the payload with the model, temperature, and prompt, and an HTTP header that includes your key for the provider. It will send the request to the provider's URL with those headers and that payload and receive a response. The response has many fields, but the field we are most often interested in is the text field. All right. That's a quick overview. Now let's try this out.
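To make this concrete, here is a rough sketch of the prompt string and a llama31-style helper. The special tokens are the documented Llama 3.1 ones, but the endpoint URL, environment variable, model name, payload keys, and response shape below are assumptions modeled on a typical hosted-completions API; the version in utils.py is the reference.

```python
import os
import requests

# The Llama 3.1 prompt format described above: begin-of-text, a user
# header, the question, an end-of-turn token, then an assistant header
# to signal that it's the model's turn to respond.
prompt = (
    "<|begin_of_text|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Who wrote the book Charlotte's Web?"
    "<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

def llama31(prompt, model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
            temperature=0.0):
    """Send a raw prompt string to a hosted Llama 3.1 model.

    URL, payload keys, and response fields are illustrative
    assumptions; see utils.py for the actual provider details.
    """
    url = "https://api.together.xyz/v1/completions"  # assumed endpoint
    # The provider key is expected in an environment variable.
    headers = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}
    payload = {"model": model, "prompt": prompt, "temperature": temperature}
    response = requests.post(url, headers=headers, json=payload)
    # Of the many response fields, we usually want the generated text.
    return response.json()["choices"][0]["text"]

print(llama31(prompt))
```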
The question: "Who wrote the book Charlotte's Web?" And the response: "The book Charlotte's Web was written by E.B. White." Now, you may be curious about other fields that we are not seeing. Let's take a quick look at those. This is the full response. There's the same text field we have seen. Another interesting field is the usage field. This field can be useful in calculating cost. Model providers typically charge in dollars per million tokens and have different costs for input and output tokens. Prompt tokens are the input tokens, and completion tokens are the output tokens.

Now let's look at multi-turn chat. Let's ask the follow-up question: "Three best quotes in it." Now, we haven't told the model what "it" is, and models have no memory. So to simulate memory, you have to provide the model all of the past conversation. And here's how you do it. We have the same begin-of-text token, and then you repeat all of the past conversation, including all of the headers. This is the start header of the question we previously sent. Including the past headers allows the model to understand what you had asked and what it had responded. Then you add the new question down here, and it responds: "Here are three of the most famous and meaningful quotes from Charlotte's Web." So it does remember the book that we were talking about. The model likes to respond in markdown. Let's use display to view this. There, the same response, but it looks much nicer.

If you're having an extended conversation, there may be a set of instructions you want to give to the LLM that persists throughout the conversation. You can use the system role for that. Here's a question: "Three great quotes." And our system message: "You are an expert in quotes about sports. You provide just the quotes and no commentary. Keep it brief and reply in markdown." The system message is prepended like this. Let's try it. Here's one of the quotes, from Vince Lombardi: "It's not whether you get knocked down, it's whether you get up." Similarly, you can ask for three more quotes, and you can include the past history as we had done before.

Now, I know you are anxious to get started writing some prompts, but we don't typically write prompts in this verbose fashion. There's another, higher-level format that is easier to write: the messages format. Here you have a role, which would be user, assistant, or system, and then content. This is our question: "Who wrote the book Charlotte's Web?" This is equivalent to the prompt above. Now let's try both of them. We can follow up with another question about the three best quotes, and an extended conversation looks like this, with each turn of the conversation being a dictionary with role and content. "You have been my friend. That in itself is a tremendous thing."

Now, we don't want to have to type out all these messages each time, so let's write a little routine, just for this lab, to keep track of our conversations. We'll make it a class. We'll take in a system message. We'll track our messages in a list. If there is a system message, we'll start our messages with that. Now let's generate a response. We'll take in a user question, append the question to our messages, send the system message and any past history and the new question to the model, and then append its response to our message list. Let's try this out. Let's use the system message: "You are a terse expert in children's literature."
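Here is a minimal sketch of that conversation-tracking class, following the steps just described. It assumes, as the lesson says, that the llama31 helper also accepts a list of messages; the exact names are illustrative.

```python
# The higher-level messages format: each turn is a dict with a role
# (system, user, or assistant) and content.
class Conversation:
    """Keeps the running message list so the model 'remembers' the chat."""

    def __init__(self, system=""):
        self.messages = []
        if system:
            # A system message, if given, persists at the start of the list.
            self.messages.append({"role": "system", "content": system})

    def generate(self, user_question):
        # Append the new question, send the full history, store the reply.
        self.messages.append({"role": "user", "content": user_question})
        response = llama31(self.messages)
        self.messages.append({"role": "assistant", "content": response})
        return response

# Usage, mirroring the lab:
chat = Conversation(system="You are a terse expert in children's literature.")
print(chat.generate("Who wrote the book Charlotte's Web?"))
print(chat.generate("Three quotes from Charlotte's Web."))
print(chat.messages)  # inspect the accumulated history
```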
We'll initialize our conversation with that and then ask the question, "Who wrote the book Charlotte's Web?" That's our expected result. Then we can follow up with another question to make sure history is working: "Three quotes from Charlotte's Web." And you can look at the past history with the messages list. Here are all the messages we sent.

All right. Now we can try some prompts. Did you know the model knows eight different languages? Let's ask for three basic phrases in each of the eight languages. I was going to leave you with a nice user interface to try some prompts in, but it looks like I didn't implement it. Well, maybe we can just write one quickly here. Maybe we'll get the model to help us out. Okay, so I'll ask it to write the Python script for a Gradio chatbot app. And here's the class. It initializes the class, and I'll also make sure the llama function is there, and put in the system message, and then call the Conversation class. Okay, the Conversation class is already defined; I don't need it. So I'll go back to my prompt and change it: since it's already defined, don't redefine it. Okay. It looks good. Okay, now let's run it.

We have a literature-expert chatbot. Let's put in a question: "Who is P.G. Wodehouse?" Select the 8B model and submit it. Jeeves and... Okay, that looks good. Let's try another question: "Who is his most famous character?" Okay, that doesn't seem right. So let's select the 405B model, submit it, and see. Okay, Jeeves. That makes more sense. All right. Now you have a UI that you can use to try different prompts in. Maybe you want to try different languages, or maybe you want to write your own code.

The key takeaways for this lesson are the new roles in the Llama 3 family and the new prompt format that identifies those roles. Okay, we'll see you in the next lesson.
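For reference, here is roughly what a minimal Gradio chatbot wrapping the Conversation class could look like. This is a hedged sketch, not the script the model generated in the video; it assumes Gradio's ChatInterface API and reuses the Conversation class from the sketch above. The video's version also offered a model-size dropdown (8B vs. 405B), which is omitted here for brevity.

```python
import gradio as gr

# One Conversation holds the chat history sent to the model.
chat = Conversation(system="You are a terse expert in children's literature.")

def respond(message, history):
    # Gradio passes its own UI history, but our Conversation object
    # already tracks it, so we only need the new message.
    return chat.generate(message)

# ChatInterface wires the function to a simple chat UI.
gr.ChatInterface(respond, title="Literature Expert").launch()
```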