So you've seen a lot of material so far in the course, and the best way for you to solidify these concepts in your mind is to get hands-on and try some code for yourself. As Andrew mentioned at the start of this week, my colleague Chris Fregly led the development of the labs for this course. Each week features a lab exercise that lets you try out the key concepts from the videos. Chris is going to help you get started by showing you the lab environment, then he'll walk you through the activity that you'll complete this week. Hey Chris!

Hey, thanks Mike. Now let's take a look at the lab environment and Lab 1. In this lab, we're going to grab a data set of conversations happening between people, and our plan is to summarize these dialogues. Think of a support dialogue between you and your customers: maybe at the end of the month, you want to summarize all of the issues that your customer support team has dealt with that month.

A few things to note here. These are the pip installs, and we see that we're going to be using PyTorch. Here we see Transformers, a library from Hugging Face, a really cool company that has built a whole lot of open source tooling for large language models. They've also built a Python library called Datasets that can load many of the common public data sets that people use to train models, fine-tune models, or just experiment with.

Now we're going to do the imports. This imports a function called load_dataset, along with the model and tokenizer classes needed for this lab. We're going to use a data set called DialogSum, a public data set that the Datasets library exposes and gives us access to. All we do is call load_dataset, which was imported up above, and pull in the data set.

From here on out, we're going to explore some of the data, and we're going to try to summarize with just the FLAN-T5 base model. Before we get there, though, let me load the data set and look at some examples. Here's a sample dialogue between Person 1 and Person 2. Person 1 says, "What time is it, Tom?" So it looks like Person 2's name is actually Tom. "Just a minute. It's 10 to 9 by my watch," and on and on. And here's the baseline human summary: this is what a human has labeled this conversation with, a summary of that conversation. We will then try to improve upon that summary using our model.

So again, no model has even been loaded yet; this is purely the actual data. Here's the conversation: think of it as the training sample, and this is what a human has labeled it with. We will compare the human summary, which is what we're considering to be the baseline, against what the model generates as the summary. Here's a second example. You can see it's got some familiar terms that a lot of us recognize: CD-ROM, a painting program for your software.

Now, here's where we actually load the model. FLAN-T5, which we spoke about in the videos, is a very nice general-purpose model that can do a whole lot of tasks, and today we'll be focused on FLAN-T5's ability to summarize conversations. After loading the model, we have to load the tokenizer.
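To make that concrete, here is a minimal sketch of those setup cells. The Hub ids for the data set and the model, and the "dialogue"/"summary" field names, are my assumptions about the lab's setup rather than guaranteed to match it exactly:

```python
# Minimal setup sketch (assumes: pip install torch transformers datasets).
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# DialogSum: public dialogues paired with human-written baseline summaries.
# This Hub id is an assumption; the lab's exact id may differ.
dataset = load_dataset("knkarthick/dialogsum")

# FLAN-T5 base: a general-purpose, instruction-tuned seq2seq model.
model_name = "google/flan-t5-base"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Inspect one example: the raw dialogue and its human baseline summary.
example = dataset["test"][0]
print(example["dialogue"])
print(example["summary"])
```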
Now, these are all coming from the Hugging Face Transformers library. Just to give you some context: before Transformers came along, we had to write a lot of this code ourselves. There are now many, many different language models, and some of them do things very differently from others, so there were a lot of bespoke, ad hoc libraries out there all trying to do similar things. Then Hugging Face came along with a very well-optimized implementation of all of these.

Now, here's the tokenizer. This is what's going to be used to convert the raw text from our conversation into the vector space that can then be processed by our FLAN-T5 model. Just to give you an idea, let's take a sample sentence: "What time is it, Tom?", the first sentence from our conversation up above. We see the encoded sentence is actually these numbers here, and if you decode it, it decodes right back to the original. So the tokenizer's job is to convert raw text into numbers. Those numbers point to a set of vectors, or embeddings as they're often called, that are then used in mathematical operations like our deep learning, backpropagation, our linear algebra, all that fun stuff. All right, let's run this cell and continue to explore.

Now that we've loaded our model and our tokenizer, we can run some of these conversations through the FLAN-T5 model and see what it actually generates as a summary for these conversations. Here again we have the conversation, and here again is the baseline summary. And we see that without any prompt engineering at all, just taking the actual conversation and passing it to our FLAN-T5 model, it doesn't do a very good job of summarizing. "It's 10 to 9." That's not very helpful; there are more details in this conversation that are not coming out at this point. Same with the conversation about our CD-ROM. The baseline summary is that Person 1 teaches Person 2 how to upgrade the software and hardware in Person 2's system. The model generated "Person 1 is thinking about upgrading their computer." So again, lots of details in the original conversation do not come through in these summaries.

So let's see how we can improve on this. In the lesson, you learned how to use instructions to tell your model what you're trying to do with the data that you're passing it. This is called in-context learning, and specifically zero-shot inference with an instruction. Here's the instruction, "Summarize the following conversation," then the actual conversation, and then we tell the model where it should print the summary, which is going to be after the word "Summary:". This seems very, very simple, so let's see if things do get better.

Not much better here. The baseline is still that Person 1 is in a hurry and Tom tells Person 1 there's plenty of time, while the zero-shot in-context learning with a prompt just says "The train is about to leave." So again, not the greatest. And here is the zero-shot result for the computer sample: it still thinks Person 1 is trying to upgrade. So not much better. Sketches of the tokenizer round trip and of this zero-shot prompting follow below.
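Building on the setup sketch above (the variable names here are mine, not necessarily the lab's), the tokenizer round trip and the no-prompt generation look roughly like this:

```python
# Tokenizer round trip: raw text -> token ids -> raw text.
sentence = "What time is it, Tom?"
encoded = tokenizer(sentence, return_tensors="pt")
print(encoded["input_ids"][0])  # a tensor of token ids

decoded = tokenizer.decode(encoded["input_ids"][0], skip_special_tokens=True)
print(decoded)  # prints the original sentence back

# Pass the raw dialogue to FLAN-T5 with no instruction at all.
inputs = tokenizer(example["dialogue"], return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```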
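And a rough sketch of zero-shot inference with an instruction; the exact wording of the instruction is a choice, not something fixed by the model:

```python
# Zero-shot inference: wrap the dialogue in an instruction and tell the
# model where to put its answer ("Summary:").
prompt = f"""Summarize the following conversation.

{example['dialogue']}

Summary:"""

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```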
Let's keep going. There is a different prompt that we can use here, where we just say "Dialogue:". Now, these are really up to you. This is the prompt engineering side of these large language models, where we're trying to find the best prompt. And in this case, it's still just zero-shot inference: no fine-tuning of the model, no nothing. All we're doing is trying different instructions and seeing if the model does any better with slightly different phrases. This is sort of the inverse of before: here we first say, here's the dialogue, and then afterward we ask what was going on up in that dialogue. Let's see if this does anything better. "Tom is late for the train": so it's picking that up, but still not great. Here we see Person 1, "you could add a painting program," and Person 2, "that would be a bonus." So a little bit better. It's not exactly right, but it's getting better; it's at least picking up some of the nuance.

Now, as part of in-context learning, you learned there's something called one-shot and then few-shot, so let's get hands-on with those. Earlier we were doing zero-shot, which means we're not giving the model any samples of a prompt and its completion; all we're doing is giving it a prompt, asking the model to do something, and seeing what it generates. With one-shot and then few-shot, we will actually give it samples that are correct, using the human baseline, and that gives the model a little bit more information to work with.

So let's see how one-shot works. All we're doing is taking one full example, including the summary from the human baseline, then giving it a second example without the actual summary, and that second dialogue is the one we want the model to summarize. So one-shot means I'm giving it one complete example, including the correct answer as dictated by the human baseline, then giving it a second example and asking the model what's going on. Let's see how we do here with the upgrade-software conversation: "Person 1 wants to upgrade. Person 2 wants to add a painting program. Person 1 wants to add a CD-ROM." So I think it's a little better. And let's keep going: there's something called few-shot inference as well.

Now, some of you might be asking: this seems like cheating, because we're actually giving it an answer and then asking it. It's not really cheating; it's more that you're helping the model help itself. In future lessons and future labs, we will actually fine-tune the model, and then we can go back to zero-shot inference, which is what you would normally think of as a good language model. Here we're just building up some of the intuition.

And keep in mind, this is a very, very inexpensive way to try out these models, and even to figure out which model you should fine-tune. We chose FLAN-T5 because it works across a large number of tasks. But if you have no idea how good a model is, if you just get it off of some model hub somewhere, these are the first steps: prompt engineering with zero-shot, one-shot, and few-shot is almost always the first step when you're trying to learn the language model you've been handed. And it's very dataset-specific and task-specific as well.

So few-shot means that we're giving three full examples, each including the human baseline summary (one, two, three), and then a fourth, but without the human summary. Yes, even though we have it, we're just exploring our model right now and saying, tell us the summary of that fourth dialogue. A sketch of building these one-shot and few-shot prompts follows below.
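Here is a sketch of how those one-shot and few-shot prompts can be assembled. The helper function, the example indices, and the exact prompt wording are all illustrative choices on my part, not necessarily what the lab uses:

```python
# Hypothetical helper: concatenate fully worked examples (dialogue plus
# human baseline summary), then append one final dialogue to summarize.
def make_prompt(example_indices, index_to_summarize):
    prompt = ""
    for i in example_indices:
        ex = dataset["test"][i]
        prompt += f"Dialogue:\n\n{ex['dialogue']}\n\nWhat was going on?\n{ex['summary']}\n\n\n"
    target = dataset["test"][index_to_summarize]
    prompt += f"Dialogue:\n\n{target['dialogue']}\n\nWhat was going on?\n"
    return prompt

one_shot_prompt = make_prompt([40], 200)           # one worked example
few_shot_prompt = make_prompt([40, 80, 120], 200)  # three worked examples

# Note: long few-shot prompts can overflow FLAN-T5's 512-token context,
# which is the source of the warnings mentioned below.
inputs = tokenizer(few_shot_prompt, return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```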
Also, just ignore some of these errors. Some of these sequences are a bit larger than the 512-token context length of the model. Typically, you would probably want to filter out any inputs that are larger than 512 tokens, but here it still does a pretty good job.

Here we see a case where the few-shot didn't do much better than the one-shot. And this is something that you want to pay attention to, because in practice, people often try to just keep adding more and more shots: five-shot, six-shot. Typically, in my experience, above five or six shots (that is, full prompt-and-completion pairs) you really don't gain much after that. Either the model can do it or it can't, and going above five or six rarely helps. Here we see that for this particular sample, one-shot was really good enough.

Now, the last part of this lab is going to be fun. This is where you can actually play with some of the configuration parameters that you learned about during the lessons, things like sampling and temperature. You can play with these, try things out, and gain your intuition on how they impact what's actually generated by the model. For example, by raising the temperature up toward one or even closer to two, you will get very, very creative responses. If you lower it down to 0.1 (which I believe is the minimum, for the Hugging Face implementation of this GenerationConfig class anyway, which gets passed in when you actually call generate), that will make the response more conservative and will oftentimes give you the same response over and over. If you go higher (I believe 2.0 is actually the highest), that will start to give you some very wild responses. It's kind of fun. You should try it.
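As a closing sketch, here is roughly how those generation parameters get wired in through Transformers' GenerationConfig. The specific values are illustrative, and note that do_sample=True is required for temperature to have any effect:

```python
from transformers import GenerationConfig

# Sampling config: higher temperature -> more creative, varied output;
# lower temperature -> more conservative, repeatable output.
generation_config = GenerationConfig(
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,  # try values between roughly 0.1 and 2.0 and compare
)

# Reuses the one_shot_prompt from the sketch above.
inputs = tokenizer(one_shot_prompt, return_tensors="pt")
output_ids = model.generate(
    inputs["input_ids"],
    generation_config=generation_config,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```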