In this lesson you will learn how to use Llama 4 to work with long contexts and directly ask questions about large text files and code repositories. Let's have some fun!

One of Llama 4's most powerful features is its long context support, especially in the Scout model, which handles up to 10 million tokens. This means you can process entire documents or datasets without having to chunk them into small pieces. You can now use long-context models for many of the use cases that previously needed RAG to work around the models' context window limitation. However, note that long contexts can be more expensive to run and slower to respond, so there is always a trade-off.

We will begin by loading our API keys and libraries. Besides the Llama API key, the Llama base URL, and the Together API key, in this lesson we also have a Hugging Face access token and a GitHub access token. The Hugging Face access token is needed to get the tokenizer for the Llama 4 model, and the GitHub access token will be used in one of the long context use cases you will do in this lab. As mentioned before, all the keys are already set up for you on the platform, so you don't need any key to run the labs. You will also import the llama4 and llama4_together functions that you defined in previous lessons. Since the Llama API in its current preview mode only supports up to 128k tokens, in most of the use cases in this lesson we will use llama4_together to make full use of the long context size of the Llama 4 models.

Using the model ID for Llama 4 Scout, you can use AutoTokenizer to get the tokenizer for this model. Let's run a quick example: for the prompt "this is a test prompt", the number of tokens is seven.

Let's see our first long context example. Here we have the book "War and Peace" as a text file. We read the file and show the number of characters and the number of tokens. The text has about 3 million characters and close to 800,000 tokens, so passing this book to a model makes for a pretty large context. You can now pass the whole book to Llama and ask it to summarize it for you. And here is a summary of the book that had about 800,000 tokens. Summarizing this book wasn't possible with the Llama 3.2 model, because it was beyond that model's context size.

Let's see another use case where you send multiple files to Llama 4 in a single summary request. We have five arXiv papers that are downloaded and saved with these file names. Using the PDF-to-text function available in utils, we run a loop over these papers, convert each of them to text, append them together, and print each paper's file name, number of characters, and number of tokens. Then, for the total text, we show the number of characters and the number of tokens. Here are the five papers with the number of characters and the number of tokens for each. In total, the five papers have about 600,000 characters and close to 170,000 tokens. You can pass the total text to the llama4_together function and ask it to summarize the five papers for you. And here is the response, with a title and a summary for each paper.
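To make these steps concrete, here is a minimal sketch of the token-counting and long-context summarization flow. It assumes llama4_together is the helper you defined in the previous lessons (taking a prompt string and returning the model's text response); the model ID, file name, and utils import path are illustrative assumptions, so check the lab notebook for the exact values.

```python
from transformers import AutoTokenizer

# Hypothetical import path: llama4_together was defined in earlier lessons,
# commonly exposed through a utils module in these labs.
from utils import llama4_together

# Assumed Llama 4 Scout model ID; the gated repo needs your Hugging Face
# access token (e.g., set via the HF_TOKEN environment variable).
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quick check: this prompt encodes to 7 tokens
print(len(tokenizer.encode("this is a test prompt")))

# Read the whole book and report its size in characters and tokens
with open("war_and_peace.txt") as f:  # hypothetical file name
    book = f.read()
print(f"{len(book):,} characters, {len(tokenizer.encode(book)):,} tokens")

# Send the entire book as a single long-context request
summary = llama4_together(f"Summarize this book:\n\n{book}")
print(summary)
```

The same pattern covers the five-paper example: convert each PDF to text, concatenate the texts, and send one summarization request over the combined context.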
Let's work on another use case. In this one, you will convert Meta's llama-models repo to a large text file, send the whole file as a long context to Llama 4, and ask questions about it. For this, we have several helper functions defined in the utils file. Here's the repo on GitHub.

We get the repo name, which will be llama-models, and form the extract directory, which will be a relative path inside the current directory. We pass the repo URL and the extract directory to the download-and-extract-repo function to download the repo and save it in a local folder. Then we get the list of Python files in the repo, create the output file path and name, and finally convert all the .py files to text and save them in the output file. After completion, the output file, /llama-models_file.txt, is returned. Let's take a look at the first 100 lines of the file. You can now read the content of the generated file and pass it to llama4_together with this question: which file in the content below has the function _encode_image defined? And here is the response: the function _encode_image is defined in this file, along with the content of the file that contains _encode_image.

Let's see another use case. In this one, you will use Llama to ask questions about all the pull requests, about 260,000 tokens' worth, in a popular Llama repo: the llama-cookbook repo. We have a get_pull_requests function in utils. This function takes the repo owner, meta-llama, and the repo name, llama-cookbook, and returns all the pull requests. Let's run a loop and print just the number and the title of each pull request. Here is the list of all the pull requests with the number and the title for each. The list is very long; the total number of pull requests is 510.

Let's see the content of the first pull request. The URL for this pull request, its number, its title, and some other information are shown here. As you can see, this is a very long piece of content for just a single pull request. We have a get_pr_content function in utils. You can use this function to get the content for each pull request and append them all together. Now that we have the content for all the pull requests, let's join them into a single all-PR content string and print its number of characters and number of tokens. There are over a million characters and close to 270,000 tokens in the combined PR content.

Now let's see how many PRs are about Android or iOS. You can pass this query and the combined PR content to llama4_together and show the result. Here is the final response, with all the steps that were taken and the list of PRs: in this case, three PRs are about Android or iOS, and the final answer is three. Let's ask another question: how many PRs are about agents? Give a one-sentence summary of each, then a summary of all those agent PRs. You can pass the query and the combined PR content to llama4_together. And here are the six PRs that are about agents, with information and a one-sentence summary for each of them, followed by a summary of all six. The sketch below recaps this pull-request workflow.
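Here is a rough sketch of the pull-request workflow. get_pull_requests and get_pr_content are the utils helpers named in the lesson, but their exact signatures and the shape of the returned records (shown here as GitHub-API-style dictionaries) are assumptions for illustration.

```python
from utils import get_pull_requests, get_pr_content, llama4_together

# Fetch all pull requests from the meta-llama/llama-cookbook repo
# (the GitHub access token is used behind the scenes for the API calls)
prs = get_pull_requests("meta-llama", "llama-cookbook")
print(f"Total PRs: {len(prs)}")  # 510 in the lesson

# Print just the number and the title of each PR
for pr in prs:
    print(pr["number"], pr["title"])  # assumes GitHub-API-style dicts

# Get the full content of every PR and join them into one long context
all_pr_content = "\n\n".join(get_pr_content(pr) for pr in prs)
print(f"{len(all_pr_content):,} characters")

# Ask a question over the entire PR history in a single request
query = "How many PRs are about Android or iOS?"
print(llama4_together(f"{query}\n\n{all_pr_content}"))
```

In this lesson, you have used Llama 4 on several long context use cases. In the next lesson, you're going to use Llama's prompt optimization tool to automatically optimize your prompts. See you there!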