In this lesson, you will use the AI21 Conversational RAG tool and also build your own RAG pipeline. Let's get to it. In the previous lab, you used Jamba to process a long document both through the documents parameter and directly in a single prompt. But we can't end the lesson without mentioning RAG, can we? Depending on your use case and data, you may want to consider using Jamba with a RAG pipeline to balance response quality, latency, cost, and other metrics that matter for your use case. The long context window of the Jamba model can help improve the performance of a RAG pipeline in a few ways: you can include a higher number of the most relevant retrieved segments, you can use longer segments, you can include a long multi-turn chat history, and you can leverage retrieval strategies that pull in neighboring segments or even full documents.

In this lab, you will use Jamba in a RAG pipeline in two different ways. First, you will use an out-of-the-box RAG tool built by AI21 called AI21 Conversational RAG. Then you will finish off the lesson by building a simple RAG pipeline using the AI21 Jamba model with LangChain.

Before you start using AI21 Conversational RAG, let's go through its architecture and understand how it works. The process starts with the user query, which is combined with the chat history and sent to an execution engine. The execution engine decides whether organizational data is needed to answer the query. If not, the query and chat history are sent to an LLM, which is Jamba in this case, to generate an answer. If additional organizational data is required, the user query is extracted and decontextualized into a standalone question, which is then used to retrieve the relevant segments. After retrieval, the query and the retrieved text segments are used to generate a grounded answer with the Jamba model. To further improve the accuracy and quality of the answer, a judge model validates the generated answer, and regeneration is triggered if the original answer does not pass validation. Finally, the validated answer is recontextualized with the chat history and sent back to the user along with citations. To conclude, AI21 Conversational RAG is an out-of-the-box RAG engine that is easy to use, while the parameters for each step remain customizable, for example the maximum number of retrieved segments, the retrieval threshold, the retrieval strategy, and the search method. The full conversational history is used to generate the response, enabling a multi-turn Q&A experience.

Now let's dive into the lab and see everything in code. Like in previous labs, let's use these two lines to ignore unnecessary warnings. First, you will import the AI21 library just like you did before, and load the API key to create the AI21 client. The API key has already been set up for you, so you don't have to worry about it. AI21 Conversational RAG really asks only two things from you: the document you want the answer to be based on, and the question you want to ask. The 2024 NVIDIA 10-K annual report will be used in this example. To upload a file to Conversational RAG, you can import the file upload function from utils. We have uploaded the file for you already, so you can skip this step. In case you did upload your own file, please remember to delete it once you are done, using these two lines of code. Once the file is shared with Conversational RAG, the indexing process is handled for you automatically. Now you can use this conv_rag_response function to send your queries.
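To make the setup concrete, here is a minimal sketch of the client creation and file upload. The helper name upload_file and the file path are illustrative stand-ins for the lesson's utils code, and the commented-out deletion call is an assumption about the AI21 Python SDK's library files API rather than the lab's exact code:

```python
import warnings
warnings.filterwarnings("ignore")  # the two lines that silence unnecessary warnings

import os
from ai21 import AI21Client
from utils import upload_file  # illustrative name for the lesson's file-upload helper

# The API key is assumed to already be set in the environment for the lab.
client = AI21Client(api_key=os.environ["AI21_API_KEY"])

# Upload the 2024 NVIDIA 10-K so Conversational RAG can index it.
# In the lab this step has already been done for you.
file_id = upload_file("nvidia_10k_2024.pdf")

# If you uploaded your own file, remember to delete it when you are done,
# e.g. via the SDK's library files API (the exact method may differ by version):
# client.library.files.delete(file_id)
```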
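A sketch of what such a helper might look like is shown below. The beta conversational_rag endpoint and the response fields are assumptions about the AI21 Python SDK, so check your installed version (and the lesson's utils file) for the exact names:

```python
from ai21.models.chat import ChatMessage

chat_history = []  # holds the full multi-turn conversation


def conv_rag_response(client, user_query):
    """Send a query to AI21 Conversational RAG, carrying the whole chat history."""
    chat_history.append(ChatMessage(role="user", content=user_query))

    # Assumed SDK call for the Conversational RAG beta endpoint.
    response = client.beta.conversational_rag.create(messages=chat_history)

    # Field names are assumptions; the response also carries citations with the
    # retrieved text segments and the files they came from.
    answer = response.choices[0].content
    chat_history.append(ChatMessage(role="assistant", content=answer))
    return answer
```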
This simple function appends each round of user query and assistant response, and includes the entire chat history to provide more context for your query. Here, only the LLM response is returned, but you can also get the citations from AI21 Conversational RAG, including the retrieved text segments and the files containing them. If you want to tune any RAG parameters, such as the maximum number of retrieved segments, the retrieval threshold and strategy, the search method, the choice of LLM at each step, the model temperature, or the prompt template, feel free to make changes in the utils file.

Now it is time for you to ask questions. To get started, let's say you just want a quick summary of the 10-K file; you can then follow up and ask more detailed questions too. If you ask a question that cannot be answered from the document, for example about an investment decision, Conversational RAG will not answer it for you. To make it more fun, you can deploy a Gradio app in the notebook and start chatting with AI21 Conversational RAG (a sketch of such an app appears below). To create the Gradio app, point it to the conv_rag_response function we defined earlier and add a few pre-populated examples. Even though Conversational RAG won't make a direct investment decision for you, you can still use it to pull very specific financial information to help you make an informed decision. For example, you can ask about the actions NVIDIA has taken on sustainability, quickly get an answer, and then have a quick read about NVIDIA's sustainability initiatives. You can learn about the risks, capital allocation, cash flow, and other aspects of the company as well, or add your own questions.

All right. To wrap up the lesson, you will create a simple RAG pipeline with LangChain using the AI21 Jamba model, where you can customize every step in the pipeline. You will need a few things to build a RAG pipeline with LangChain: an LLM, which will be the AI21 Jamba model here, and an index for your document, which involves text chunking, an embedding model, and a vector store to hold the embeddings. When you ask a question, wrapped in a prompt template, the most relevant text segments will be used to help answer it. Let's start by using the Jamba 1.5 Large model as our choice of LLM. Then you can load the NVIDIA 10-K file and split the document into smaller chunks of 2,000 tokens each with 400 tokens of overlap. Next, you will use an embedding model from Hugging Face and the Chroma vector store. At this point, the indexing portion is already completed. Here is a quick prompt template you can use to provide instructions to the Jamba model, with the retrieved text and your question included; you can customize this prompt template with your own instructions to the Jamba model. To retrieve the most relevant text segments, you will use the Maximal Marginal Relevance (MMR) retriever and retrieve the top ten most relevant segments. All right, we have arrived at the last step. To build your own RAG pipeline with LangChain, you stitch all the components together, including the retriever, the prompt, and the LLM, to create your own RAG chain. Now your pipeline is ready for you to ask questions. If you want to check all the text segments that have been retrieved in the RAG pipeline, you can also look at the results from the retriever for your question.
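Here is a minimal sketch of the Gradio app mentioned above, wrapped around the conv_rag_response helper; the example questions and the title are illustrative, not the lesson's exact code:

```python
import gradio as gr

# gr.ChatInterface expects a function taking (message, history);
# we simply forward the message to the Conversational RAG helper.
def chat_fn(message, history):
    return conv_rag_response(client, message)

demo = gr.ChatInterface(
    fn=chat_fn,
    examples=[
        "Give me a quick summary of the 10-K.",
        "What actions has NVIDIA taken on sustainability?",
    ],
    title="Chat with the NVIDIA 10-K",
)
demo.launch()
```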
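And here is a compact sketch of the LangChain pipeline just described, ending with the batching and cleanup steps covered next. Package and class names follow recent LangChain releases; the file path, embedding model, and prompt wording are assumptions, and the character-based splitter is used with the lesson's 2,000/400 numbers for simplicity:

```python
from langchain_ai21 import ChatAI21
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# 1. LLM: Jamba 1.5 Large through the AI21 LangChain integration.
llm = ChatAI21(model="jamba-1.5-large", temperature=0)

# 2. Load the 10-K and split it into overlapping chunks.
docs = PyPDFLoader("nvidia_10k_2024.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=2000, chunk_overlap=400
).split_documents(docs)

# 3. Embed the chunks with a Hugging Face model and index them in Chroma.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(chunks, embedding=embeddings)

# 4. MMR retriever returning the top ten most relevant segments.
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 10})

# 5. Prompt template: instructions to Jamba plus retrieved context and question.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def format_docs(documents):
    return "\n\n".join(doc.page_content for doc in documents)

# 6. Stitch retriever, prompt, and LLM together into a RAG chain.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What were NVIDIA's main revenue drivers this year?"))

# Inspect the raw segments the retriever returns for a question.
for doc in retriever.invoke("What were NVIDIA's main revenue drivers this year?"):
    print(doc.page_content[:200])

# Send a list of questions in a batch.
answers = rag_chain.batch([
    "How does NVIDIA allocate capital?",
    "What are the main risk factors?",
])

# Clean up the vector store when you are done.
vectorstore.delete_collection()
```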
If you have a list of questions you want to ask, you can send them in a batch and get an answer for all of them. When you finish the lesson, don't forget to clean up the vector store. In this lesson, you have learned to use AI21 Conversational RAG and to build a RAG pipeline with the Jamba model using LangChain. This wraps up our lesson. Throughout this lesson, you have learned three different ways to handle long documents with Jamba: you can include the entire text of these long documents in your prompt, you can attach the documents using the documents parameter, and you can use dedicated tools, including AI21 Conversational RAG. I encourage you to build your own generative AI applications with your own documents, using the different techniques and tools you have learned throughout the course.