In this lesson, you'll learn how you can build a very simple RAG pipeline and how to customize its behavior. We'll first create a simple question-answering RAG pipeline. Then we'll also get it to reference the sources of its answers. All right, let's dive into the code.

Let's build and customize a RAG pipeline. Retrieval augmented generation happens in two main steps: the retrieval step, that is, getting a query and then retrieving the most relevant documents from a database for that query, and then the generation step. If we take a closer look, this is what that looks like. We have a query, we have a database, and we also have an instruction. Often we talk about retrieval augmented generation in the context of question answering. For example, the instruction or prompt could be "given the documents, answer the question". We first have the retrieval step, where we retrieve the most relevant documents. Then we augment our prompt with the contents of the documents, as well as the question. The final step is the generation step. The generation step uses this augmented prompt to generate an answer.

Retrieval can happen in one of many ways. The most common retrieval step happens with embedding retrieval, or semantic search, where we create an embedding for the query, compare that embedding to all of the embeddings of our documents in a database, and retrieve the most relevant ones based on semantic similarity. In Haystack, that pipeline looks a bit like this: we have an embedder for the query, we then have an embedding retriever, we then have a prompt builder, which we'll see in the lab, and then finally we have a generator. The retrieval step could, however, happen with keyword-based retrieval like BM25 as well. Sometimes you'll see that there's retrieval followed by reranking. The last thing I want to mention is that, often, in the context of retrieval augmented generation, retrieval is actually happening with a completely separate API. For example, we might retrieve data from the web, or Slack, or emails, etc. We'll have a look at this in one of the following labs.

In this lab, we'll be building two types of pipelines: an indexing pipeline and a RAG pipeline. The indexing pipeline usually cleans, splits, and then embeds documents into a database. But for this lab, we're going to be using the web, so we're going to be fetching the contents of URLs, converting them, and embedding them. We're going to be using short web pages, so we're actually going to be skipping the splitting step. Finally, we're going to be building a RAG pipeline. Let's see all of this in code.

All right, let's start by building and customizing a RAG pipeline. First, let's start with suppressing warnings and loading our environment variables again. Next, we'll import all the dependencies we're going to be using in this lab. We're going to be using different embedders this time, so there's a lot going on here; there are a lot of imports because we're going to be building different types of pipelines. One thing we're going to be changing is that this time, for embeddings, we're going to be using the Cohere document embedder and text embedder. Some Haystack components come as integrations, so in this case we're going to be importing from Haystack integrations, and we're going to be importing the embedders for Cohere.

Let's start by indexing documents. In this lab we're changing the way we're indexing documents. Previously, you had indexed .txt files into your in-memory document store. This time, we're going to be using a component called the LinkContentFetcher.
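As a rough sketch, and assuming Haystack 2.x with the cohere-haystack integration installed, the imports for this lab might look something like the following (exact module paths can vary between versions, and the dotenv helper is an assumption about how the environment variables get loaded):

```python
import warnings
warnings.filterwarnings("ignore")  # suppress warnings, as in the lab setup

from dotenv import load_dotenv  # assumes python-dotenv is installed
load_dotenv()  # loads COHERE_API_KEY / OPENAI_API_KEY from a .env file

from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

# The Cohere embedders ship as a separate integration package (cohere-haystack)
from haystack_integrations.components.embedders.cohere import (
    CohereDocumentEmbedder,
    CohereTextEmbedder,
)
```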
We're also going to be cheating a bit, because we're going to be using URLs that don't have a lot of data in them, so we can skip the document splitting step here; but if you'd like, you can also add it. Let's start by initializing our document store. Again, we're using the in-memory document store. Then let's initialize all of the components we're going to be using. First, we're going to be using the LinkContentFetcher, which is able to fetch the contents of URLs. We'll then use a special converter, HTMLToDocument, that's able to convert the contents of these URLs into a format that Haystack can understand, so it's going to be converting them into Haystack documents. Next, let's use the Cohere document embedder. For this, I've picked the Embed English v3 model. You can also change this if you like. Finally, we're using our trusty document writer.

Next, you'll be adding these components to your pipeline and giving these components names. And finally, we're going to be connecting these components. Let's start by connecting the fetcher to the converter. The fetcher outputs streams and the converter is expecting sources, so we're being very explicit about that. Finally, let's also connect the other components. And now we have a full indexing pipeline that's able to get the contents of URLs, embed them with a Cohere embedding model, and write them into an in-memory document store.

Before we run our indexing pipeline, we can also observe what the pipeline looks like. We can see that the first component in our pipeline is the fetcher, and it's expecting URLs, so we know we need to provide the fetcher with URLs to run this pipeline. Now we can run our indexing pipeline. For this, I've chosen four URLs from the Haystack website. All of these are quite short pages. You can change this, but make sure that you're adding a document splitter if the contents are quite long. Here we have pages for Haystack integrations like Anthropic, Cohere, Jina, and NVIDIA. You'll notice that you're calculating the embeddings, and we also get the notification that four documents have been written into our document store.

Again, you can also inspect the contents of your document store. We can have a look at the document at index zero, for example. One thing I'd like to highlight here is that this document also comes with metadata, because we used the LinkContentFetcher. So we have the contents of those pages as the document content; however, we also have a URL in the metadata of this document object. We'll use this URL later on as well.

Now that you've populated your in-memory document store with the contents of various URLs, we can start building a retrieval augmented generation pipeline. The way I like to do this is to start by deciding on a prompt. Haystack uses Jinja for prompt templating. Jinja is a templating language, and it's quite useful because it allows us to do many things with our prompts and modify our prompts in many ways as well. For example, here you'll notice a for loop. We use these templates with a component called the prompt builder. When we have this type of prompt, we're essentially saying that documents and query are expected inputs to our prompt builder, while also being able to loop through these documents and add the content of each document into the prompt. Jinja allows for different things like if statements, for loops, and so on too. It comes with a bunch of built-in filters like truncation, lowercase, etc. as well. It's a pretty nice tool to be able to use.
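Here is a minimal sketch of the indexing pipeline described above, followed by the kind of Jinja template we just discussed. It builds on the imports from earlier; the component names, the exact integration URLs, and the template wording are illustrative rather than the lesson's exact code, and a COHERE_API_KEY environment variable is assumed:

```python
document_store = InMemoryDocumentStore()

fetcher = LinkContentFetcher()
converter = HTMLToDocument()
embedder = CohereDocumentEmbedder(model="embed-english-v3.0")
writer = DocumentWriter(document_store=document_store)

indexing = Pipeline()
indexing.add_component("fetcher", fetcher)
indexing.add_component("converter", converter)
indexing.add_component("embedder", embedder)
indexing.add_component("writer", writer)

# The fetcher outputs "streams" and the converter expects "sources",
# so we name both ends of that connection explicitly
indexing.connect("fetcher.streams", "converter.sources")
indexing.connect("converter", "embedder")
indexing.connect("embedder", "writer")

# Illustrative URLs: four short integration pages from the Haystack website
indexing.run({"fetcher": {"urls": [
    "https://haystack.deepset.ai/integrations/anthropic",
    "https://haystack.deepset.ai/integrations/cohere",
    "https://haystack.deepset.ai/integrations/jina",
    "https://haystack.deepset.ai/integrations/nvidia",
]}})

# Inspect the first document: the page text is in .content, the source URL in .meta
print(document_store.filter_documents()[0])

# A first Jinja prompt template: "documents" and "query" become prompt builder inputs
template = """
Answer the question based on the provided context.

Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}

Question: {{ query }}
"""
```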
Now that we've decided on our first prompt, which is "answer the question based on the provided context", we can start building our pipeline. Let's start with the query embedder. We used Cohere to embed the documents, so we're going to make sure we use the same Cohere model for the query embedder. Next, you'll be using the in-memory embedding retriever. After that, we're also going to be initializing the prompt builder, and this prompt builder is basically going to be using the prompt above as its template. Finally, we need the generator. For this, I'll be using the OpenAI generator with the default model, which is GPT-3.5. But if you like, you can also switch this up. For example, you can decide to use Together AI and serve an open-source model with Together AI; we would then use that model with the OpenAI generator instead. But I'll be using GPT-3.5; this is completely up to you. Don't forget to export the Together AI API key if you do want to use it.

Let's start initializing our pipeline. I'll call it rag. Again, we're adding all of the components that we need to this pipeline. Finally, let's connect all of the components too. And there we have it. Now we can run our retrieval augmented generation pipeline. Let's go ahead and visualize this pipeline as well. All right, so again we start with a query embedder, which is expecting text as input, and we go all the way down to the generator, which produces a response and outputs these responses as replies.

Now that we know this, let's run our pipeline. Let's start with the question "How can I use Cohere with Haystack?", for example. We already know that we embedded a URL that talks about the Cohere integration. You'll have noticed that it isn't only the query embedder that's expecting the question; our prompt is also expecting the question. I'll also modify the retriever to retrieve only one document. You can play around with this as well and change up the top_k and so on too. Finally, let's print out the results from the generator. You saw that the generator was outputting something called replies, so let's simply print that out. And there we have it: we now have a response from OpenAI GPT-3.5 about how we can use Cohere with Haystack.

Next, we can start looking into how we may customize the behavior of these RAG pipelines. In the use case above, you saw a very simple question-answering prompt. You'll also remember that when we indexed our documents into the in-memory document store, we highlighted that the URLs were in the metadata. This was because we used the LinkContentFetcher. Maybe we can start customizing our prompt to behave in a slightly different way. So for this case, I'll start with a prompt that not only says that you'll be provided with context, followed by the URL that the context comes from, but also that it should be answered in a specific language. So let's start writing this prompt. We now say "your answer should be in language", so our prompt is expecting language as an input as well. Next, let's start changing how we add the content of our documents. I'll create the same for loop, once again adding the content of the documents. But we know that meta is in our document object and it has the URL, so we can also add the URL for every document that we're adding to our prompt. Finally, let's end our prompt with the question and ask for an answer. Now that we have this customized prompt, let's start creating the same exact pipeline as before.
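Continuing from the imports, document_store, and template above, here is a minimal sketch of this whole segment: the RAG pipeline with the simple prompt, a run with top_k set to 1, the customized prompt that adds the source URL and a target language, and the same pipeline rebuilt with that new template and run with the extra language input (which the next part of the lesson walks through). Component names, the template wording, the "url" metadata key, and GPT-3.5 as the OpenAI default are assumptions rather than the lesson's exact code, and an OPENAI_API_KEY environment variable is assumed:

```python
# --- RAG pipeline with the simple prompt ---
query_embedder = CohereTextEmbedder(model="embed-english-v3.0")  # same Cohere model as indexing
retriever = InMemoryEmbeddingRetriever(document_store=document_store)
prompt_builder = PromptBuilder(template=template)  # the simple template from above
generator = OpenAIGenerator()  # default OpenAI model (assumed to be GPT-3.5 here)

rag = Pipeline()
rag.add_component("query_embedder", query_embedder)
rag.add_component("retriever", retriever)
rag.add_component("prompt_builder", prompt_builder)
rag.add_component("generator", generator)

rag.connect("query_embedder.embedding", "retriever.query_embedding")
rag.connect("retriever.documents", "prompt_builder.documents")
rag.connect("prompt_builder", "generator")

# Run it: both the embedder and the prompt builder need the question
question = "How can I use Cohere with Haystack?"
result = rag.run({
    "query_embedder": {"text": question},
    "retriever": {"top_k": 1},              # retrieve only one document
    "prompt_builder": {"query": question},
})
print(result["generator"]["replies"][0])

# --- Customized prompt: add the source URL and ask for a specific language ---
new_template = """
You will be provided with context, followed by the URL that the context comes from.
Answer the question based on the context. Your answer should be in {{ language }}.

Context:
{% for document in documents %}
{{ document.content }}
URL: {{ document.meta["url"] }}
{% endfor %}

Question: {{ query }}
Answer:
"""

# --- Same pipeline as before; only the prompt builder's template changes ---
rag_custom = Pipeline()
rag_custom.add_component("query_embedder", CohereTextEmbedder(model="embed-english-v3.0"))
rag_custom.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
rag_custom.add_component("prompt_builder", PromptBuilder(template=new_template))
rag_custom.add_component("generator", OpenAIGenerator())

rag_custom.connect("query_embedder.embedding", "retriever.query_embedding")
rag_custom.connect("retriever.documents", "prompt_builder.documents")
rag_custom.connect("prompt_builder", "generator")

# Run with the extra "language" input that the new prompt expects
result = rag_custom.run({
    "query_embedder": {"text": question},
    "retriever": {"top_k": 1},
    "prompt_builder": {"query": question, "language": "French"},
})
print(result["generator"]["replies"][0])
```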
The only thing that's different now is that our prompt builder is using our new template. Now that we've created our pipeline, let's run this same pipeline, but make sure that we're providing all of the extra input that our prompt is expecting as well. I'll run this pipeline with the same exact question, but you can change this up. I'll also ask for the language to be French, so I'm telling the prompt that there is an input called language and I'm setting it to French. Let's see how good OpenAI is at answering questions in French instead of the language of the documents. We should also notice that it references the URL from which the answer is generated, because our instruction asks for that as well.

And there we are. I think this is correct French, and you'll also notice that the URL is referenced too. This is because we were able to add the URL specifically into our prompt before asking for a response. You can try playing around with the inputs, maybe even change what inputs the prompt is expecting to ask for different things. Play around with the top_k of the retriever, and also try asking different questions too.

In this lesson, you learned how to combine Haystack components to create RAG pipelines, but you also learned how you might customize the behavior of these RAG pipelines by simply modifying the prompt. In the next lesson, we're going to be looking at how you can create your very own Haystack components as well. See you there.