One of the best tools for using private data with LLMs is federated LLM fine-tuning. In this lesson, you will first learn how this approach works and how challenges like privacy and efficiency can be overcome, before you then use it yourself to build your very own federated LLM. Okay, let's build something. Before we get to the code, let's make sure all the main concepts of federated LLM fine-tuning are clear. Let's start with the basics: the most important ingredient in all of this is federated learning. The big idea, as mentioned before, is that the training process moves to the data, and that's what is being illustrated here by this animation. In this lesson, your understanding will progress from what you can see in this animation, which hides a lot of detail, to a level where you can appreciate all the core techniques that work together to make this learning actually function. But before we get there, let's stop and compare with the approach that you used in the prior lesson. Recall that you used the centralized fine-tuning approach, which required the data to be copied to a server before fine-tuning could begin. You see this happening in this animation, where the data from each of the individual hospitals has to be moved to the server before fine-tuning can occur. This is the fundamental difference between the two approaches we're looking at in this course. One final mention: this course is focused on LLMs, private data, and how these relate to federated fine-tuning, but do look at the other introductory course on federated learning for the details about federated learning more broadly. In the code that you'll be looking at, we're going to use a basic recipe for performing federated LLM fine-tuning. You can think of it like a jigsaw puzzle with three main pieces: federated learning, parameter-efficient fine-tuning, and differential privacy, used together to form a recipe that allows us to perform this particular variety of fine-tuning and provides all the benefits we've discussed up to this point. I could replace the federated learning piece that you're going to see with, for example, vertical federated learning. I could replace the fine-tuning that you see with something related to prompt tuning. We could look at different variations of differential privacy. So there are many different ways to perform this, and we're looking at a very basic formulation. The other thing to remember is that we can put additional pieces into this puzzle, depending on the needs of your application and data. You could add to these three puzzle pieces a piece that supports hierarchical federated learning, performs some type of model compression, or includes homomorphic encryption. You would use these variations and additions to meet the needs of privacy, accuracy, or the particular architecture that you're using. But right now we're going to start by explaining the basic form of this LLM fine-tuning technique and move through it piece by piece. To build some depth of understanding of this process, let's start with the most basic form of federated LLM fine-tuning, one that initially doesn't even use differential privacy or PEFT, parameter-efficient fine-tuning. They'll be introduced bit by bit.
In this way you should be able to see why they are needed. But let's start here. You can see that this illustration uses the scenario we described in the prior lesson, where we use private data from a number of different hospitals; you can see them illustrated at the bottom of the diagram. The code you'll use shortly, in that notebook, uses the Mistral 7-billion-parameter base model, and that's why I've now added it to this picture. You can see it adjacent to the global model, sitting right above the server. This is the LLM that you will improve by leveraging the private data on the clients. The first step towards performing fine-tuning under this regime is client selection. You can see this has also occurred in this diagram: two clients have been selected, indicated by the two blue arrows. Depending on the situation, you may have designs with full client participation, where all the clients are active within a training round at once, but being able to select a subset of the clients allows us to manage issues like communication overhead. In this example, these two clients will perform standard training on the model they have available locally, which happens to be this Mistral model. What happens is that training takes place on each client, and the model weights of this LLM are updated based solely on the data that the client has. Moving forward, the next step is that the individually updated model weights from each client are sent to the server, where an aggregation phase takes place: the updates coming from the clients are aggregated. For example, we would use something like FedAvg, a scheme in which all of these model updates are directly averaged together. This aggregated update is then applied to the existing global model, the Mistral model, in order to fine-tune it to account for the data that was present at the local clients. Each averaging step is called a round, and multiple of these federated learning rounds need to be completed until the fine-tuning is finished. So now we've described a complete method for doing federated fine-tuning of an LLM. What's next? Well, you might have noticed that there is potentially some communication overhead. The reason is that LLMs are much larger than the models typically used in federated learning, and when we talk about transmitting model updates, we're often talking about transmitting the whole model between client and server. This might be fine when you have relatively small architectures, but in the realm of LLMs it can quickly get out of hand. As a frame of reference, the Mistral 7-billion-parameter model itself is likely to be on the order of 27GB, depending on what precision is used for the parameters. Now, if we're actually transmitting 27GB from multiple clients to the server each time a round occurs, and we're doing this over multiple rounds, then the amount of data being sent, and even the time it takes to transmit that data, can become unacceptably slow. This isn't something you would typically run into so intensely in non-LLM settings. So what do we do? This is where we introduce PEFT, or parameter-efficient fine-tuning.
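Before digging into PEFT, here is a minimal sketch of the FedAvg aggregation step described a moment ago: the server averages the clients' updated weights, typically weighting each client by how many examples it trained on. This is an illustrative reimplementation in plain NumPy, not Flower's internal code, and the function and variable names are assumptions.

```python
import numpy as np

def fed_average(client_results):
    """client_results: list of (weights, num_examples) pairs returned by the selected clients."""
    total_examples = sum(num for _, num in client_results)
    num_layers = len(client_results[0][0])
    aggregated = []
    for layer_idx in range(num_layers):
        # Each client's contribution is weighted by how much data it trained on.
        layer = sum(
            weights[layer_idx] * (num / total_examples)
            for weights, num in client_results
        )
        aggregated.append(layer)
    return aggregated

# Toy example: two selected clients, each holding a two-"layer" model.
client_a = ([np.ones((2, 2)), np.zeros(3)], 1700)
client_b = ([np.full((2, 2), 3.0), np.ones(3)], 1700)
global_weights = fed_average([client_a, client_b])
print(global_weights[0])  # every entry is 2.0, the average of 1.0 and 3.0
```

In the real system the server then applies this averaged update to the global model before starting the next round.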
PEFT is going to lower the number of parameters that have to be sent from client to server each round. Let's look at how this is done. On this slide, I've shown a fragment of the typical overall topology on the bottom left so we can keep our frame of reference: you can see clients exchanging model updates with the server. On the right, I'm illustrating the concepts that enable PEFT to vastly lower the amount of data that needs to be transmitted between client and server. The key idea in PEFT, and we happen to be using LoRA as our form of PEFT in the code you'll be using shortly, is that we're able to freeze most of the weights of the model. You can see this illustrated in the diagram at the client level, where much of the model has changed from yellow to blue. That's to account for the fact that, through the PEFT technique, those weights are frozen and not actually updated during the local model update. Now, all we have to do is avoid communicating these frozen weights and only communicate the weights that have been allowed to change. While this may seem like a fairly simple idea conceptually, the key factor is that it has been shown to work even when you dial up the percentage of frozen parameters to extremely high levels. As an example, I've denoted here that 98% of the weights are frozen in this process, with only 2% of the weights able to change. Interestingly, even at that level, we can still achieve extremely good fine-tuning, and the underlying model is able to adapt and extract much of the information we need, even though most of the weights are not changing. Through this process, the model can still be updated, but the amount of data that needs to be exchanged is vastly reduced. When we go through the code, you'll see actual numbers for what this means and how much it improves performance. So we've just addressed two big challenges. First, the bandwidth issue that becomes so acute in the case of LLMs has been managed nicely. Second, there are second-order effects: the amount of computation required is also reduced, because we're not modifying so many parameters. What other challenges might we have? Well, one of the focal areas of this whole course has been privacy, and we know that LLMs are able, under certain circumstances, to regurgitate training data. So even though one of the core building blocks of federated learning is that data no longer needs to be transmitted, that by itself is not enough. We're going to add a final piece to this puzzle: differential privacy. You can see it illustrated alongside PEFT in our diagram. Let's dive into what differential privacy is all about in this context. Again, for a more detailed explanation, please see our companion course, Introduction to Federated Learning, which looks at the variations and other companion techniques that achieve some of these outcomes. Here, we're just going to sketch out how differential privacy is being used.
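First, though, here is a minimal sketch of the LoRA idea described above, using the Hugging Face peft library. The base model name and the LoRA hyperparameters are illustrative assumptions, not necessarily the values used in the course notebook.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# An illustrative ~70M-parameter base model (an assumption, not the course's exact checkpoint).
base_model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank adapter matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],   # attention projection in Pythia-style models
    task_type="CAUSAL_LM",
)

# Wrapping the model freezes the original weights and injects small trainable
# adapters; only these adapter weights need to be trained and communicated.
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
# Prints something like: trainable params are only a small percentage of all params.
```

Only those trainable adapter weights are what each client sends back to the server, which is where the bandwidth savings come from.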
Again, on the left-hand side I've got the frame-of-reference diagram that reminds you how these clients talk to the server, and on the right we're drilling down into the specific changes introduced by differential privacy in this setting. Essentially, differential privacy introduces a set of techniques you can use to mask some of the specific details of individual training examples, in order to lower the opportunity for them to be leaked. What you see on the right-hand side of the slide is an illustration in which, at the client level, we deliberately add calibrated noise during training. The red line conceptualizes the notion that the signal obtained from learning on individual examples, which ends up reducing model error, is deliberately perturbed with noise, where that noise is selected and applied so that the signal is carefully obscured in an important way. The model updates produced while adding this noise are then shared with the server, and these noisy model updates are how we enhance the level of privacy protection for the training data during this process. The resolution of protection that makes sense in this particular application is the level of individual training examples, but depending on the application you might have different subjects that you're trying to protect. Instead of worrying about individual training examples, you might work at the granularity of authors who generate a large number of samples, and try to mask them as a group. From a slightly more technical perspective, we can look at what differential privacy is trying to achieve for us. This slide shows two distinct models, LLM A and LLM B. The only difference between them is that one was trained with a particular training instance present and one was trained without it. What differential privacy seeks to achieve, through its addition of noise, is that these two models should produce results that are indistinguishable from each other, even though in one the training example is there and in the other it is not. In this way, it provides a form of plausible deniability and makes it harder to access the underlying training data, which helps protect against issues such as the model regurgitating aspects of the training data it was exposed to. So differential privacy helps us enhance the level of protection for training data, and it works alongside more fundamental properties that make the system more private and secure, such as the fact that the data no longer has to be copied. Now, an important thing to add to this discussion is a bit of intuition about how differential privacy works: how it adds noise and yet still allows us to learn something. I want to provide you with this illustration to give you that intuition. This image was taken at a really fun event called the Flower AI Summit, which has turned out to be the largest federated learning conference in the world; you can see folks here enjoying themselves at the event. The top image gives us a conceptual understanding of how differential privacy works. The top image is untouched.
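We'll return to this image in a moment, but as a quick technical aside, here is a purely illustrative NumPy sketch of the mechanism just described and of the fixed clipping you'll meet later in the code: bound the norm of a client's model update, then add Gaussian noise calibrated to that bound. The constants and the simplified noise calibration are assumptions for illustration, not the notebook's actual settings.

```python
import numpy as np

def clip_update(update, clipping_norm):
    """Scale a model update so its overall L2 norm is at most clipping_norm."""
    flat = np.concatenate([layer.ravel() for layer in update])
    scale = min(1.0, clipping_norm / (np.linalg.norm(flat) + 1e-12))
    return [layer * scale for layer in update]

def add_gaussian_noise(update, clipping_norm, noise_multiplier, rng):
    """Add noise whose standard deviation is calibrated to the clipping bound."""
    stddev = noise_multiplier * clipping_norm
    return [layer + rng.normal(0.0, stddev, size=layer.shape) for layer in update]

rng = np.random.default_rng(seed=0)
raw_update = [np.array([3.0, 4.0]), np.array([[1.0, -2.0]])]   # toy "model update"

clipped = clip_update(raw_update, clipping_norm=1.0)
noisy = add_gaussian_noise(clipped, clipping_norm=1.0, noise_multiplier=0.5, rng=rng)
print(noisy)   # individual values are obscured, but aggregate trends survive averaging
```

Clipping bounds how much any single example (or client) can move the model, and the noise is scaled to that bound, which is what gives the indistinguishability property sketched with LLM A and LLM B.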
Back to the image: in this data we can see potentially privacy-sensitive information. We see people's faces, we see text in the background, we can see a lot of information that isn't fundamentally needed to perform the learning we want. Below it we see the same image again, but we've strategically blurred, that is, added noise to, certain regions that are very sensitive. The regions around people's faces are masked, and the words in the background are masked. What you observe is that it's still possible to understand and learn macro patterns from the image with this selective noise applied. For example, you can see how many people were in the room, you can see the type of room, you can see the type of event. What we've done through this added noise, though, is make it more difficult to recognize certain characteristics, such as a face. Hopefully this gives you a strong intuition for how this mechanism operates: protecting individual training examples while still allowing us to learn the macro patterns that are so important for the model to succeed. So now we've described all the individual pieces that work together to enable federated LLM fine-tuning. It's time to transition to the code. As a brief reminder, we'll be entering back into our favorite medical scenario, where we want to learn from a number of different private data sources in order to improve a generic LLM backbone, so that it can answer highly specific medical questions with much higher fidelity. Let's dive into the code. In this third lesson, your main aim will be to learn how to perform federated fine-tuning on another LLM. As in the prior lesson's notebook, much of the time we'll be using a smaller LLM of 70 million parameters, but we'll also provide you with everything you need to fine-tune a larger 7-billion-parameter model. Recall also that in the prior notebook you played the role of a data scientist in a hospital, and that data scientist could only make use of data that was available in that hospital. Now, through the use of federated learning, we're going to allow this data scientist to capture and leverage data from a number of different hospitals around them, and through this, break through that 10%-of-the-total-data barrier. We're going to see what happens as you expose the model to larger and larger amounts of data. In this particular notebook, you will use the Flower simulation engine to simulate a real federated learning system of 20 clients, each representing a different hospital. Let's begin by importing a few of the packages and utility functions that we're going to need. This notebook also makes use of the Hugging Face transformers library and PEFT. Let me highlight some of the imports that are of particular relevance for this notebook. First, you can see that we are importing Flower, which will be used, as I mentioned, to run the Flower simulation engine. We're also going to make use of Flower Datasets, a library that allows us to partition the Med Alpaca dataset into 20 disjoint datasets, corresponding to the 20 different hospitals in the code you'll be running. This example also makes use of differential privacy.
So we need to import a mod that enables the client, in this case the Flower client, to use differential privacy, as well as a wrapper for a Flower strategy that applies differential privacy after aggregation. You can see here that we're importing the Flower strategy and the specific differential privacy mod that will be applied to the Flower client. Next, let's load the configuration. As you can see, its content is very similar to the one we used in lesson two, which focused on the centralized fine-tuning approach. You can see this config uses the Med Alpaca dataset and, as we've mentioned a few times, when we run this in the notebook we use the smaller model version, which happens to be 70 million parameters; you can see this here. Then you can see a number of the other, more generic hyperparameters, with very similar values to those we would specify under a centralized training setting. If I scroll to the lower part of this configuration, though, we see a few new entries under the Flower tag. This configuration specifies the number of clients, and it also specifies the number of federated learning rounds to be performed. Note that here we specify two, a rather low number, but we've done this so that it completes quickly within the notebook. Furthermore, you can see that we involve 20% of the clients in every round: during the client selection phase of a learning round, we will only interact with 20% of the clients. Additionally, under the client resources tag of this configuration, you can adjust the degree of parallelism used within the simulation. These values represent the resources each client is allocated; intuitively, if you lower them, more clients will be able to run on the same hardware. For the 70-million-parameter LLM setting, we're using two CPU cores per client, which is sufficient for this demonstration. However, if you happen to run this notebook later on a GPU, you may want to increase it. Finally, the DP section of this configuration specifies the clipping norm and noise multiplier parameters. These define the degree of protection afforded to the training data during the fine-tuning process, which lowers the opportunity for data leakage to occur. Let's now partition the Med Alpaca dataset into 20 disjoint sets. Each of these will become the local dataset that, in our scenario, belongs to one of the 20 hospitals in the federation we're about to simulate. Recall that a client corresponds to a hospital. To do this, you're going to use Flower Datasets, a toolkit specifically designed for federated datasets that assists with downloading, processing, and, in this case, partitioning datasets correctly. Interestingly, dataset partitioning can get fairly complicated, especially when you start to introduce different forms of data heterogeneity, which is common in more realistic federated learning settings. With Flower Datasets, which comes with several built-in partitioning schemes, you can forget all about this and put all your attention into the design of your architecture and your algorithm, and into tracking the critical metrics that indicate performance.
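Before we get to the partitioning code, here is a hedged sketch of the kinds of imports and configuration values just described, assuming a recent Flower release. The module paths are real Flower and Hugging Face APIs to the best of my knowledge, but the config keys, model name, and DP values below are illustrative assumptions that mirror the narration (20 clients, 2 rounds, 20% participation, 2 CPU cores per client).

```python
import flwr as fl                                              # Flower core
from flwr.client.mod import fixedclipping_mod                  # client-side DP mod
from flwr.server.strategy import (
    FedAvg,
    DifferentialPrivacyClientSideFixedClipping,                # wraps a strategy, adds noise after aggregation
)
from flwr_datasets import FederatedDataset                     # Flower Datasets
from flwr_datasets.partitioner import IidPartitioner
from transformers import AutoTokenizer, AutoModelForCausalLM   # Hugging Face transformers
from peft import LoraConfig, get_peft_model                    # PEFT / LoRA

# Illustrative stand-in for the loaded configuration (keys and values are assumptions).
cfg = {
    "model": {"name": "EleutherAI/pythia-70m"},      # the ~70M-parameter setting
    "dataset": {"name": "medalpaca"},                # Med Alpaca medical Q&A data
    "flower": {
        "num_clients": 20,                           # one client per hospital
        "num_rounds": 2,                             # kept low so the notebook finishes quickly
        "fraction_fit": 0.2,                         # select 20% of clients each round
        "client_resources": {"num_cpus": 2, "num_gpus": 0.0},
    },
    "dp": {
        "clipping_norm": 1.0,                        # bound on each client's update (assumed value)
        "noise_multiplier": 0.3,                     # scales the calibrated noise (assumed value)
    },
}
```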
Here, we're going to take the simplest approach and split the data using the IID partitioner, meaning that all partitions will be constructed by uniformly sampling from the whole dataset. Let's run the code and inspect the metadata of the first partition. Just as we saw in the previous notebook, the overall dataset contains two columns, and so this partition, the zeroth partition of the dataset, also includes the two columns: instruction and response. Let's now do a simple visualization of the data partitions, which will display the amount of training data each one has. As you can see, all 20 partitions have roughly 1,700 training examples. The y-axis is the number of examples present in each of these partitioned data silos, and you can see that there are 20 of them, while the x-axis shows the identifier of each partition. Now the dataset is ready; let's proceed with the rest of the components that you will need to run your federation. Just like in the previous notebook, let's load the tokenizer and the other components needed to preprocess the input to the LLM. Clients in Flower are defined using a ClientApp. The ClientApp is constructed by specifying a client function that returns a client object which knows how to do two things: instantiate the model and run the training loop. Those are the things each client will be required to do during a federated round. The client essentially runs those rounds locally, just like the fine-tuning function you wrote in the first notebook. Finally, because this client is going to make use of differential privacy, you'll see in the client specification that a differential privacy mod, a DP mod, is passed. If you need to know more about this particular mod, you may want to look at the Flower documentation available at flower.ai. You can see at the bottom that the mod being specified is the fixed clipping mod, which provides us with a specific type of differential privacy. With the ClientApp defined, let's turn our attention to the server side. At the core of a server there is a strategy. A Flower strategy is in charge of sampling clients, communicating instructions to the clients, receiving model updates from the clients, running the aggregation of models, and a few other smaller bookkeeping items. In this notebook, you will make use of FedAvg, which, as previously mentioned, is a relatively simple but surprisingly effective method of aggregating model updates at the server. In this lesson, you will enhance the federated learning with differential privacy. To do that, you will use a wrapping strategy that maintains the behavior of the FedAvg strategy you just created but adds the extra functionality required for differential privacy. This extra functionality means, in this case, adding calibrated noise to the aggregated model. Now that the strategy is ready and includes differential privacy, you can instantiate a ServerApp, also indicating the number of rounds you wish to run the simulation for. By default, the simulation runs for two rounds, but you are welcome to increase this number. Let's remind ourselves where we are: we have the dataset, the ClientApp, and the ServerApp all ready. You are now ready to launch the simulation. To do this, you just need to run the simulation function. In these notebooks, running the simulation should take a couple of minutes.
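As a reference before we run anything, here is a hedged, end-to-end sketch that ties together the pieces just described: IID partitioning with Flower Datasets, a ClientApp carrying the fixed-clipping DP mod, a FedAvg strategy wrapped with the differential-privacy strategy, a ServerApp, and the simulation launcher. It assumes a recent Flower API and an illustrative dataset id, and the local LoRA fine-tuning inside fit() is deliberately omitted, so treat it as a structural sketch rather than the course notebook's exact code.

```python
import numpy as np
from flwr.client import ClientApp, NumPyClient
from flwr.client.mod import fixedclipping_mod
from flwr.common import ndarrays_to_parameters
from flwr.server import ServerApp, ServerAppComponents, ServerConfig
from flwr.server.strategy import FedAvg, DifferentialPrivacyClientSideFixedClipping
from flwr.simulation import run_simulation
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner

NUM_CLIENTS = 20

# Split the dataset into 20 IID partitions, one per simulated hospital.
# The dataset id below is an assumption; substitute the Med Alpaca dataset used in the notebook.
fds = FederatedDataset(
    dataset="medalpaca/medical_meadow_medical_flashcards",
    partitioners={"train": IidPartitioner(num_partitions=NUM_CLIENTS)},
)

class HospitalClient(NumPyClient):
    def __init__(self, partition):
        self.partition = partition

    def fit(self, parameters, config):
        # In the real notebook: load the LoRA adapter weights from `parameters`,
        # fine-tune locally on self.partition, and return the updated adapter weights.
        updated_parameters = parameters        # local training omitted in this sketch
        return updated_parameters, len(self.partition), {}

def client_fn(context):
    partition_id = int(context.node_config["partition-id"])
    partition = fds.load_partition(partition_id, "train")
    return HospitalClient(partition).to_client()

# The DP mod clips each client's update before it leaves the client.
client_app = ClientApp(client_fn=client_fn, mods=[fixedclipping_mod])

def server_fn(context):
    strategy = FedAvg(
        fraction_fit=0.2,        # 20% of clients selected per round (4 of 20)
        fraction_evaluate=0.0,   # no federated evaluation in this sketch
        initial_parameters=ndarrays_to_parameters([np.zeros(4, dtype=np.float32)]),  # placeholder
    )
    # Wrap FedAvg so calibrated noise is added to the aggregated model each round.
    strategy = DifferentialPrivacyClientSideFixedClipping(
        strategy, noise_multiplier=0.3, clipping_norm=1.0, num_sampled_clients=4
    )
    return ServerAppComponents(strategy=strategy, config=ServerConfig(num_rounds=2))

server_app = ServerApp(server_fn=server_fn)

run_simulation(
    server_app=server_app,
    client_app=client_app,
    num_supernodes=NUM_CLIENTS,
    backend_config={"client_resources": {"num_cpus": 2, "num_gpus": 0.0}},
)
```

Wired up this way, each round Flower samples four clients, runs their local fit, clips and aggregates the updates, and adds the calibrated noise before updating the global adapter weights.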
Don't forget, you're actually going to be fine-tuning a small but still 70-million-parameter LLM for a couple of rounds, and each round involves four clients. So if you think about it, a couple of minutes to run the simulation is not too bad. Now we're ready to run the simulation; let's see how it goes. As the simulation runs, you'll note that it generates informational logs divided into four sections. We'll need to wait a few minutes until it completes, and then we can look at this log data and see what it tells us about how well the simulation went and what happened. Here we see the finished simulation, and we can look at all of the log information. You can see it's broken down into four sections. The first section logs information about the initialization of the simulation. Then we see two sections corresponding to the two rounds of training, the two federated learning rounds. Finally, at the bottom, you see a summary section with a summary of the simulation process. I think one of the most interesting lines here is the last one, as it shows the average training loss at each client. Note that in this example there is no centralized evaluation happening, which is why you see the loss being zero for the centralized evaluation part. Now that you have done federated fine-tuning of a 70-million-parameter LLM, let's see how you can run the fine-tuning of a model that is 100x larger, i.e. the Mistral 7B LLM. Now it's time to see the improvement, to judge what the outcome of performing this fine-tuning was. Recall that even though we did the fine-tuning on the smaller 70-million-parameter model, we're now going to transition to the larger 7-billion-parameter model to see the improvement, hopefully, from exposing it to all that interesting new private data. What we've done offline is perform the complete fine-tuning of the larger 7-billion-parameter model, using the exact same code that you just looked at, and we've saved a model checkpoint. We're now going to load that checkpoint into the system and use the Fireworks AI API to perform inference over this model, so that we can get a sense of just how much better it is at responding to medical-domain questions now that we've done the fine-tuning in a federated way on this private medical data. So, the first thing we've done is load this prior model checkpoint. And now, just as in the prior lesson, I mentioned that I'm not a medical doctor, so I'm not the right person to come up with a question to challenge this model. So we're going to go back to the Med Alpaca data, select a training example, and use that as a question to ask the model, and see just how much better it can do. The example we're using here is number six. You should feel free to change this number and experiment for yourself with how different questions result in different types of responses from our fine-tuned model. With this particular question number six, we ask the fine-tuned model: "What are the possible causes of low glucose and high C-peptide levels?" The response from our model explains that these findings can be caused by, among other factors, a condition called insulinoma, and it mentions that this is a rare tumor. And if we look at the expected output, the output that comes from the training set,
we can see that this gold-standard response also points to the same insulinoma issue. So this is a fairly good response when comparing the answer of this model, fine-tuned on the private data, with the ground truth. The model has done pretty well here, I think. The important point is that, unlike in the prior lesson, where we also achieved a boost in the ability of the model to answer medical questions, we did that, remember, by doing centralized fine-tuning, which meant actually copying the data to the server. Here, we achieved a similar outcome, a nice increase in the qualitative behavior of the model, but we achieved it with federated learning, we achieved it with differential privacy, and we also used PEFT to keep it nice and efficient. If you recall, in the prior lesson we felt it was insufficient to just show a couple of examples of a question, look at the response, see an improvement, and claim that there is a systematic overall improvement in the model's ability to answer medical questions. Really, the only way to establish that is to do a wide-ranging, systematic analysis over a large number of different questions. That was how we became comfortable saying that the centrally fine-tuned model in the prior lesson was indeed better, and so we obviously want to do the same type of analysis in the federated learning setting here. And that's what we've done. It takes a long time to perform all that analysis, since it involves inference over a lot of different questions, so we've done this process offline: we've performed a systematic analysis and saved those results to a structure called results. What we're going to do now is visualize the outcome of this analysis for you. If we turn our attention to the figure presenting the results we computed offline, we can start to look at what it tells us. On the y-axis we report validation accuracy, just as we did in the prior lesson. It includes the two data points that we saw in the prior lesson as the first two bars: the pre-trained 7-billion-parameter Mistral model, when asked a series of medical questions, performs quite poorly, just over 30%; then the case where we performed centralized fine-tuning restricted to 10% of the available data, under the scenario that you're a data scientist working at a single hospital and that's all the data available to you. Finally, in green, we can see the result we achieved in this lesson. What we see is that by exposing the same Mistral model, under the federated regime, to a lot more private data spread across these 20 different hospitals, the model improves in accuracy even further and starts to reach levels where it can be truly useful. You can imagine that if we continued to add more and more hospitals to our federated network, the accuracy would likely continue to improve. Finally, let's look at one more variation of this analysis; again, this analysis was done offline because it's quite time-consuming. This time we've repeated the exact same evaluation that we've been describing up
to this point, but we felt it was also important to show what would happen if you provided the same amount of data to the centralized fine-tuning method from the prior lesson as we used in this lesson with the federated LLM fine-tuning approach. The interesting question is whether the model accuracy of the federated and centralized varieties will be similar, or whether there will be some discrepancy. So let's take a look. In this new result, everything remains the same except for the inclusion of an additional bar. In this case, the bar happens to be green, and it is the result achieved when the centralized method from the prior lesson is exposed to the same amount of private data as we exposed our federated approach to in this lesson. What you can observe is that the accuracies are very similar. To a good approximation, you can say that for this particular setting and this particular dataset, the two approaches have achieved the same level of accuracy. Remember, however, that in many real-world cases the type of data we're benefiting from is not currently being used in LLMs because of a wide range of barriers that cannot be met with a centralized fine-tuning approach, for example regulation, and issues with not being able to copy sensitive medical data outside of an institution or hospital. Before concluding this notebook, let's also examine the communication costs associated with the smaller model, the 70-million-parameter LLM we were actually able to work with inside this notebook. We have provided a function for you to use called compute communication costs. What it does is take the configuration of the federated system in Flower and provide an estimate of a number of important communication and system factors. So let's run this and see what we get. As you can see, thanks to the solution we include within our federated LLM fine-tuning approach, the communication costs have been reduced by over 300 times. If you had to communicate the whole model, the fine-tuning that you just did in this notebook alone would have transmitted over four gigabytes of data; but because you used PEFT under the hood, the clients and server only needed to communicate a mere 12MB of data. Very impressive. Below, we can also see the output for the full-scale, 7-billion-parameter configuration: what the numbers would be if we were actually performing the federated simulation on a model of this size. So let's do that. Let's see what that exact same function produces when we run it over the configuration for a 7-billion-parameter model. This is the exact same output we just saw in the notebook, but this time it accounts for the scenario of running the Mistral 7-billion-parameter model. What you can see is that the gains in communication savings are even larger: there is a thousandfold reduction from using PEFT within our particular recipe, and the amount of data being exchanged remains reasonable. Two observations jump straight out. One is that the communication savings from using PEFT at the scale of the 7-billion-parameter model are over a thousandfold, so you really are saving a significant amount of communication; a rough back-of-the-envelope check of these numbers is sketched below.
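This check uses assumed sizes (a full-precision 7-billion-parameter model versus a roughly 26MB LoRA adapter, sent over a 20 Mbps uplink); the figures are illustrative, not the notebook's exact output.

```python
full_model_params = 7_000_000_000          # Mistral-7B scale
bytes_per_param = 4                        # fp32 precision
adapter_bytes = 26e6                       # ~26 MB of trainable LoRA weights per update

full_model_bytes = full_model_params * bytes_per_param                     # ~28 GB
print(f"PEFT size reduction: ~{full_model_bytes / adapter_bytes:.0f}x")    # on the order of 1000x

uplink_bits_per_second = 20e6              # 20 Mbps uplink
print(f"Adapter upload: ~{adapter_bytes * 8 / uplink_bits_per_second:.0f} s")               # ~10 s
print(f"Full-model upload: ~{full_model_bytes * 8 / uplink_bits_per_second / 3600:.1f} h")  # hours per client per round
```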
Why are the savings so large? Because we're able to perform this fine-tuning by modifying only a really tiny fraction of the parameters of the whole model. What this means is that, for a model update, because we're exchanging only a small number of parameters, with most of the parameters frozen and only a few actually available to be modified, a single update of even this enormous 7-billion-parameter model only requires us to transmit 26MB. You can see that we're assuming a 20 Mbps link, which gives an upload that takes just around ten seconds. This type of overhead is affordable under federated learning, and the time it takes to communicate these updates is not an overwhelming factor. You have now reached the end of lesson three. Let's review some of the key things that you learned during this lesson. First, the recipe that we used in this lesson includes three primary components: federated learning, parameter-efficient fine-tuning, and differential privacy. As you learned, it is through the cooperation of each of these pieces of the puzzle that we're able to provide an approach to federated learning that works on LLMs and can do things such as allowing an LLM fine-tuning process to leverage private data, while still respecting the privacy of that data, doing so without exorbitant overhead, and, by using federated learning, not requiring the data to be transmitted. As we applied these concepts in the code, you saw that the resulting LLM, when fine-tuned with this approach, did indeed improve dramatically through exposure to private data that, in the real world, would otherwise not be available to it without this method. We saw that the LLM was able to respond to specific, highly technical medical questions much more appropriately than when it was not fine-tuned. Furthermore, you saw that, in this setting, with this application, dataset, and architecture, the federated approach to fine-tuning was able to match the performance of the centralized approach that we examined in the prior lesson.