So now that you've added some evaluations to your router and skills, the next step is to evaluate the agent's path, or understand whether it's taking an efficient path for a given query. To do that, you're again going to start by importing a few libraries that will be useful here. Most of them you've already seen for Phoenix, and now there are some new imports from Phoenix for what's called experiments, which you'll learn about in just a second. One other thing to call out is that the agent has again been imported from your utils module, so you have the same agent that you've been working with the whole time, and you're importing that run agent method; just know that that code is sitting in the background. You'll also still need to connect to the Phoenix client, so in this case save that Phoenix client variable.

The way you evaluate your trajectory, as you learned in the previous slides, is by running a set of queries through your agent and then tracking the number of steps taken for each of those queries. To evaluate the trajectory, you have to compare multiple runs of the agent against each other. To do that, you're going to use a tool within Phoenix called run experiment, which allows you to run your agent multiple times and then compare those runs in certain ways.

Here's a good point to pause and dive a little deeper into experiments and how they work. Experiments in Phoenix are made up of a few steps: first, taking a dataset of test cases; then sending those into a particular task or job to run; and then evaluating the results of that task. A dataset of test cases holds a bunch of different queries or questions that you might run through your agent, and typically each of those examples has an input value and maybe an expected output. In this first case you won't have an expected output, just an input. Sometimes you have expected outputs, sometimes you don't, and that dictates which evaluators you can use later on. So you have a set of examples, and then you'll run those through a version of your agent. In this case you're just going to use one single version of your agent, and you'll get the outputs from that particular round. You'll also see that you make some slight modifications to your agent as part of the task, so that it tracks the number of steps taken. Once you've collected the results of running each example through your task, you can send those to a set of evaluators. Those evaluators could be the ones you set up in previous lessons, or, as in this case, a more comparative evaluator that compares different runs of your agent. Each example also gets another variable added to it: the output of running that example through the task. You'll learn a lot more about experiments in the next lesson as well; this is just your primer for them.

Step one, then, is creating a dataset of test cases. The way you do that here is with a set of questions, in this case convergence questions.
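As a rough sketch of what that setup might look like in the notebook (the exact import paths and the run_agent helper living in utils are assumptions based on this walkthrough, not code to copy verbatim):

```python
import pandas as pd

import phoenix as px
from phoenix.experiments import run_experiment, evaluate_experiment
from phoenix.experiments.evaluators import create_evaluator

# The agent itself is defined in the course's utils module (name assumed here);
# run_agent takes a list of messages and returns the agent's full message log.
from utils import run_agent

# Connect to the running Phoenix instance so datasets and experiments
# can be uploaded and then viewed in the Phoenix UI.
px_client = px.Client()
```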
And if you think back to the slides, one of the ways you test for convergence is to send a lot of different variations of the same type of query through your agent and then track the number of steps taken. Notice that each of these examples is about the number, or average quantity, of items per transaction: the average quantity sold per transaction, the mean number of items per sale, calculate the typical quantity for transactions. These are all variations on the same question, so the agent should take the same path for each of them, but sometimes there's variation. You can take this list of questions, create a dataframe out of them, and then upload that dataframe into Phoenix. Now you have a dataset that lives in Phoenix, and if you want, you can quickly visualize it here. If you look in your Phoenix window, under the Datasets tab you'll now have an entry for the dataset you just uploaded. You can click into it, see all of the examples you've uploaded, and now use this dataset to start running experiments.

For the next step, you have to define the task for your agent. You could just run these examples through your agent as-is, but you also want a count of the number of steps taken, and you might want to format some of the messages, so you can make slight tweaks to the agent when you set it up as a task. In this case you'll do a couple of things. One is to create a method that formats the message steps, which you can come back to in a second. The other is to create the task itself, run agent and track path. Starting with that task, you'll see it takes in an example; remember, each row of the dataset is an example. It takes the input value from that example, calls the run agent method on it, and then calls that format messages step. The return value from run agent and track path is the length of the agent's message list, as the path length, along with the messages object itself. The format messages step just goes through all of the messages in your agent's log and formats them in a slightly easier-to-read way, so you can see the calls that were made and compare runs more easily.

Now, with your task ready and your dataset defined, you can start an experiment. To do that, you call the run experiment method, which takes in your dataset as well as your task, the function that gets applied to each row of that dataset, along with a name, in this case convergence eval, and a description. If you run that, each row of your dataset will be run through the run agent and track path method and you'll get results back. That'll take a moment, because it's equivalent to 17 runs of your agent, so give it a second to complete. Then you should have some results that look like this: you'll see all those runs completed, and you can click through to see the results of that experiment inside Phoenix. If you click into it, you'll see that your dataset now has an entry under experiments, with this first run of your experiment that you named convergence eval.
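Here's a minimal sketch of that flow, continuing from the imports above. The question wording, the "question" column name, the dataset name, and the helper names format_message_steps and run_agent_and_track_path are all assumptions for illustration; the general shape (upload a dataframe, define a task over examples, call run_experiment) is what matters:

```python
# A handful of phrasings of the same underlying question (paraphrased, abbreviated).
convergence_questions = [
    "What is the average quantity sold per transaction?",
    "What is the mean number of items per sale?",
    "Calculate the typical quantity for transactions.",
    # ... more variations on the same question
]

convergence_df = pd.DataFrame({"question": convergence_questions})

# Upload the dataframe so it appears under the Datasets tab in Phoenix.
dataset = px_client.upload_dataset(
    dataframe=convergence_df,
    dataset_name="convergence_questions",
    input_keys=["question"],
)

def format_message_steps(messages):
    # Reduce each message to a short, readable label for the step taken,
    # assuming OpenAI-style message dicts with optional tool calls.
    steps = []
    for m in messages:
        if m.get("role") == "assistant" and m.get("tool_calls"):
            steps.append(f"assistant tool call: {m['tool_calls'][0]['function']['name']}")
        else:
            steps.append(m.get("role", "unknown"))
    return steps

def run_agent_and_track_path(example):
    # Each experiment example exposes its dataset row through example.input.
    messages = [{"role": "user", "content": example.input.get("question")}]
    result_messages = run_agent(messages)
    return {
        "path_length": len(result_messages),  # includes system and user messages
        "messages": format_message_steps(result_messages),
    }

experiment = run_experiment(
    dataset,
    run_agent_and_track_path,
    experiment_name="convergence eval",
    experiment_description="Check that variations of the same query converge to the same path",
)
```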
And you can actually click through and see the outputs for each of those runs there; in this case it looks fairly successful on each run. Now you can go back into the code and apply evaluators to each of those experiment runs, and this is where you implement your convergence evaluator. Because you've just run your experiment, that experiment is an object you can access and work with in your code. You can always view the results of an experiment as a dataframe, with the input, output, and various other columns, and that's what you'll use to run your convergence eval.

To calculate convergence, you first need to calculate the minimum number of steps taken across all the runs of your agent. To do that, add code to take your experiment as a dataframe, as you just did above, look at the output column, turn each of those entries into values you can access, and then calculate the minimum, or optimal, path by taking the minimum of the path length variable within each of those outputs. This gives you a single number for the minimum path length taken. If you run that, you should see the optimal path length printed out; you should probably see five here. That's the optimal path length being used. One important thing to note is that, the way this has been set up so far, every message counts toward the path length, so it includes things like the system message and the user message. That's fine in this case, because you're comparing a bunch of examples that all include both of those messages; you just want to be consistent. If you include the user message and the system message, again that's fine, you just have to do it on every example you test with.

Now you can create a method to use as your evaluator. In this case you can use this evaluate path length method, which takes an output and compares its path length to the minimum, or optimal, path length you calculated earlier. You're also going to use this create evaluator decorator. It's totally optional, but it lets you name the evaluator you're running, which determines how it shows up inside Phoenix. Then you can take the experiment you've already run and use the evaluate experiment method, passing in the experiment and the evaluate path length method from above. That takes all of the results from your experiment and runs them through any evaluators you add here, in this case evaluate path length, and gives you a score at the end. That'll run pretty quickly because it's just a basic code-based eval. If you jump over into Phoenix now and go back to your dataset, you'll see your experiment now has a column for convergence eval; it's named that because of the decorator you attached. In this case we've got a perfect score of one: the agent has taken the correct path on every example here. You might see a different value, though; every time we've run this we've gotten a slightly different result, so you may see something other than one.
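A sketch of that evaluation step might look like the following, assuming the experiment object exposes its results via an as_dataframe method and that create_evaluator and evaluate_experiment are used roughly as described above; the evaluator name and the 1.0-means-optimal scoring convention are illustrative choices:

```python
# Pull the experiment results back as a dataframe and collect each task output.
experiment_df = experiment.as_dataframe()
outputs = experiment_df["output"].to_dict().values()

# The optimal path is the shortest path any run took for these equivalent queries.
optimal_path_length = min(
    output.get("path_length")
    for output in outputs
    if output and output.get("path_length") is not None
)
print(f"The optimal path length is {optimal_path_length}")

# Score each run by how close its path length is to the optimal one (1.0 = optimal).
@create_evaluator(name="Convergence Eval", kind="CODE")
def evaluate_path_length(output) -> float:
    if output and output.get("path_length"):
        return optimal_path_length / float(output["path_length"])
    return 0.0

# Attach the evaluator to the already-run experiment; the resulting score shows up
# as a new column on the experiment inside Phoenix, named after the decorator above.
experiment = evaluate_experiment(experiment, evaluators=[evaluate_path_length])
```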
And that'll give you an idea of whether or not your agent is converging towards the correct path.