We'll now take a closer look at how skills are structured and at best practices for creating them. Then we'll apply what you learn to two examples: one that creates practice questions from lecture notes, and another that analyzes the characteristics of time series data. Let's go.

In this lesson, we're going to focus on the structure of a skill and some of the best practices associated with it. Then we'll take a look at two skills that we've made and see how they fare when run through the skill-creator against those best practices.

To review, every skill we make has a required SKILL.md file with YAML frontmatter that requires a name and a description. The body of the SKILL.md holds the content of the skill itself, along with references to any scripts, additional text files, or assets, which are loaded only when necessary.

As we look at best practices for names and descriptions, you can imagine this is mission critical: the name and description are how Claude not only understands what your skill does but also detects when to use it. Both the name and the description have a maximum character length. As we mentioned briefly, the name can only contain lowercase letters, numbers, and hyphens, and in general you should stick with the verb-plus-ing form when naming your skill. The description should cover not only what the skill does but also when to use it, and if specific keywords lead agents to trigger the skill, lean into those.

In addition to the required fields, the agent skills specification allows optional fields, such as a license, compatibility information, and arbitrary key-value pairs in your metadata. What's important to note is that while there is a standard for agent skills, you might come across skills, some built by Anthropic and some by others, that don't follow the specification to a T. Skills are in active development, as is the specification itself, as it spreads across many different model providers and agent tooling ecosystems.

Moving past the YAML frontmatter into the body of the skill, there are no restrictions on format. However, when you're building predictable workflows, you want step-by-step instructions. As we saw in other skills, especially the skill-creator skill, it's important to specify edge cases and step-by-step instructions, and if there's a reason a step can be skipped, be very clear about why. In general, keeping the SKILL.md under 500 lines is best practice, because we can always reference external files, assets, and scripts when necessary. Being clear and concise is valuable, and using forward slashes in paths is mission critical, even on Windows, so the skill works across many different environments.

When creating skills, you also want to think about how much freedom to give the skill. Should we allow general approaches and directions, or should we focus on a specific sequence? For following best practices we might want a low degree of freedom, but for more creative outputs (multiple colors, styles, and fonts) we can allow a high degree of freedom.
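To make those naming and description rules concrete, here is a minimal sketch of a frontmatter check in Python. It is illustrative rather than an official validator, and the exact character limits (64 for the name, 1024 for the description) are assumptions based on the agent skills specification at the time of writing.

```python
# A minimal sketch of a frontmatter check, not an official validator. The
# 64/1024 character limits are assumptions; the naming rule matches the
# conventions described above (lowercase letters, numbers, hyphens).
import re
import yaml  # pip install pyyaml

def validate_frontmatter(skill_md_text: str) -> list[str]:
    problems = []
    match = re.match(r"^---\n(.*?)\n---", skill_md_text, re.DOTALL)
    if not match:
        return ["missing YAML frontmatter block"]
    meta = yaml.safe_load(match.group(1)) or {}

    name = str(meta.get("name") or "")
    description = str(meta.get("description") or "")

    if not re.fullmatch(r"[a-z0-9]+(?:-[a-z0-9]+)*", name):
        problems.append("name must use only lowercase letters, numbers, and hyphens")
    if len(name) > 64:  # assumed limit
        problems.append("name exceeds the assumed 64-character limit")
    if not description:
        problems.append("description is required: say what the skill does and when to use it")
    elif len(description) > 1024:  # assumed limit
        problems.append("description exceeds the assumed 1024-character limit")
    return problems

# Example: print(validate_frontmatter(open("SKILL.md").read()))
```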
As we start to think about more complex workflows with multiple skills, breaking things down into sequential steps is more valuable than having one very large skill that tries to do it all. These systems can handle 100+ skills, so it's important that skills are named appropriately, aren't confusing, and can be followed in a predictable pattern.

The specification also has room for optional directories. As we've seen with quite a few skills, there are subfolders for scripts, references, and assets (we'll sketch this layout below). Scripts include any code that needs to be read and executed; make sure they have error handling and clear documentation. References contain additional documentation or reference files, and it's often valuable to instruct the skill to read the entire reference file if it happens to be quite long. Finally, assets can include templates for output, images, logos, data files, schemas, and so on. Note that these directories, scripts, references, and assets, follow the agent skills standard, but you might come across skills that don't follow that standard yet. Both the standard and skills themselves are rapidly evolving, so going forward we'd expect new skills to follow it, but some may still use different folder names and conventions.

Now that we have a good sense of best practices, optional directories, and how to write production-grade skills, let's take a look at two example skills we've created, step through them, run them through the skill-creator to analyze for best practices, and talk about evaluating these skills to make sure we're ready for production.

So I'm in VS Code now, and here we have two custom skills that we're going to dive into. The first one is a skill for generating practice questions. If we take a look at it, we can see that the description is for generating educational practice questions from lecture notes to test understanding. You can imagine you're a teacher or instructor: you want a particular format for input and output, and you want comprehensive questions that test understanding.

Let's step through this skill. To start, we have the supported input formats: we specify which libraries to use and what text to extract. Then comes our question structure. Again, we want to be very specific, so we specify the exact order these questions should be generated in, starting with True/False and working all the way up to realistic applications, with sub-guidelines for each question type below. We can see that this skill stays under 500 lines, but if it needed to grow larger, we could always move content into referenced files. Looking at some of the examples, true/false, even coding questions, and so on, we're being very explicit about the scope, the structure, and the required output for these questions. Diving deeper into the output format, we specify that it depends on the user's request, and instead of giving direct examples of every single kind of output, we reference templates inside our assets folder. Whether we're dealing with LaTeX or Markdown, we specify exactly how we want the output to look.
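Circling back to the optional directories mentioned above, here is a small scaffolding sketch of that conventional layout. The scaffold_skill helper and the placeholder SKILL.md contents are hypothetical; only the scripts/, references/, and assets/ folder names come from the standard described in this lesson.

```python
# A sketch of the conventional skill layout, expressed as a small (hypothetical)
# scaffolding helper. Folder names follow the agent skills standard; the
# placeholder SKILL.md text is illustrative only.
from pathlib import Path

def scaffold_skill(root: str, name: str) -> Path:
    skill_dir = Path(root) / name
    # Create the standard optional subdirectories.
    for sub in ("scripts", "references", "assets"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    # Write a minimal SKILL.md if one doesn't exist yet.
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.exists():
        skill_md.write_text(
            "---\n"
            f"name: {name}\n"
            "description: Describe what the skill does and when to use it.\n"
            "---\n\n"
            "Step-by-step instructions go here.\n"
        )
    return skill_dir

# Example: scaffold_skill("skills", "generating-practice-questions")
```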
Back to those output templates: with Markdown, here's how a true/false question might look, and with LaTeX, here's how true/false and the other examples look as we go through. If you find yourself needing a particular output format, instead of putting it all in the SKILL.md, reference it in an external asset or file. Remember that these template files are only loaded when necessary, so we can be extremely efficient with our tokens and context window by loading only the file for the data format we need. If there are external resources we need, such as domain-specific examples, we can link to those as well, like we do in the references folder here. We're leaning into the concept of progressive disclosure: loading only what's absolutely necessary and referencing external files only when we need them.

The second skill we're going to look at is a skill for analyzing time series data. We provide a CSV, and we want to understand its characteristics before forecasting. What's important to note is that this skill has a very particular, deterministic workflow, and we make use of a few different Python scripts to carry it out. To start, we have a Python script for visualizing the data we're working with: plotting the time series, a histogram, rolling statistics, box plots, and quite a few more. We also have plots for autocorrelation, and similarly for decomposition. Looking at diagnose.py, we have the underlying functionality for analyzing the data. While there are quite a few functions here, I want to draw your attention to what happens at the end when we run our diagnostics: we use these functions to analyze data quality, distribution, stationarity, seasonality, trend, and autocorrelation, and finally end with a transform recommendation (this fixed-order pipeline is sketched below). What we have here is a predictable workflow that we want to run the same way, in the same order, each time. So let's go back to the skill and see exactly how that's done.

First, we start with the format of our input, being very explicit about what to look for: the names of the columns and the particular data types. Next, we move on to one of the most important parts of this skill, the workflow. Notice that we're extremely explicit about the steps, telling Claude to run this exact script when we begin our diagnostics. We then have an optional step for generating the necessary plots and reporting the results to the user: taking the data, reading what's in summary.txt, and presenting the relevant plots. For answering questions about the results, we have an interpretation.md file for guidance. Looking at the script options, we can add additional flags if necessary, and for the output, we can specify the exact tree of files, text files, images, and so on, that gets produced. We want to be extremely predictable about the data coming in, the operations we perform, and finally the output. As always, if there are external references, we list those here. And since our scripts depend on Python libraries, we need to highlight exactly what those dependencies are and make sure they're installed so the scripts run correctly.
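As promised, here is a rough sketch of that fixed-order diagnostics pipeline. The run_diagnostics function and the summary.txt output mirror the pattern of the course's diagnose.py, but every analysis body here is a stubbed placeholder rather than the real implementation.

```python
# A sketch of a fixed-order diagnostics pipeline. The step names follow the
# workflow described in the lesson; each body is a placeholder, not the
# actual analysis from diagnose.py.
import pandas as pd

def run_diagnostics(series: pd.Series, out_path: str = "summary.txt") -> dict:
    results = {}
    # Each step runs in a fixed order so the skill's output stays predictable.
    results["data_quality"] = {"missing": int(series.isna().sum()), "n": len(series)}
    results["distribution"] = {"mean": float(series.mean()), "std": float(series.std())}
    results["stationarity"] = "run a stationarity test here (placeholder)"
    results["seasonality"] = "inspect seasonal decomposition here (placeholder)"
    results["trend"] = "estimate trend via rolling statistics here (placeholder)"
    results["autocorrelation"] = "compute autocorrelation here (placeholder)"
    results["transform_recommendation"] = "suggest a transform here (placeholder)"

    # Write a plain-text summary the skill can read back and report to the user.
    with open(out_path, "w") as f:
        for step, value in results.items():
            f.write(f"{step}: {value}\n")
    return results
```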
Now that we've looked at these two custom skills, let's see how they stand up when we run them through the skill-creator skill and determine whether we're following best practices. We could do this in a couple of environments. We could go back to Claude Desktop, but what I'd like to show you is how we can use Claude Code with skills. We'll see this in much more depth in a future lesson, but right now I'm going to open up Claude Code, install the necessary skill, in our case skill-creator, and then use two subagents in parallel to evaluate our analyzing time series and generating practice questions skills. This is a really helpful way to kick off the evaluation process and see how well we've done writing these skills.

So let's hop into Claude Code. Unlike Claude AI, Claude Code does not come with the built-in skills that include skill-creator, so we need to install them, and we're going to do that using a marketplace. We head over to our marketplaces and add a marketplace for anthropic/skills. This is the repository we saw earlier, and it contains two collections: document-skills, which covers processing Excel files, PowerPoints, Word docs, and PDFs, and example-skills, which includes some of the others we saw, such as the skill-creator skill. Let's install this in the project scope. Once we do, we'll see that we need to restart Claude Code, and we'll also see that in our .claude folder, settings.json now contains an enabledPlugins entry that includes these skills. So let's restart Claude Code and see what skills we have, using the /skills command. If we've done this correctly, we should see our skill-creator skill right here, as expected.

Let's make use of that skill. We're going to ask Claude Code to use the skill-creator skill to evaluate how well these skills follow best practices. To do this a little faster, we're going to use subagents in parallel, where each subagent evaluates one of the custom skills. When we do, we're prompted to use the skill-creator skill, which is great; it's working as expected. We successfully load that skill, read the necessary files, and dispatch our subagents to check for best practices. We can see it's found the correct skills, generating practice questions and analyzing time series, so let's launch our two agents to evaluate these skills against the best practices we have.

Alright, let's see how we did. Not too bad: generating practice questions scored nine out of ten. We could improve a bit on conciseness, and we've got some nice recommendations here. The good news is we did even better on analyzing time series: we can see some observations here, and excellent marks for avoiding duplication, frontmatter quality, and conciseness. Running your skills through the skill-creator, which includes best practices out of the box, is a really nice way to evaluate them.

So we've run our skills through the skill-creator to check the underlying SKILL.md and associated files against best practices, but how can we make sure the skills are actually working as expected? Here's one example of how we could build a harness to write unit tests for our skills, similar to how we write unit tests for software.
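As a concrete sketch of what such a harness could look like, the snippet below pairs evaluation queries with simple checks on the files a skill produces. Everything in it is hypothetical: run_skill stands in for however you actually invoke the skill (for example, through the Claude API with code execution), and the input and output filenames are placeholders.

```python
# A hypothetical evaluation harness for a skill: each case pairs a query with
# checks on the artifacts the skill should leave on disk.
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable

@dataclass
class SkillEvalCase:
    query: str
    input_files: list[str]
    checks: list[Callable[[Path], bool]] = field(default_factory=list)

def file_exists(name: str) -> Callable[[Path], bool]:
    # Check that the skill produced a file with this (hypothetical) name.
    return lambda out_dir: (out_dir / name).exists()

cases = [
    SkillEvalCase(
        query="Generate practice questions from these notes and save them as Markdown.",
        input_files=["lecture_notes.pdf"],     # hypothetical input file
        checks=[file_exists("questions.md")],  # hypothetical output name
    ),
    SkillEvalCase(
        query="Generate practice questions from these notes and save them as LaTeX.",
        input_files=["lecture_notes.pdf"],
        checks=[file_exists("questions.tex")],
    ),
]

def evaluate(run_skill: Callable[[SkillEvalCase], Path]) -> None:
    # run_skill is a placeholder: it should execute the skill for one case and
    # return the directory containing whatever the skill wrote.
    for case in cases:
        out_dir = run_skill(case)
        for check in case.checks:
            assert check(out_dir), f"check failed for query: {case.query}"
```

Notice that the checks only inspect artifacts on disk, which keeps the harness independent of how the skill itself is run.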
Starting with generating practice questions, when we think about what the evaluation might look like, we would begin with a few different queries: generating questions and saving them to a Markdown file, a LaTeX file, or a PDF, and making sure we're passing in the correct files in the correct format. We can then check that the expected behavior is what we need: using the correct libraries for PDF input, extracting the learning objectives as specified, generating the different kinds of questions and following their guidelines, using the correct output structure and the output templates we saw in the assets folder, making sure formats like LaTeX successfully compile, and finally, making sure the questions are written to the correct files in the right format. We would also want to gather human feedback in this process and test across all the different models we're planning to use.

For our second skill, analyzing time series, we make use of three different Python scripts, so we'll assume we've already tested those scripts with traditional software unit tests. Assuming the scripts do what we want, let's now test that everything happens in the correct order, with the appropriate inputs, outputs, and expected behavior. The query here might be to analyze and generate plots for some time series data. We'd want to pass in some candidate CSVs, make sure the Python scripts we showed for visualizing and diagnosing run correctly, and, more importantly, make sure all the steps in the workflow happen in the correct order. Since we're asking for plots, we want to make sure that optional step is included. We then want to return a summary, interpret those findings, and finally create a folder with all the required files in the right place. If you remember, the output specified very particular locations for the different files, folders, and underlying assets. As with our other skill, we want to get human feedback and test across the models we use.

In the next lesson, we're going to take these two skills, bring them into Jupyter notebooks, and use the Claude Messages API with code execution tools to run them and produce outputs programmatically.
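Before we do that, one last minimal sketch: a structural check on the time series skill's output folder, of the kind just described. summary.txt comes from the skill's workflow; the folder name and the expectation of PNG plots are assumptions for illustration only.

```python
# A minimal sketch of a structural check on the time series skill's output
# folder. summary.txt is named in the skill's workflow; the folder name and
# the *.png expectation are illustrative assumptions.
from pathlib import Path

def check_output_structure(out_dir: str = "analysis_output") -> list[str]:
    out = Path(out_dir)
    problems = []
    if not out.is_dir():
        return [f"missing output directory: {out}"]
    if not (out / "summary.txt").exists():
        problems.append("summary.txt was not produced")
    if not list(out.glob("*.png")):
        problems.append("no plot images (*.png) were produced")
    return problems

# Example: print(check_output_structure())
```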