To start things off, you'll learn some basics about energy and the electric grid as they relate to computing infrastructure and carbon emissions. You'll see how model training and inference, especially when it comes to LLMs, can have a significant carbon footprint. But you'll also get familiar with key factors and decisions that impact how large or small this carbon footprint is. Let's dive in. Each step of the machine learning lifecycle has a carbon footprint. Before we even get to writing code, there are carbon emissions associated with all of the physical hardware that we use to train, host, and store machine learning models. Think of the CPUs, GPUs, TPUs, servers, and physical data centers. Creating all this infrastructure requires extracting raw materials, manufacturing, and shipping, and there are environmental costs associated with each step. This is often known as embodied carbon, which refers to all of the emissions produced in the supply chain for a given item or product. These emissions are usually the most difficult to estimate in the machine learning lifecycle, because supply chains are pretty complicated. Now, what's a little easier to calculate are the emissions produced from training and serving machine learning models. In other words, the emissions from compute. Training machine learning models or using them for inference requires energy to power and cool servers. Whether that's running pre-training on massive data sets to generate a foundational LLM, or training on custom data sets to produce fine-tuned models, or even at inference time when users interact with your large language model based application. All of this requires energy, and all of this has a carbon footprint, which can vary pretty widely depending on a number of factors. For example, Strubell et al. at the University of Massachusetts Amherst found that training a transformer model with neural architecture search was comparable to the carbon emissions of five cars over their entire lifetimes.
If you aren't familiar, neural architecture search here refers to a technique for automating the design of neural networks. But a paper from researchers at Google showed that training the same transformer model was actually closer to 0.00004 car emissions. That's a big discrepancy, and it highlights how factors like training strategy, hardware, and data center efficiency and location can have a massive impact on carbon emissions. By being a carbon-aware developer, you can take steps to lower the carbon emissions from your machine learning workloads. And that's exactly what we'll learn about in this short course. Note that while tracking and understanding emissions throughout the entire machine learning lifecycle is really important, in this course, you'll just focus on emissions generated by compute. You'll learn a little bit about emissions on the serving side, meaning inference, when your models are used in production. But overall, the focus will be on training, because that's a little easier to calculate and there's just been more research so far compared to the carbon impact of serving models. So let's zoom in and talk about why training and serving machine learning models has a carbon footprint in the first place. In short, running any kind of compute workload, machine learning or not, requires energy in the form of electricity. And this energy often comes from sources that release CO2. As a result, cloud computing actually emits more global greenhouse gas emissions than the commercial airline industry. According to numbers from the Shift Project, it's estimated that cloud computing accounts for 2.5% to 3.7% of all global greenhouse gas emissions. Note that these numbers are for cloud computing in general, and not specifically machine learning workloads. So here's a brief overview of how our electric grid works. Energy is produced by power plants.
These power plants could be sourcing energy from carbon-emitting fossil fuels like coal or gas, or non-carbon-emitting sources like wind or solar. This energy then flows through a network of transmission lines and substations, which is known as the electric grid, to power our homes, offices, restaurants, hospitals, and even data centers. The mix of carbon and non-carbon emitting sources of energy is different for each regional grid. For example, France relies really heavily on nuclear, Sweden on hydro, and Texas on natural gas. And even within a particular regional grid, the carbon intensity can change depending on the time of day, because some sources, like solar, are only available during the daytime. If you want to see what regional grid you're connected to and what kind of energy sources are powering your laptop, your refrigerator, and the lights in your home, you can check out Electricity Maps. They provide data quantifying how carbon intensive electricity is on an hourly basis across over 50 countries. In fact, in the next lesson, you'll use their API to get real-time energy data about the grid, maybe even the grid that you're using right now. So let's jump over to the Electricity Maps web app real quickly. So for example, here is the energy breakdown for France at the time of recording this video. You can see that nuclear actually makes up most of the energy on this regional grid. And the current carbon intensity is 20 g of CO2 equivalent per kilowatt hour. That's a bit of a mouthful, but it's telling you how much carbon is being emitted when generating a certain amount of usable energy. Now let's go explore another location. So if we move down here over to Brazil, you can see that Brazil has a few different regional grids. There's North Brazil, Northeast Brazil, and Central Brazil. Let's click into one of these.
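Since the next lesson uses the Electricity Maps API, here's a minimal sketch of what a request might look like. The endpoint path, `zone` parameter, `auth-token` header, and `carbonIntensity` response field are assumptions based on their v3 API; check the official docs before relying on them.

```python
import json
import urllib.request

def extract_intensity(payload: dict) -> float:
    """Pull the carbon intensity (gCO2eq/kWh) out of a response payload.
    The 'carbonIntensity' field name is an assumption about the API."""
    return float(payload["carbonIntensity"])

def fetch_carbon_intensity(zone: str, token: str) -> float:
    """Fetch the latest carbon intensity for a grid zone (e.g. 'FR').
    Endpoint path and header name are assumptions -- verify against
    the Electricity Maps API docs."""
    url = f"https://api.electricitymap.org/v3/carbon-intensity/latest?zone={zone}"
    req = urllib.request.Request(url, headers={"auth-token": token})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return extract_intensity(json.load(resp))

# A sample payload shaped like the France reading from the video (~20 gCO2eq/kWh):
sample = {"zone": "FR", "carbonIntensity": 20}
print(extract_intensity(sample))  # → 20.0
```

Keeping the parsing separate from the network call makes the sketch easy to test without an API token.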
You can see that things look a little different, where a lot of the energy is being supplied by solar and hydro, and the carbon intensity here is 82 g of CO2 equivalent per kilowatt hour. Grams of CO2 equivalent is a measure used to compare the emissions from various greenhouse gases on the basis of their global warming potential, by converting amounts of other gases, like methane, to the equivalent amount of carbon dioxide with the same global warming potential. You'll often hear carbon used as a broad term to refer to the impact of all types of emissions and activities on global warming. But CO2 is not the only greenhouse gas. So, for example, one ton of methane has the same warming effect as about 84 tons of CO2 over 20 years. So we normalize one ton of methane to 84 tons of CO2 equivalent. And note that kilowatt hours here is a measure of energy, not of time. Now here's the key insight. We can figure out the carbon intensity of a particular regional grid. We just saw how to do that using the Electricity Maps app. So if we know the amount of energy consumed by a machine learning training or serving application, and we know where this workload is being run, then we can estimate its carbon emissions. So how do we do that? How do we estimate how much carbon is released from a machine learning training job? Well, first, you get the carbon emitted per unit of energy. Again, that's grams of CO2 equivalent per kilowatt hour, and you'll often hear this referred to as carbon intensity. So we get this for the electric grid that you're using to power your compute infrastructure. Then you'll multiply this number by an estimate of how much energy you used to train your machine learning model. And from that, you can estimate how much carbon is emitted as a result of that training job. So far, you've learned how to find the mix of carbon and non-carbon energy sources powering the grid. But what about the other piece of the puzzle?
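The two-step estimate just described, grid carbon intensity times energy used, can be written as a one-line function. The 1,000 kWh job below is a made-up number for illustration; the two intensity values are the France and Brazil readings from the walkthrough.

```python
def estimate_emissions_g(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Estimated emissions in grams of CO2 equivalent:
    energy used (kWh) x grid carbon intensity (gCO2eq/kWh)."""
    return energy_kwh * intensity_g_per_kwh

# A hypothetical 1,000 kWh training job on the two grids from the walkthrough:
print(estimate_emissions_g(1000, 20))  # France at 20 gCO2eq/kWh  → 20000 g (20 kg)
print(estimate_emissions_g(1000, 82))  # Brazil at 82 gCO2eq/kWh  → 82000 g (82 kg)
```

Same job, roughly four times the emissions, purely because of where it runs.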
How much power does it actually take to train a large language model? We can measure the energy consumption in kilowatt hours with the following formula: kilowatt hours equals hours to train, times number of processors, times the average power per processor. Here, a processor may be a CPU, but more likely for ML training, it's a GPU or TPU. Now note that if you're training in the cloud, the energy used includes not just the electricity to run the compute, but also the overhead to run the data center, such as cooling systems. So you need to multiply by a factor called power usage effectiveness, or PUE, to estimate the total electricity use for your training job. PUE is a standard way to measure data center computing efficiency. It's calculated by taking the total energy used in the data center, divided by the energy used only for computing. So, for example, let's assume that your workload consumes 100 kilowatt hours, and the PUE of the data center where it's running is 1.5. That means that the actual consumption from the grid is 150 kilowatt hours, where 50 kilowatt hours goes to the overhead of running the data center, and 100 kilowatt hours goes to the hardware where your workload is actually running. Now, as we just talked about, to get from kilowatt hours to an estimate of carbon emissions, you multiply by the carbon intensity of the grid where your workload is running. Now, you might recall that large language models require a lot of pre-training hours and a lot of processors. One way to quantify the amount of compute required is GPU years, or the amount of compute that it would take a single GPU one year to complete. One particular GPU that you can use as a unit of measurement is the Nvidia V100 GPU. So, for example, in 2020, GPT-3 took 405 Nvidia V100 years to train. In other words, if you had used a single Nvidia V100 GPU, it would have taken 405 years to train GPT-3. So it's a good thing they had more than one GPU available.
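Putting the formula and the PUE adjustment together in code: the 100 kWh and PUE 1.5 numbers are the example from the text, while the 8-GPU configuration is invented purely for illustration.

```python
def training_energy_kwh(hours: float, n_processors: int, avg_power_kw: float) -> float:
    """Energy at the hardware: hours to train x number of processors
    x average power per processor (in kilowatts)."""
    return hours * n_processors * avg_power_kw

def grid_energy_kwh(compute_kwh: float, pue: float) -> float:
    """Total draw from the grid, including data center overhead (PUE)."""
    return compute_kwh * pue

# The example from the text: 100 kWh of compute in a PUE-1.5 data center.
print(grid_energy_kwh(100, 1.5))  # → 150.0 (50 kWh of overhead)

# An invented configuration: 8 GPUs averaging 0.3 kW each for 48 hours.
print(training_energy_kwh(48, 8, 0.3))  # ≈ 115.2 kWh before the PUE adjustment
```

Multiplying that last result by the grid's carbon intensity, as in the previous step, gives the emissions estimate for the whole job.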
In the paper Measuring the Carbon Intensity of AI in Cloud Instances, researchers estimated the energy consumed by different transformer models. They trained a 6 billion parameter transformer model for 13% of the time it would take to train to completion, and found that this consumed 13,812.4 kilowatt hours. That's actually more energy than the average American household consumes in a year, according to the US Energy Information Administration. The authors of this paper then estimated that a full training run would consume approximately 103,500 kilowatt hours. Large language models are big, and they're training on a lot of data, and this means they can rack up a lot of energy use. Now, the good news is that you often aren't training a large language model from scratch. With prompt engineering, you might be able to get a model to do what you need without running training or even fine-tuning jobs. But remember, energy is still required every time you use one of these models for inference, to make predictions or generate text. In the paper Power Hungry Processing: Watts Driving the Cost of AI Deployment?, researchers estimated that the Stable Diffusion XL base 1.0 model generates 1,594 g of CO2 per 1,000 inferences. To put that in some context, that means that every 1,000 inference requests to this model is roughly the equivalent of four miles driven by an average gasoline-powered car. To make it easier to find more efficient models, the Hugging Face LLM-Perf Leaderboard actually includes energy estimates in tokens per kilowatt hour, and you'll see a pretty wide range in how much energy each of these models requires for inference. There's also the ML.ENERGY Leaderboard, which compares various open-source models by the average GPU energy consumed to generate a response. These leaderboards are resources you can use if you want to make energy consumption part of your large language model selection criteria.
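To reproduce the rough car-miles comparison: the 1,594 g per 1,000 inferences figure is from the paper, while the roughly 404 g of CO2 per mile for an average gasoline car is an assumption on my part (it's the commonly cited EPA average).

```python
SDXL_G_PER_1000_INFERENCES = 1594  # from Power Hungry Processing (per 1,000 generations)
CAR_G_PER_MILE = 404               # assumed EPA average for a gasoline-powered car

def inference_emissions_g(n_inferences: int) -> float:
    """Emissions attributed to n inference requests, in grams of CO2."""
    return n_inferences * SDXL_G_PER_1000_INFERENCES / 1000

equivalent_miles = inference_emissions_g(1000) / CAR_G_PER_MILE
print(round(equivalent_miles, 1))  # → 3.9, i.e. roughly four miles driven
```

Note that this is per thousand requests: at production scale, serving emissions add up quickly.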
So far, you've learned how gen AI, and computing more broadly, has a carbon footprint. You've learned where that carbon footprint comes from and how you can estimate it. Now, you might be wondering, well, why should I care? What can I actually do about any of this? I think that the idea that you, as a developer, can have an impact on these emissions might seem a little daunting at first. I mean, at the end of the day, you're just running a job on a machine connected to some electric grid out there. But there are actually strategies you can implement to be a carbon-aware developer. For starters, one thing that really impacts the carbon emissions is the location where you train your model and the time of day that you train it. As I mentioned earlier, different regional grids have different mixes of energy sources, which also fluctuate throughout the day. If you're running your workloads in the cloud, this means that certain cloud regions are connected to electric grids that have a higher amount of carbon-free energy sources operating on them. As a result, where you choose to train your models can have a big impact. Different regional grids can have significantly different carbon emissions profiles, even when they're geographically close together. And if you run a machine learning training job in a location that has 100% carbon-free energy, it actually produces zero carbon emissions from compute. In the next lessons, we'll make this idea a little more concrete and see how to retrieve real-time information on the carbon intensity of the grid. A more sophisticated strategy is to not just pick locations with low average carbon intensity, but to run flexible workloads at times of the day when there's more carbon-free energy available. This technique is sometimes called follow the sun and wind, and the basic idea here is to move workloads to times and places where there's a lot of renewable energy available.
This could mean that you delay running an ML training job until later in the day, when there's more renewable energy available, like noon, when solar generation is at its highest. But it could even mean something a little more complicated, like pausing and resuming a workload to minimize total emissions over the course of a day or a week. If you're training or hosting models in the cloud, you can also try to pick a cloud provider with a low PUE. Again, that's the power usage effectiveness. This number can actually vary pretty widely. For example, the average Google data center PUE is 1.10, and can even be as low as 1.06 in some scenarios, while according to the Uptime Institute's 2021 Data Center Survey, the global average data center PUE is around 1.57. That's over 40% more electricity, and correspondingly more carbon emissions, for around the same amount of compute. Another strategy for being a carbon-aware ML engineer is to use efficient hardware. The authors of the paper The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink cite that using processors optimized for ML training, like TPUs or more recent GPUs like the Nvidia V100 or A100, compared to general-purpose processors, can actually improve performance per watt by factors of 2 to 5. And as we talked about earlier, one benefit of gen AI is that even though the models are big and training takes a long time, you usually aren't training from scratch, and you often don't need to train a model at all. While the trend in gen AI has been to chase bigger and bigger models, more specialized, purpose-built, smaller models can often do the trick. There are many different strategies for lowering carbon emissions across the entire lifecycle of software, and there's increasingly more interesting and illuminating research in this area. But these are just a few important tips when it comes to machine learning.
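As a toy sketch of the follow-the-sun idea, here's one way to pick the lowest-carbon contiguous window for a deferrable job, given an hourly carbon intensity forecast. The forecast values are invented for illustration, with a midday dip as solar ramps up.

```python
def best_start_hour(hourly_intensity: list[float], job_hours: int) -> int:
    """Return the start hour whose contiguous window of job_hours
    has the lowest total carbon intensity (gCO2eq/kWh per hour)."""
    starts = range(len(hourly_intensity) - job_hours + 1)
    return min(starts, key=lambda s: sum(hourly_intensity[s:s + job_hours]))

# Invented 24-hour forecast (gCO2eq/kWh), dipping around midday:
forecast = [450, 440, 430, 420, 410, 400, 380, 340, 290, 240, 200, 170,
            160, 170, 210, 260, 320, 380, 420, 450, 460, 470, 465, 455]
print(best_start_hour(forecast, 4))  # → 10: run the 4-hour job from 10:00 to 14:00
```

A real scheduler would use forecast data from a service like Electricity Maps and would also handle the pause-and-resume case, but the core idea is just this minimization.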
So now let's get to coding and see how we can get access to real-time carbon data and train machine learning models in places that have more carbon-free energy available. Let's move on to the next lesson.