Now that you've explored how to optimize models by tuning different hyperparameters, there's one final topic to cover in this module: model efficiency. Until now, your focus has been on evaluation metrics like accuracy. But in real-world applications, especially in production, other metrics often become just as important. Think about the environment where your model might be deployed. If it's running on a powerful GPU server, accuracy may be the primary concern, since you have plenty of memory and a fast device. But in resource-constrained environments, such as edge devices like smartphones, drones, smartwatches, or other low-power embedded systems, you need to consider additional factors. These include model size (particularly its memory footprint), inference time (how long it takes to make a prediction), power consumption, and latency and throughput requirements. These factors involve trade-offs. Improving accuracy with a deeper network may lead to a slower and larger model, which may not be suitable for real-time applications.

In this video, you're going to explore evaluating models not just by accuracy, but also by their efficiency. This will help you decide which model is the most appropriate for your deployment needs. Specifically, you'll compare two models, an optimized CNN and a ResNet-34 architecture, across three metrics: accuracy, model size, and inference time. You already know how to calculate accuracy, so let's learn how to determine the model size and its inference time.

Let's start by creating a get_model_size() function to extract the model size. Here's how it works. model.parameters() returns all trainable parameters, such as the weights and biases, that are updated during training. For each parameter, its size in bytes is calculated by multiplying param.nelement(), the number of elements, by param.element_size(), the number of bytes each element occupies, which is 4 bytes per element for float32. A similar calculation is done with model.buffers(). Buffers are tensors in the model state that are not updated during training, such as running statistics from batch normalization, pre-computed constants, and fixed embeddings for transfer learning. While not trained, buffers are important for inference and model behavior. Finally, we add the sizes of the parameters and buffers and divide the total by 1024 squared to convert from bytes to megabytes. This value is the model's memory footprint.

The second metric of model efficiency is inference time, which indicates how long the model takes to make a prediction. This is crucial for real-time applications like self-driving cars or live video analysis, and also for batch processing on large data sets, where latency can quickly add up. Here's the code that you'll use in the lab to measure inference time. First, you'll set the model to evaluation mode for inference. Then you'll select a sample input and move it to the appropriate device. You run a brief warm-up loop, which is important because initial inferences may be slower due to lazy initialization by PyTorch or GPU scheduling. Then you'll time the model's forward pass over several iterations. Finally, you'll divide the total time by the number of iterations to obtain the average inference time. You can convert it from seconds to milliseconds to obtain an estimate of the model's performance during inference.
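To make the size calculation concrete, here's a minimal sketch of what such a get_model_size() helper could look like. The lab's exact implementation may differ, but the parameter-and-buffer arithmetic is the same idea.

```python
def get_model_size(model):
    """Estimate a model's memory footprint in megabytes (illustrative sketch)."""
    # Trainable parameters: number of elements * bytes per element (4 bytes for float32)
    param_size = sum(p.nelement() * p.element_size() for p in model.parameters())
    # Buffers (e.g., batch-norm running statistics): saved with the model but not trained
    buffer_size = sum(b.nelement() * b.element_size() for b in model.buffers())
    # Convert the total from bytes to megabytes
    return (param_size + buffer_size) / 1024**2
```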
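And here's a similar sketch of the timing procedure just described. The warm-up count, the number of timed iterations, and the sample_input and device arguments are illustrative assumptions, not necessarily the lab's exact code.

```python
import time
import torch

def measure_inference_time(model, sample_input, device, warmup=10, iterations=100):
    """Average forward-pass time in milliseconds (illustrative sketch)."""
    model.eval()                              # evaluation mode for inference
    sample_input = sample_input.to(device)    # move the sample input to the target device
    with torch.no_grad():
        for _ in range(warmup):               # warm-up: first passes can be slow (lazy init, GPU scheduling)
            model(sample_input)
        if torch.cuda.is_available():
            torch.cuda.synchronize()          # wait for queued GPU work before starting the clock
        start = time.time()
        for _ in range(iterations):           # time the forward pass over several iterations
            model(sample_input)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        elapsed = time.time() - start
    return elapsed / iterations * 1000.0      # average seconds per pass, converted to milliseconds
```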
Once you've calculated these three key metrics (model size, inference time, and accuracy), you can integrate them into a single function that returns a comprehensive model summary. To use it, you provide a trained model and a test loader, and the function returns a dictionary of results, offering a multidimensional view of the model's performance. This allows you to evaluate which model is the most practical for deployment, beyond just the performance figures.

Now that you have this summary function, you can compare multiple models side by side. In the lab, you'll evaluate two CNN models, Optimized_CNN and ResNet34, by looping through a dictionary of models and invoking the evaluate_efficiency() function on each. Before printing the results, they're formatted as a Pandas DataFrame for a tabular view of performance and resource usage. This table shows the side-by-side comparison of the models across the three metrics. You can visualize this using a scatter plot like this one, where the x-axis is inference time and the y-axis is accuracy, while the size of each dot represents the model size. In this case, it's evident that Optimized_CNN achieves higher accuracy, faster inference time, and lower memory consumption than ResNet34, and that makes it the preferred choice.

But real-world choices aren't always so clear-cut. When you don't have one model that outperforms the others on every metric, you'll need a smarter selection process, like constraint-based selection or weighted scoring. Constraint-based selection can be used when your deployment involves resource-constrained environments, like mobile phones or edge devices, where you need to enforce strict limits on model size and inference time. This approach involves filtering out models that exceed memory or speed limits and selecting the one with the highest accuracy from the remaining options. Here's how that logic plays out in code. First, you filter the models, retaining only those that meet both your size and your inference time constraints. If no models meet the criteria, you inform the user that the constraints are too strict. Finally, among the valid remaining models, you choose the one with the highest accuracy.

The second strategy is weighted scoring, which you might prefer when the requirements are more flexible. Instead of strict rules, you assign weights to each metric, reflecting their relative importance to you. You can build a scoring function in PyTorch that returns the best model based on weighted scores. The weights parameter allows you to select a weight for each metric; here, you set default values of 50% for accuracy, 20% for model size, and 30% for inference time.
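Going back to the summary function and the model comparison described earlier in this video, here's one hedged way the pieces could fit together, reusing the get_model_size() and measure_inference_time() sketches from above. The compute_accuracy() helper and the optimized_cnn, resnet34, test_loader, and device names are assumptions for illustration, not the lab's exact code.

```python
import pandas as pd
import torch

def compute_accuracy(model, test_loader, device):
    """Percentage of correctly classified test samples (assumed helper)."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total

def evaluate_efficiency(model, test_loader, device="cpu"):
    """Combine accuracy, size, and inference time into one summary dictionary (sketch)."""
    sample_input, _ = next(iter(test_loader))
    return {
        "accuracy": compute_accuracy(model, test_loader, device),
        "size_mb": get_model_size(model),                                 # sketched earlier
        "inference_ms": measure_inference_time(model, sample_input[:1], device),
    }

# Loop over a dictionary of trained models and display the results as a table
models = {"Optimized_CNN": optimized_cnn, "ResNet34": resnet34}           # assumed, already-trained models
results = {name: evaluate_efficiency(m, test_loader, device) for name, m in models.items()}
print(pd.DataFrame(results).T)                                            # one row per model, one column per metric
```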
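Here's a minimal sketch of the constraint-based selection logic, assuming results is a dictionary of per-model metric dictionaries like the one built above; the size and time limits are placeholder values.

```python
def select_by_constraints(results, max_size_mb=50.0, max_time_ms=10.0):
    """Keep models within the size and latency limits, then pick the most accurate one (sketch)."""
    # Filter: keep only models that satisfy both constraints
    valid = {
        name: metrics for name, metrics in results.items()
        if metrics["size_mb"] <= max_size_mb and metrics["inference_ms"] <= max_time_ms
    }
    if not valid:
        # No model satisfies the limits: the constraints are too strict
        print("No model meets the constraints; consider relaxing them.")
        return None
    # Among the remaining models, choose the one with the highest accuracy
    return max(valid, key=lambda name: valid[name]["accuracy"])
```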
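And here's a hedged sketch of the weighted-scoring idea, including the min-max normalization that's discussed next. The default weights match the 50/20/30 split mentioned above; the function and key names are illustrative assumptions.

```python
def select_by_weighted_score(results, weights=None):
    """Normalize each metric to 0-1, combine with weights, return the best model's name (sketch)."""
    weights = weights or {"accuracy": 0.5, "size_mb": 0.2, "inference_ms": 0.3}

    def normalize(metric, higher_is_better):
        values = {name: m[metric] for name, m in results.items()}
        lo, hi = min(values.values()), max(values.values())
        rng = (hi - lo) or 1.0                                  # avoid division by zero for identical values
        scaled = {name: (v - lo) / rng for name, v in values.items()}
        # For "lower is better" metrics, flip so the smallest value scores 1
        return scaled if higher_is_better else {name: 1 - s for name, s in scaled.items()}

    acc = normalize("accuracy", higher_is_better=True)
    size = normalize("size_mb", higher_is_better=False)
    speed = normalize("inference_ms", higher_is_better=False)

    # Weighted sum of the normalized metrics for each model
    scores = {
        name: weights["accuracy"] * acc[name]
              + weights["size_mb"] * size[name]
              + weights["inference_ms"] * speed[name]
        for name in results
    }
    return max(scores, key=scores.get)
```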
An essential step in this function is normalizing those metrics, since they're all on different scales: accuracy is a percentage, size is in megabytes, and time is in milliseconds. These metrics also differ in desired direction; for example, you want to maximize accuracy but minimize model size and inference time. Without normalization, one metric might dominate simply because of its numeric scale. You can implement normalization inside the scoring function. For each metric, it gathers the values across models and computes the maximum and minimum values to scale them to a 0 to 1 range. For identical values, division by zero is prevented by setting the range to 1. For accuracy, where higher is better, you apply min-max normalization: subtract the minimum from each value and divide by the difference between the highest and lowest values, so the highest accuracy maps to 1 and the lowest to 0. For model size and inference time, where lower is better, you flip the result by subtracting the min-max-normalized value from 1, so the model with the lowest value scores 1 and the one with the highest value scores 0.

Now that all metrics are on a 0 to 1 scale, you can compute the weighted score for each model by multiplying each normalized metric by its weight and summing them up, and then return the name of the model with the highest score. As an example, suppose you want to evaluate these five models with the weighted scoring method. First, you normalize each metric to bring its values into the 0 to 1 range. Then, after normalizing, you calculate the weighted score for each model by multiplying the metric values by the weights. Using the default weights defined earlier, the equation would look like this: ResNet18, for example, would score 0.5. You can then compute the weighted scores for every model and select the winner with the highest weighted score, which in this example is ResNet50. This method gives you full control over how metrics are prioritized, allowing you to balance model performance and efficiency, and it's ideal for real-world deployments where trade-offs matter. In the upcoming lab, you'll apply both the constraint-based and weighted scoring techniques in PyTorch.

And that wraps up our discussion on model efficiency and the final video in this first module. Throughout this module, you've built a foundation for practical deep learning optimization, beginning with an overview of optimization in a deep learning context, from parameters learned during training to hyperparameters and architectural decisions that shape model performance. You explored learning rate schedulers that guide the optimization process over time and how they can stabilize or accelerate learning. Next, you reviewed hyperparameter tuning, detailing an extensive list of tunable hyperparameters. Then you designed flexible architectures, learning to parameterize CNN layers, dropout rates, and fully connected sizes. Using Optuna, you automated hyperparameter and architecture search and explored interpreting its results to guide your design. Finally, you looked beyond accuracy by optimizing for real-world constraints like memory footprint and inference time, comparing models across multiple criteria and making structured decisions when trade-offs were present. The techniques and tools you've learned in this module form a comprehensive workflow for selecting and refining deep learning models for practical deployment, not just models that perform well on benchmarks.

In the next module, you're going to shift focus to TorchVision, building data pipelines, applying transforms, and using pre-trained models to accelerate your image data workflows. Let's continue our journey.