In the last video, you looked at some of the intuition behind how neural networks see image data, and how, when a neural network is trained to recognize particular objects in images, it will have learned some key features that are transferable to recognizing almost any kind of object in an image. Now, that's not to say that a neural network trained to recognize, say, bicycles will also be able to recognize horses, but rather that a model trained to recognize bicycles could, with just a little more training, also recognize horses or trains or anything else that you have labeled images of. In this lab, you'll start with a model that's been trained to recognize many different types of objects. The training data used to prepare this model is a dataset known as ImageNet, a well-known historical benchmark dataset that was commonly used for performance testing of image recognition models. ImageNet contains many different object categories, and the pre-trained Neural Architecture Search Network, or NASNet, model that you'll be using can accurately identify many object classes right out of the box. For your purposes, however, you'll apply the technique of transfer learning, where you take this model that is already trained to recognize many object types and fine-tune it to recognize your objects of interest, namely animals in Karoo National Park. So let's head over to the lab and see how this works. This is the second part of the design phase. In the first design lab, you applied MegaDetector to identify where animals were in the images in your dataset, and then you cropped and squared those images to create a dataset of pictures that contain only animals. Now, in this lab, you'll fine-tune a pre-trained NASNet model to classify the animals in your dataset.
With this link here, you can learn more about the technical details of the NASNet model, and follow this link to learn more about the ImageNet dataset that the model you'll use here was trained on. To begin, you'll run this first cell to import all the Python packages that you'll need for this lab. Next, you'll load the pre-trained NASNet model and try it out before fine-tuning. This image shows a simple visual of what you'll do next. You'll pass in an image, like this image of a baboon here, as an example. Then you'll ask the NASNet model to classify that image into one of the 1,000 existing classes of objects that NASNet was trained on. When you run this next cell, you can choose a type of animal to pass to the NASNet model for classification by using this pull-down menu. The output that you'll see here is the original image, and then the top three predictions from the model on the right. So here we'll pass an image of a baboon. Now I'm trying an ostrich, and the model thinks it might be a cardoon or a steel arch bridge, which is obviously very wrong. Let's try another one: a black rhino. Once again, a steel arch bridge and a cardoon are our top two predictions, and it might be a lemon. Well, I think the model is a lemon in this particular case. Hopefully you've got a very good intuition now for how badly an ImageNet-trained model works on this particular dataset. Now let's look at a kori bustard, and we didn't get this right either. It's predicted as being either a speedboat, a lab coat, or, my son would like this, an electric locomotive. But it's also worth noting here that the probability is incredibly low, less than 5% confidence for each of these. So really, this isn't the model saying that this is a speedboat; it's the model saying, "I don't know." And let's do one more. Let's look at an eland, and, well, we're now almost 100% confident that this is a steel arch bridge. So not only is the model wrong, it doesn't know how wrong it is.
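Under the hood, the "top three predictions" view works by turning the model's 1,000 raw class scores into probabilities and reporting the three largest. Here is a minimal numpy sketch of just that step, using made-up scores for a handful of hypothetical classes rather than the real model's output:

```python
import numpy as np

# Made-up raw scores (logits) for a few hypothetical ImageNet-style classes;
# the real NASNet model outputs 1,000 of these, one per class.
class_names = ["baboon", "ostrich", "steel arch bridge", "cardoon", "lemon"]
logits = np.array([0.2, 0.1, 2.5, 1.8, 0.9])

# Softmax converts raw scores into probabilities that sum to 1.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Report the three highest-probability classes, best first.
top3 = np.argsort(probs)[::-1][:3]
for i in top3:
    print(f"{class_names[i]}: {probs[i]:.1%}")
```

When even the top probability is tiny (say, under 5%), the model is effectively answering "I don't know," which is exactly what you saw with some of the images above.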
And so, I think this is funny in this case, but also a very good lesson in how models built on one dataset can be really inaccurate when taken to a new dataset. And for ImageNet in particular, this is something that's well known. In fact, I was at Stanford when my colleagues first developed ImageNet, and at the very first presentation, I and other people who had worked on WordNets for other languages pointed out that this ImageNet, based on the English WordNet, is going to be really biased toward the kinds of animals and taxonomies of animals that you'll find in the English-speaking world, in particular North America and Europe, and disproportionately toward people who could afford to take digital photographs at a time when that was more expensive. And so, while in this case it's mainly funny, there are lots of cases where this kind of bias, if it's not carefully monitored and mitigated as much as possible, can have a really negative impact. We'll talk more about how we are addressing this in this dataset. One of the positive things we can take away is that, for one, we can measure this bias: we will be able to look at the accuracy across different animals, and we'll know which of those do or do not occur in ImageNet. But also, given how terrible these predictions are, I think it's kind of amazing that ImageNet is going to be useful at all as the basis for transfer learning. But we'll find that it is, because while the classification task here isn't really correct at all, a lot of the textures and edges that are found in the ImageNet dataset can become useful components in identifying the images in our dataset here. Next, you will load the data by first defining the folder where the cropped images you created in the last lab are stored, along with some details about how many images you want to use in each training batch (we're saying 32 in this case), as well as the size of the images in pixels.
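Conceptually, that data-loading setup just splits the list of cropped images into groups of 32 at a fixed pixel size. The folder and file names below are hypothetical stand-ins, and the actual lab code uses a library utility that also handles resizing, but the batching idea can be sketched with the standard library:

```python
# Conceptual sketch of batching. The folder and file names are hypothetical
# stand-ins for the cropped images from the last lab; the real lab code
# also resizes each image to a fixed size in pixels.
BATCH_SIZE = 32

image_files = [f"crops/animal_{i:04d}.jpg" for i in range(100)]

def make_batches(files, batch_size):
    """Split a file list into consecutive batches of at most batch_size."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]

batches = make_batches(image_files, BATCH_SIZE)
print(len(batches))  # 100 images in batches of 32 -> 4 batches (32+32+32+4)
```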
And this is all just setup, getting ready for training the model. Next, you'll apply a technique called data augmentation, which you'll use to compensate for the fact that your dataset is imbalanced across the different types of animals and camera locations. Run these next two cells to prepare some images for visualization. When you run this next cell, you'll see a picture of a baboon that I'll use here as a sample to show different kinds of image augmentation. If you change this number up here to anything from 0 to 31, you'll be able to see the same effects for different images. With augmentation, you'll use your existing data to generate new labeled image examples to balance your dataset and help your model learn. Run this next cell to see examples of a simple augmentation technique: flipping an image about the horizontal or vertical axis. So if you choose horizontal here, you can see what it would look like to flip this image from left to right. Now you can add this new flipped image to your dataset as an example that's different from any of the existing data but represents a realistic example of the object that you're trying to identify. You can also choose vertical here to flip the image top to bottom. Now, in this case, you might say that this does not look like a realistic example, because you're never, or pretty rarely, going to see an image in the real world with an upside-down baboon. However, for the purposes of model training, this is a legitimate augmented example, because it still contains the animal and can help your model avoid learning things about the sky or bushes or the orientation of the animal that might confuse it in later attempts at classification. And I'm sure you can imagine that if you're doing this for something like satellite imagery, we would have almost no concerns at all about flipping vertically.
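Those flips are simple array operations. Here is a small numpy sketch on a tiny random stand-in image (the lab itself does this through its own augmentation code):

```python
import numpy as np

# A tiny stand-in "image": height x width x 3 color channels.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

horizontal = np.flip(image, axis=1)   # mirror left-to-right
vertical = np.flip(image, axis=0)     # mirror top-to-bottom
both = np.flip(image, axis=(0, 1))    # flipped both ways

# Each flipped copy keeps the same label but gives the model a new
# example that differs from anything already in the dataset.
print(image.shape == horizontal.shape == vertical.shape == both.shape)
```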
And of course, you can also flip the image both horizontally and vertically to come up with a third variation. So by flipping an image like this, you can create three new example images for your model to train on that are different from the original dataset and will help your model learn. At this point, though, you're probably thinking that this is not as good as having three images of completely different baboons at different camera locations, and you're certainly correct. Again, this is something that you should be able to evaluate in the testing phase: are you less accurately able to identify animals that were less frequent, despite the data augmentation? Another way to augment your data is to apply a zoom factor to images. When you run this next cell, you can use the slider here to adjust the zoom and see the different variations of this image that you could create with different zoom factors. Similarly, you can apply a rotation to the original image to come up with an augmented variation. Run this next cell, and you can apply a rotation from negative 45 to 45 degrees to the original image. And finally, you can apply a contrast adjustment to simulate an image taken under brighter or darker conditions. When you run this next cell, you can use the slider to apply different contrasts. In practice, you will apply a random combination of flips, zooms, rotations, and contrast adjustments to generate supplementary data for the purposes of balancing out your dataset across the different classes, in this case, the different animal types. When you run this next cell, try applying different combinations of flips, zooms, rotations, and contrasts to see what kind of data this will generate. Next, we're going to do our best to balance the dataset before fine-tuning. Here again, you'll split your data into training, validation, and test sets, in a similar way to what you did in the wind power forecasting lab in week two of this course.
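Of these augmentations, the contrast adjustment is the simplest to sketch directly: you scale pixel values away from, or toward, the image's mean brightness. This is a conceptual numpy version, not the lab's actual code; zoom and rotation involve interpolation and are usually left to a library:

```python
import numpy as np

def adjust_contrast(image, factor):
    """Scale pixels about the mean brightness: factor > 1 boosts contrast,
    factor < 1 flattens the image toward its mean, 1.0 leaves it unchanged."""
    mean = image.mean()
    return np.clip(mean + factor * (image - mean), 0.0, 255.0)

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(4, 4)).astype(float)  # tiny stand-in image

low = adjust_contrast(image, 0.5)    # washed out, as if in flat light
high = adjust_contrast(image, 1.5)   # harsher shadows and highlights
same = adjust_contrast(image, 1.0)   # identical to the original
```

In practice, a random factor is drawn per image and combined with random flips, zooms, and rotations to generate the supplementary examples.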
Here again, your training set will be the data that you use to fine-tune your NASNet model. You'll then evaluate the trained model iteratively on the validation set. And then, when you finish training, you'll test how well your model does on the test set, which is data it never saw during the training and validation process. As you'll recall from your exploration of this dataset, for some animal types you have only a handful of example images, and for the purposes of fine-tuning, this will be problematic. So next, in order to fine-tune the model, you'll restrict your training set to only those animals that have a relatively large number of example images in the dataset. At this point, you're probably thinking, well, this is going to introduce bias as well: rare animals might be just as important. And you're absolutely right. So depending on the real-world use case, this might not be a viable option. Here in this lab, we have set a cutoff of at least 65 example images, which turns out to leave about 11 different animals in total. And so you'll map the numbers 0 to 10 to each of these animals for training. You could, however, set a higher or lower cutoff on the number of example images. Next, you'll look again at how many images you have of each animal in your training set. Then you'll resample each image class, which means upsampling some classes and downsampling others, such that you have the same number of each class to train on. What this means is that for some classes, you'll discard excess images, and for others, you'll make more copies of the images that you already have. Making copies of training examples might not sound like a particularly clever way to increase the number of examples in your training set, but next, you'll look at how you can augment your data to simulate different training examples. Next, you'll load another version of the NASNet model for fine-tuning on our specific data here.
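The resampling step described above can be sketched with the standard library. The class names and counts here are made up for illustration; the idea is just to discard excess images from large classes and duplicate images from small ones until every class has the same count:

```python
import random

random.seed(0)

# Hypothetical per-class image lists with unequal counts.
dataset = {
    "baboon": [f"baboon_{i}" for i in range(120)],
    "kudu": [f"kudu_{i}" for i in range(65)],
    "ostrich": [f"ostrich_{i}" for i in range(300)],
}
TARGET = 100  # desired images per class after resampling

balanced = {}
for animal, images in dataset.items():
    if len(images) >= TARGET:
        # Downsample: keep a random subset and discard the excess.
        balanced[animal] = random.sample(images, TARGET)
    else:
        # Upsample: keep everything, then add random duplicate copies.
        extras = random.choices(images, k=TARGET - len(images))
        balanced[animal] = images + extras

print({animal: len(images) for animal, images in balanced.items()})
```

The duplicated copies only become genuinely new examples once random augmentation is applied to them, which is why resampling and augmentation are used together.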
To do this, you'll freeze the lower layers, meaning that for the layers of the network that have learned features common to detecting any kind of object, such as edges and specific textures, you'll leave those layers as they are and not modify them during training. If you're an AI practitioner and you're interested in more detail on the model architecture, you can run this next cell to print out a summary of it. If you're not familiar with neural network models prior to this course, one takeaway here is that there are very many layers in the NASNet model, and this is pretty typical of large-scale image recognition systems. The next step is to train the model. This involves adding a final layer that is set up for the 11 classes that you hope to train on. With the code written below here, you could train this model for as many epochs, or iterations, as you like, fine-tuning the final layers of your model to optimize for identifying the animals in your dataset. Running fine-tuning for a sufficient number of epochs would actually take a long time, possibly several hours, so we've already completed the fine-tuning step for you by training for 150 epochs and saving the result. By running this next cell, you'll load the model that was trained for those 150 epochs. To get a sense of what the training process looks like, you can run this next cell to train your model for one additional epoch, which should take a minute or so to run. You can try training for more than one epoch as well, if you'd like, by changing this number here. In order to know whether you have trained your model for enough epochs, you can look at these plots below, which show you data from the original 150-epoch training process that we performed for you. On the left here, you're looking at the number of trained epochs on the horizontal axis and the accuracy of the model on the vertical axis. So the accuracy is going up, which is what you want.
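What "freezing" means mechanically can be illustrated with a toy numpy model: the pre-trained layers are still used in the forward pass, but only the newly added final layer's weights are ever updated. This is a conceptual sketch with made-up sizes and data, not the lab's actual training code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a frozen "feature extractor" and a new trainable head
# with 11 outputs, one per animal class in the training set.
W_frozen = rng.normal(size=(8, 4))   # plays the role of NASNet's lower layers
W_frozen_before = W_frozen.copy()
W_head = np.zeros((4, 11))           # the newly added final layer

x = rng.normal(size=(1, 8))          # one fake input example
target = 3                           # its fake class label

for _ in range(100):
    features = x @ W_frozen                # frozen layers: used, never updated
    scores = features @ W_head
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    grad = probs.copy()
    grad[0, target] -= 1.0                 # softmax cross-entropy gradient
    W_head -= 0.5 * features.T @ grad      # only the head changes

print(int(np.argmax(features @ W_head)))  # now predicts class 3
```

Freezing drastically cuts the number of parameters being trained, which is part of why fine-tuning can work with a relatively small dataset.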
Blue is the accuracy of your model on the training set, and red is the accuracy of your model on the validation set. What you can see is that while the accuracy on the training data continues to improve with each new epoch of training, the accuracy on the validation set did not really improve much beyond 100 or so epochs. This tells you that training further is not likely to improve your model's performance. In fact, you would be doing what's known as overfitting your model, and this is likely to lead to less reliable confidence metrics, like when we saw that an animal was predicted to be a bridge with near 100% confidence. Next, you'll evaluate your model by calculating accuracy on the test set that wasn't used during training. When you run this cell, you'll find your accuracy on the test set is around 80%, which means that across all 11 animal classes you trained on, you correctly predicted the animal in the image about 80% of the time. While 80% might not sound very high, it is actually quite an impressive result, given that you have trained your model to recognize animals across 11 different classes with a relatively small dataset, in some cases including fewer than 100 examples of a particular type of animal. It is very likely that you could improve on this result by training on a larger number of labeled images, but by taking advantage of pre-trained models and transfer learning, you were able to create a fairly powerful animal detection pipeline with a relatively small dataset. This is the kind of result that, even in industry, I could use as a starting point to bootstrap a much more accurate dataset with expert human input. Beyond just looking at the overall accuracy of your model, it is informative to look at something called the confusion matrix to better understand where your model is succeeding and where it is failing. What you're looking at now is each animal class plotted along the vertical and horizontal axes.
Along the diagonal here, you can see the fraction of examples in the test set that your model correctly predicted for that animal. So, for example, you can see that your model correctly identified baboons 94% of the time, but it only identified kudus 74% of the time. You can look at the cells off the diagonal to get a sense of how your model is failing when it does fail. If you read across any row, you can see, for a particular animal, how often your model predicted that it was something else. So for this example here, the black-backed jackal, 6% of the time your model misclassified something that was actually a jackal as a baboon. Or for the gemsbok oryx, 8% of the time your model misclassified it as an eland. And finally, it's important to visually inspect the results of your model for individual images. Run this last cell to display an example image along with the top three results, in terms of confidence level, from your model. So here is an image of a baboon from your test set, and on the right, you can see the confidence associated with your model's top three predictions. You can see that for this image, your model correctly predicted baboon with the highest confidence. You can use this pull-down menu to see examples of other animals in your test set along with your model's results. So in this example, we have a kori bustard, and we have correctly predicted this with almost 100% confidence, so this is a great result. And here we have a gemsbok oryx, which was misclassified as an eland; you can see the third classification here was the correct one, but that was at less than 10% confidence. And finally, here we have a black-backed jackal, and again, we correctly identified this with about 100% confidence. So try this with a number of different animals using this pull-down menu to see how other animals in your dataset performed.
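A confusion matrix like the one you just read can be built with a few lines of numpy. Here is a sketch using a tiny set of made-up labels with 3 classes instead of the lab's 11; rows are true classes, columns are predicted classes, and each row is normalized so the diagonal reads as per-class accuracy:

```python
import numpy as np

NUM_CLASSES = 3  # tiny made-up example; the lab's matrix is 11 x 11
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])  # actual animal classes
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 0])  # model's predictions

# counts[i, j] = number of class-i examples that were predicted as class j
counts = np.zeros((NUM_CLASSES, NUM_CLASSES))
for t, p in zip(y_true, y_pred):
    counts[t, p] += 1

# Overall accuracy is the diagonal total over all examples.
accuracy = np.trace(counts) / counts.sum()

# Normalizing each row makes the diagonal the per-class accuracy, and
# off-diagonal entries show what each class gets mistaken for.
confusion = counts / counts.sum(axis=1, keepdims=True)
print(confusion.round(2), f"overall accuracy: {accuracy:.0%}")
```

Here class 0 is identified correctly 75% of the time and mistaken for class 1 the other 25%, which is exactly how you read the baboon and jackal rows in the lab's plot.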
Congratulations! You have now prototyped a working model pipeline that can take you from a raw camera trap image all the way through to classification of each animal that appears in the image. You've thought about the limitations of starting from a model pre-trained on very different data from the data you're applying it to, and considered how much of that gap you can make up with transfer learning. The next step will be to pull your model pipelines together into a final implementation with a user interface. But before we do that, join me in the next video to wrap up the design phase.