In the previous lesson, you explored the Hurricane Harvey Damage Assessment dataset by investigating what the images labeled as damaged and undamaged look like, and how you can plot the locations of the images on a map. Your goals in the design phase are to prototype your data and modeling strategies, keeping in mind any data privacy and security concerns, and to build out the user interface for your solution. In the previous courses, you've often started by developing a baseline model to see how well you could do without implementing a more complicated AI solution. For an image classification task like this one, where images show damage or no damage, some old-school image processing techniques could provide a baseline. For example, you saw that in the images showing damage, often the most recognizable feature was brown floodwater surrounding buildings in the area. So you could build a simple image processing pipeline with a color threshold that identifies that brown floodwater as a simple indicator of potential damage. And I think it would be interesting to build that, maybe 50 lines of code, and see how much better your machine learning model performs. Here, however, we're going to skip over this baseline modeling step and jump right into using a neural network to do classification. It's pretty easy to get started with neural networks, and while they're not the only option for working on a solution like this, they have been shown to do a particularly good job at image classification tasks when you have enough labeled data. In the previous courses in this specialization, you used neural networks for various tasks, including air pollution estimation, wind power forecasting, and animal identification.
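If you did want to try that color-threshold baseline, a minimal sketch might look like the following. The specific thresholds here are illustrative guesses, not tuned values, and `brown_water_fraction` is just a hypothetical name for this exercise:

```python
import numpy as np

def brown_water_fraction(image, threshold=0.1):
    """Classify an RGB image (H x W x 3, values 0-255) as likely damage
    if the fraction of brownish pixels exceeds `threshold`.

    The "brown" test is a rough heuristic for muddy floodwater:
    red dominant, with red > green > blue.
    """
    img = image.astype(float)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    brown = (r > 80) & (r > g) & (g > b) & (r - b > 30)
    return brown.mean() > threshold

# Sanity check on synthetic images: a muddy-brown tile is flagged,
# a green (vegetation-colored) tile is not.
brown_img = np.full((64, 64, 3), (140, 100, 60), dtype=np.uint8)
green_img = np.full((64, 64, 3), (60, 140, 60), dtype=np.uint8)
print(brown_water_fraction(brown_img))  # True
print(brown_water_fraction(green_img))  # False
```

This is the kind of 50-line-or-less baseline mentioned above: crude, but it gives you a number to beat before investing in a neural network.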
This project will be most similar to the biodiversity monitoring project, where you identified animals, except that instead of trying to identify many different kinds of animals, often in the same image, you'll just be training your model to distinguish between images showing damage versus no damage. So let's jump into the lab and see how that works. Here at the top, you'll see the steps listed that you'll complete in this lab. First, just a spoiler alert: here it says you'll train your model, but in reality this lab environment is limited in terms of computation power, so you won't do the actual training here. You'll instead load a pre-trained model, but we do include a link below that you can follow to get access to the training notebook if you want to try it out yourself. Run this first cell to import the Python packages. Next, you'll load the training data using something called a generator to set up for model training. This next cell just assigns a label of 1 to images with visible damage and 0 to those with no damage. Next, you'll look at how you can augment the data to make your training more robust. In course 2 of this specialization, you performed data augmentation by flipping, rotating, zooming, and adjusting the contrast of images of animals. Now you'll do the same thing with these satellite images. When you run this next cell, you'll grab one image to explore augmentation with, but if you change this value to another number between 0 and 127, you can look at these same steps for other images. First, you can flip an image either horizontally or vertically, and you can see how these would look by choosing between these buttons, or here to see both flips at the same time.
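Under the hood, the flips you see in the lab are just array index reversals. Here's a minimal NumPy sketch of the idea (the lab itself uses library utilities; `flip_image` is a hypothetical helper for illustration):

```python
import numpy as np

def flip_image(image, mode):
    """Return a flipped copy of an H x W x C image array.

    mode: 'horizontal' mirrors left-right, 'vertical' mirrors
    top-bottom, 'both' applies both flips at once.
    """
    if mode in ("horizontal", "both"):
        image = image[:, ::-1]   # reverse the columns
    if mode in ("vertical", "both"):
        image = image[::-1, :]   # reverse the rows
    return image

# A tiny 2x2 single-channel image makes the effect easy to see
img = np.array([[[1], [2]],
                [[3], [4]]])
print(flip_image(img, "horizontal")[..., 0].tolist())  # [[2, 1], [4, 3]]
print(flip_image(img, "vertical")[..., 0].tolist())    # [[3, 4], [1, 2]]
```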
When you were flipping and rotating animals in course 2, it might have seemed strange to augment your data with upside-down animals, but as you can see with the satellite data, the flipped images are perfectly normal examples of what a similar but different image might look like, just taken from a different angle. So this should help your model generalize to a broader set of examples. Let's take a look at some other augmentations. Here you can visualize different examples by zooming in using this slider. Here you can look at an example of rotating the image. And finally, you can change the brightness of the image to see how that would look as an augmented example. Then, just like when you were working with images of animals, you can apply different combinations of these augmentations to see what they look like. The purpose of performing these augmentations, of course, is to provide your model with more examples to train on. With each of these augmented examples, your model can learn different variations on how images showing damage or no damage might appear. To set up for training with augmentation, you would run this next cell here to prepare a data generator with each of these augmentation steps built in. But like I said before, you won't perform training here. Instead, we have already trained the model for you using the training and validation data, and by running this next cell, you'll just load that pre-trained model. This is a convolutional neural network, similar to but simpler than the models you used for animal classification in the second course. Your inputs into the network are, again, images, and now your output is a classification of damage or no damage. You can run this next cell to print out a text summary of the model architecture. You don't need to worry about the details here.
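To make the idea of an augmenting data generator concrete, here's a small pure-NumPy stand-in. The lab's generator comes from a deep learning library with more options (rotation, zoom, and so on); this sketch only does random flips and brightness jitter, and all of the names and parameter values are illustrative:

```python
import numpy as np

def augmenting_generator(images, labels, batch_size=4, rng=None):
    """Yield an endless stream of (augmented_batch, labels) pairs.

    A minimal stand-in for a library data generator: each image in a
    batch is randomly flipped and has its brightness scaled, so the
    model sees a slightly different variation every epoch.
    """
    rng = rng or np.random.default_rng(0)
    n = len(images)
    while True:
        idx = rng.choice(n, size=batch_size, replace=False)
        batch = images[idx].astype(float)
        for i in range(batch_size):
            if rng.random() < 0.5:              # random horizontal flip
                batch[i] = batch[i][:, ::-1]
            if rng.random() < 0.5:              # random vertical flip
                batch[i] = batch[i][::-1, :]
            batch[i] *= rng.uniform(0.8, 1.2)   # brightness jitter
        yield np.clip(batch, 0, 255), labels[idx]

# Usage: draw one augmented batch from 10 fake 32x32 RGB images
images = np.random.default_rng(1).integers(0, 256, (10, 32, 32, 3))
labels = np.array([0, 1] * 5)
batch_x, batch_y = next(augmenting_generator(images, labels))
print(batch_x.shape, batch_y.shape)  # (4, 32, 32, 3) (4,)
```

The key point is that augmentation happens on the fly inside the generator, so the stored dataset never grows; the variety comes from the random transforms applied at batch time.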
Just know that this is a relatively simple neural network of just a dozen or so layers, and that this network was trained from scratch on the data in the training and validation folders. To get a sense of what the training process looks like, you can run this next cell to look at some of the training metrics. Here you're looking at accuracy on the left and loss on the right, plotted as a function of the number of training epochs on the horizontal axis. So you can see that with more training epochs, accuracy went up and loss went down, and this is what we're looking for. Accuracy is just how many times your model makes the right prediction divided by the total number of predictions. Here, you can see that after around 20 epochs or so, the model is already doing better than 90% accuracy, and altogether, the training ran for 100 epochs. And if you remember, we had an equal number of damage and no damage items in our training and validation data, so this 90% accuracy is against a baseline of 50%. Loss is just a number that quantifies how well your model is doing; you don't need to worry too much about the units here, just that lower loss means better performance. And if you're familiar with machine learning, you'll take comfort in the fact that your training loss and validation loss are about the same here. This tells us that we're not overfitting our model to the particular training data. The labels of training and validation in the legend here just indicate the performance of the model on the data it was trained on, the training data, versus its performance on the validation data that was not used for training at each epoch. Commonly in neural network models, training is going well when your training and validation accuracy and loss are about the same, and that does seem to be the case here. You can run this next cell to perform a mini training of just three epochs on a smaller data set.
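The accuracy definition above, correct predictions divided by total predictions, is simple enough to write out directly (a two-line sketch, with made-up labels rather than the lab's actual data):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 9 correct out of 10 predictions -> 0.9, against a 0.5 baseline
# when the two classes are balanced, as they are in this dataset.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(accuracy(y_true, y_pred))  # 0.9
```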
This is just to show you an example of what training would look like if you ran the process yourself in an environment that was set up to handle the full training set. So you can see as you run each mini training epoch here, your new loss and accuracy values are printed out. And just to flag, we've sped this up for you in the video. If you look at the actual time each epoch takes, it's a little less than a minute each time. So hopefully this also helps you understand why we didn't have you train the entire model from scratch; you would have been sitting here waiting a long time. Finally, you test the model's performance on your test data set, which was never seen by your model in training or validation. To do this, run this cell to first load the test data, then run this next cell here to run prediction using your trained model on the test set. Once the prediction step has run, you can run the next cell to generate a confusion matrix for your test predictions. This is similar to what you did for the animal classification lab in the last course, but again, here instead of many animal classes, you have just two classes, damage or no damage. What you can see in this confusion matrix is the number of instances among the test predictions that were correctly classified. Those are on the diagonal from the upper left to the lower right: the ones here in the upper left that were images of no damage, correctly classified as showing no damage, and those in the lower right where the label was visible damage and the model predicted visible damage. In the other cells here, you can see the number of instances that your model got wrong. That's where it predicted damage but the label was no damage, or the opposite. Notice that instead of the images being labeled as just plain damage, it's visible damage. I think this is an interesting and honest evaluation.
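A two-class confusion matrix like the one in the lab is just a 2x2 table of counts. Here's a minimal sketch with made-up labels, laid out the same way as described above (rows are the true label, columns are the prediction, correct predictions on the diagonal):

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Rows are the true label (0 = no damage, 1 = visible damage),
    columns are the prediction, so correct predictions fall on the
    upper-left-to-lower-right diagonal."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

# Illustrative labels, not the lab's actual test results
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]
for row in confusion_matrix_2x2(y_true, y_pred):
    print(row)
# [3, 1]  <- true no-damage: 3 correct, 1 wrongly predicted as damage
# [1, 3]  <- true damage: 1 wrongly predicted as no damage, 3 correct
```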
We don't know for certain whether or not there was damage on the ground, so we're just indicating that there was, from aerial imagery, visible damage. And this is actually where we get the term ground truth data in machine learning. It comes from the remote sensing community, where you might have a remotely sensed measurement, often aerial imagery or some other kind of aerial sensor, and that is your measured truth, but your ground truth is the best possible measurement from on the ground itself. And so you'll often hear people refer to ground truth data in machine learning, and sometimes I use it to talk about any kind of data. So I think it's good to always keep in mind that even if it is ground truth data, that's still your best possible measurement, and it's not always necessarily an absolute or objective truth. The way you can interpret your model's performance, given a confusion matrix like this, is to look at different metrics. First, you could treat either class as the positive one, but I think it makes the most sense to think about true positives as the cases where damage was present and your model identified damage. So here, we're calling true positives the ones in the lower right. Then the true negatives would be where there was no damage and your model prediction was no damage; these are the ones in the upper left here. False positives would be where your model predicted damage, but the true label was no damage; these ones in the upper right. And false negatives would be where your model predicted no damage, but the label was visible damage; these ones in the lower left. Very often in machine learning, you will see this sort of breakdown used for models predicting the presence of a disease, for example, or really any kind of labeling task.
And then you'll look at different metrics depending on what you want to optimize for. Overall accuracy here is just the total number of correct predictions over the total number of predictions. But depending on the use case, you might be more interested in one of these other metrics, like precision, which is the number of true positives over all positive predictions. So in this case, precision would be equivalent to measuring, out of all the times your model predicted damage, what percent of the time damage was actually present. Or recall, which is the number of true positives divided by the actual number of positives. So in this case, recall would be asking, out of all the actual examples which had damage in the test set, how many did your model correctly identify? Depending on the use case, one or more of these metrics, or even some other metrics altogether, might be the relevant ones to gauge the quality of your work. For example, if you had many images but relatively few instances showing damage, in other words, an unbalanced data set, your accuracy would be high if you just always predicted no damage, but your recall would be zero, indicating your model is not useful at all as a damage detector. Run this next cell to print out the actual accuracy, precision, and recall for your model. You can run this next cell to visualize the output of your model. On the left here is the model prediction, which is actually on a continuous scale from zero to one, where zero indicates no damage and one indicates damage. And just to flag, this is not intended to be a measure of the amount of damage; it's an approximation of the confidence that damage exists in the image. So here we have an example of no damage where your model made the correct prediction. You can use this slider or the arrow keys on your keyboard to look through more examples.
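The accuracy, precision, and recall formulas described above can be sketched directly from the four confusion-matrix counts. The counts below are illustrative numbers, not the lab's actual results:

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts,
    with 'visible damage' as the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # of predicted damage, how much was real
    recall = tp / (tp + fn)     # of real damage, how much was found
    return accuracy, precision, recall

# Illustrative counts: 45 true positives, 40 true negatives,
# 10 false positives, 5 false negatives
acc, prec, rec = metrics_from_counts(tp=45, tn=40, fp=10, fn=5)
print(round(acc, 2), round(prec, 2), round(rec, 2))  # 0.85 0.82 0.9
```

Note how the always-predict-no-damage failure mode shows up here: with tp = 0 and fn > 0, recall is 0 no matter how high the accuracy is.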
As you can see, most of these are examples of correct predictions, but it can be informative to look specifically at the incorrect predictions. Run this next cell to look through some examples of incorrect predictions. In this first one, image zero, you can see that the model predicted almost exactly 0.5, and so this could be interpreted as the model saying it doesn't know, rather than making a hard prediction one way or another. When we go to image one, however, we can see that the model was very confident that there was damage when, in fact, the label here is no damage. It's hard to say why this is; perhaps the road or the brown grass in some areas was mistaken for floodwater. And at image two, we have a prediction that's right at the decision boundary of 0.5. So again, you can imagine a scenario where, given that this is so close to deciding one way or the other, you might exclude this from a prediction or make sure a human reviews it in a larger product. What you can see is that for many cases, your model's prediction is actually somewhere in the middle of the scale, not confidently in the direction of damage or no damage, and that can be useful information in itself. In a real-world implementation of a system like this, it would actually be common practice to take these lower-confidence predictions and flag them for manual review by a human. And that's what you'll do in the next lab for the implementation of this project. So have a look through more of the examples of the correct and the incorrect predictions from your model to get a sense of when your model is working well and when it is not. And that was the design phase. Of course, if you were building this system for deployment in the real world, you would spend a lot longer in the design phase, exploring your model's performance and perhaps trying some different configurations and different data sets.
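The flag-for-manual-review idea above can be sketched as a simple triage step over the continuous scores. The 0.4 to 0.6 band and the function name are illustrative choices for this sketch, not values from the lab:

```python
def triage_predictions(scores, low=0.4, high=0.6):
    """Split continuous model scores (0 = no damage, 1 = damage) into
    automatic decisions and a manual-review queue.

    Scores inside [low, high] sit too close to the 0.5 decision
    boundary to trust, so those images are routed to a human.
    """
    auto, review = [], []
    for i, s in enumerate(scores):
        if low <= s <= high:
            review.append(i)              # near the boundary: human review
        else:
            auto.append((i, int(s > 0.5)))  # confident: auto-label
    return auto, review

# Illustrative scores for five images
scores = [0.02, 0.97, 0.51, 0.45, 0.88]
auto, review = triage_predictions(scores)
print(auto)    # [(0, 0), (1, 1), (4, 1)]
print(review)  # [2, 3]
```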
For this project, however, you've adopted a data and model strategy of training a neural network from scratch using augmented data. You've tested the model and found it to be performing relatively well. And you've even prototyped a simple user interface where a person could see the model prediction on a continuous scale from no damage to damage, as well as the image being considered. Join me in the next video to wrap up with the design phase checkpoint before moving on to implementation.