Welcome to course three, Advanced Architectures and Deployment with PyTorch. You've already built a solid foundation: creating, training, and optimizing PyTorch models. Now you're going to go further, looking at custom architectures, working with both vision and natural language processing models, and learning the essentials of preparing models for deployment. And since this is the final course in the professional certificate, congratulations, it's your chance to pull everything together and finish strong.

So let's start here in module one with custom architectures. So far, most of your models have been very linear. Data comes in, flows through a stack of layers, and out comes a prediction. It's straightforward and easy to follow. But that's not always going to be the case. Some models take multiple inputs and produce multiple outputs, others reuse the same block in different places, some branch into different paths depending on the input, and some even build their structure dynamically as they run. To build models like these, you need control over how data flows through your layers, not just a list of which layers exist. And that's where custom architectures come in. They give you that direct control, written as plain Python code, thanks to PyTorch's dynamic graph.

But before you dive in, let's pause for a reminder about modular design. You've seen this before: you take a piece of your model, wrap it up in its own nn.Module, and reuse it whenever you need to. So instead of rewriting the same code over and over, you build a block once and drop it wherever it fits. With custom architectures, that matters even more, because things get complicated quickly. Keeping a model organized with reusable blocks is what lets you manage that complexity.

With that in mind, let's walk through four examples of custom architecture patterns you can build with PyTorch. They're not the only ones, but they're especially useful, and together they'll give you a sense of just how flexible your models can be. The first is models that take multiple inputs and produce multiple outputs. The second is parameter sharing: reusing the same block, with the same weights, in different places. The third is conditional execution: models that make decisions while they run. And the fourth is dynamic model creation: architectures that adapt on the fly.

So let's start with multiple inputs and outputs. Imagine you have a camera that gives you two types of data: the image itself and metadata, like shutter speed, location, or time of day. Both are useful, but they don't behave the same way, so ideally you'd pass each input through its own layers before combining them. With nn.Sequential, you might try stitching together two separate models. That could work, but it's clunky, and more importantly, those models don't learn together. There's no gradient signal saying, adjust the image features this way because of what the metadata is telling you. What you really want is one model with two branches that fuse into a shared representation, followed by a final layer that makes the prediction. And you can extend this idea: one set of features, but two predictions, maybe scene type and lighting conditions. Older frameworks could do this too, but it often meant awkward workarounds. In PyTorch, it's natural. Two inputs go in, pass through their branches, concatenate into shared layers, and then split into two outputs. Your training stays clean: one forward pass, two losses added together, one backward pass. Everything updates together. Multiple inputs and outputs fit naturally into a single module with plain Python code.
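Here's a minimal sketch of that pattern. The layer sizes, the four-value metadata vector, and the two heads (scene type and lighting) are illustrative assumptions, not a prescribed design:

```python
import torch
import torch.nn as nn

class CameraNet(nn.Module):
    """Two inputs (image, metadata) in, two predictions (scene, lighting) out."""

    def __init__(self):
        super().__init__()
        # Image branch: a small convolutional stack (sizes are illustrative).
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),                       # -> (batch, 16)
        )
        # Metadata branch: a small MLP over, say, four metadata values.
        self.meta_branch = nn.Sequential(nn.Linear(4, 16), nn.ReLU())
        # Shared layers after fusing the two branches.
        self.shared = nn.Sequential(nn.Linear(16 + 16, 32), nn.ReLU())
        # Two output heads.
        self.scene_head = nn.Linear(32, 5)      # e.g., five scene types
        self.lighting_head = nn.Linear(32, 3)   # e.g., three lighting conditions

    def forward(self, image, metadata):
        img_feats = self.image_branch(image)
        meta_feats = self.meta_branch(metadata)
        fused = self.shared(torch.cat([img_feats, meta_feats], dim=1))
        return self.scene_head(fused), self.lighting_head(fused)
```

A training step then looks exactly like the description above, two losses summed into one backward pass (the batch names here are placeholders):

```python
model = CameraNet()
criterion = nn.CrossEntropyLoss()
scene_pred, light_pred = model(images, metadata)   # one forward pass
loss = criterion(scene_pred, scene_labels) + criterion(light_pred, light_labels)
loss.backward()                                    # one backward pass updates both branches
```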
So now, let's take a look at parameter sharing. Imagine you have two images, and you want to predict whether they show the same product. You need to pass each image through an encoder, a network that turns the image into a feature vector. But you don't want two separate encoders. Even if they look identical, they'll train independently: each gets its own gradients and optimizer state, and as a result, the weights drift apart. What you want is for both images to pass through the same encoder, with the same layers and the same weights, so the model learns one consistent representation. That way, both images map into the same feature space. Again, other frameworks could do this, but it often meant juggling scopes or variable names. In PyTorch, it's straightforward. You define the network once in __init__, and then call it on both images in forward. Same object, shared parameters, shared learning. Updates from both images improve the same features.
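A minimal sketch of that shared-encoder pattern, with illustrative layer sizes and a simple absolute-difference comparison (one common choice, not the only one):

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    """One encoder, called on both images: same object, shared weights."""

    def __init__(self):
        super().__init__()
        # Defined ONCE in __init__, so there is exactly one set of parameters.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, 64),
        )
        self.classifier = nn.Linear(64, 1)  # same product or not

    def forward(self, img_a, img_b):
        # Calling the same module twice means gradients from both images
        # accumulate into the same weights during backward.
        feat_a = self.encoder(img_a)
        feat_b = self.encoder(img_b)
        return self.classifier(torch.abs(feat_a - feat_b))
```

Next up, let's look at conditional execution, which is really just putting if statements in your model. For example, imagine you're analyzing images to detect objects. If an image is simple and clear, a shallow stack of layers might be enough. But if it's cluttered and noisy, you might want to run it through deeper layers for a better result. That means your model has to make a decision while it's running. With nn.Sequential, that's not possible: it always runs the same layers in the same order. Older frameworks could do this, but again, with clunky workarounds. With PyTorch, it's just Python. You write the if statement inside your forward, and the model follows whatever path the input requires.

Now you might wonder, how does backprop work if the model only took one branch? As you've heard before, PyTorch builds a fresh computation graph on every forward pass. If your input goes down one branch, the graph records that path. On the next input, if the condition takes a different branch, PyTorch builds a new graph for that path instead. So when you call backward, it simply follows the graph from the path that was taken and then discards it, so the next forward can start clean. And that's what makes conditional execution seamless: whatever branch your model runs, it's always traced correctly.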
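Here's a minimal sketch of that idea. The standard-deviation test is just a stand-in for whatever "is this input hard?" measure you'd actually use, and the threshold is arbitrary:

```python
import torch
import torch.nn as nn

class AdaptiveDepthNet(nn.Module):
    """Runs extra layers only when the input looks complicated."""

    def __init__(self):
        super().__init__()
        self.shallow = nn.Sequential(nn.Linear(32, 32), nn.ReLU())
        self.deep = nn.Sequential(
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
        )
        self.head = nn.Linear(32, 10)

    def forward(self, x):
        x = self.shallow(x)
        # An ordinary Python if statement. Only the branch actually taken
        # is recorded in the computation graph for this forward pass.
        if x.std() > 1.0:        # placeholder complexity check
            x = self.deep(x)
        return self.head(x)
```

Finally, let's talk about dynamic model creation. Say you want to tune how deep your model is, maybe during a hyperparameter search. You don't want to rewrite the whole model every time. Instead, you'd like the number of layers to be set by a variable. So how do you actually build a model that adjusts its depth automatically? Your first thought might be to build the layers in a list comprehension and then loop through them in forward. But here's the catch: if you store layers in a plain Python list, PyTorch doesn't know they're part of the model. They won't show up in model.parameters(), they won't train, and they won't save. That's why PyTorch gives you nn.ModuleList. It works just like a Python list, but it tells PyTorch these are real layers: keep track of them, and everything will work as expected. Just remember, more layers means more parameters. It's easy to build a giant model without realizing it, so be careful.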
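A minimal sketch, with the hidden size and output size as arbitrary placeholders:

```python
import torch
import torch.nn as nn

class DynamicMLP(nn.Module):
    """Depth is set by a variable, e.g. during a hyperparameter search."""

    def __init__(self, num_layers, hidden_size=64):
        super().__init__()
        # nn.ModuleList registers every layer, so they all show up in
        # model.parameters(), train, and save correctly.
        self.layers = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)]
        )
        self.head = nn.Linear(hidden_size, 10)

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return self.head(x)
```

And you can watch the parameter count grow with depth, which is exactly the caution above:

```python
for depth in (2, 4, 8):
    model = DynamicMLP(num_layers=depth)
    print(depth, sum(p.numel() for p in model.parameters()))
```

There's also nn.ModuleDict. It's just like a Python dictionary, but for layers. It's great when you need multiple outputs for different tasks and you want to pick one by name. And just like with nn.ModuleList, everything inside an nn.ModuleDict is fully registered and trainable.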
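For example, here's a sketch of a multi-task head where the task names and sizes are made up for illustration:

```python
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    """Pick an output head by name at runtime."""

    def __init__(self, feature_size=64):
        super().__init__()
        # nn.ModuleDict registers every head, just like nn.ModuleList.
        self.heads = nn.ModuleDict({
            "scene": nn.Linear(feature_size, 5),
            "lighting": nn.Linear(feature_size, 3),
        })

    def forward(self, features, task):
        return self.heads[task](features)
```

Calling model(features, "scene") runs only the head you asked for, and both heads still train and save like any other layer.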
And that's the core of dynamic model creation: you write the rules, and PyTorch builds exactly the model that you need.

So those are the four techniques we've covered: multiple inputs and outputs, parameter sharing, conditional execution, and dynamic model creation. Together, they highlight the flexibility PyTorch gives you. Now, they might sound like special features, but really they're just natural patterns that come from PyTorch's dynamic graph and Python's flexibility. There's no magic. You're just writing normal Python, and PyTorch makes it all work. So in the rest of this module, you're going to put these ideas into practice. You'll explore three classic architectures, each solving real problems in creative ways. And by the end of this module, you won't just understand these architectures, you'll be ready to design your own. So let's see how they work in practice.