Welcome to Module 3! In the previous module, you focused on working with visual data using TorchVision. You learned how to preprocess images, use pretrained models, and visualize predictions such as bounding boxes and segmentations. In this module, the focus will shift to a very different kind of data: text. Working with text data opens the door to some of the most impactful applications in artificial intelligence, from chatbots and virtual assistants to search engines, recommendation systems, and advanced translation tools.

When you work with images, data is represented as numerical values, such as pixel intensities, that are fed directly into neural networks. But with text, things are different. Text is made up of discrete symbols such as letters, words, and punctuation marks. These symbols do not carry intrinsic numeric meaning, so before a model can use them, you have to translate them into numerical representations. The common first step is to assign a unique number to each word.

But once text is converted into numbers, it can still behave very differently from other types of data. Unlike tabular data, which has a fixed set of numerical features, or images, which are structured as consistent pixel arrays, text is highly variable. Two sentences might communicate similar ideas but use entirely different words and structures. In fact, there's rarely one correct way to express an idea in natural language. There are infinite possibilities, and models must be able to handle these variations.

Another key challenge is that text is inherently sequential and contextual. Words rarely exist in isolation. Their meaning is determined by the words around them and by the order in which they appear. Take the word 'bat.' If I say 'a bat flew out of the cave,' you know I'm referring to the animal. But if I say 'he swung the baseball bat,' you know I'm referring to a piece of sports equipment. This phenomenon, known as polysemy, shows how much meaning depends on context. Understanding context requires more advanced techniques that go beyond simple word lookup tables.

All of this means that NLP workflows require specific steps for processing text, specialized models designed to capture meaning and context, and large datasets and compute power to handle the complexity of language.

Let's look more closely at some of the specific reasons why text data is so challenging to work with.

First is sequence dependencies and context. Words depend on one another to convey meaning. The phrase 'baseball bat' has a very different meaning from 'a bat flew,' even though they share the same word. Understanding text is not just about understanding individual words. It's also about understanding how those words relate to each other in a sequence. Models that treat each word independently often fail to capture this nuance, which is why modern architectures like transformers are designed to handle contextual relationships in a sequence.

Second is variable length and structure. Text comes in different lengths. You might have a single-word query in a search engine, or you might have entire paragraphs, documents, or even books. That means models need to be able to handle sequences of wildly different lengths. In practice, this requires techniques like padding and truncation, which I'll explain in this module.

Third is vocabulary and representation challenges. All languages have a massive vocabulary, and it keeps evolving.
People invent new words, adopt slang, or use specialized jargon depending on the domain, like medical terminology or legal phrasing. This creates the out-of-vocabulary problem, where a model encounters words it has never seen before, like a new product name or hashtag. Modern approaches like subword tokenization can help reduce this problem, but it's still a core challenge for natural language processing systems.

Finally, there's ambiguity and polysemy. Words often have multiple meanings depending on context. A bat could be the flying mammal, a piece of sports equipment, or even a verb meaning 'to hit something.' On top of lexical ambiguity, there's also syntactic ambiguity, where a sentence can be structured in multiple valid ways, and semantic ambiguity, where the overall meaning of a sentence depends on outside knowledge or context. Humans resolve these ambiguities effortlessly, but for machines, it's an incredibly difficult task that requires sophisticated language models and often large amounts of training data.

Together, these challenges make natural language processing one of the most complex, but also one of the most exciting, areas of machine learning. NLP is also one of the most practical fields. It's given us applications that impact our daily lives in ways you may not even notice. Let's look at three areas where NLP plays a central role.

First is classification tasks, where the goal is to assign labels to entire pieces of text. Think of sentiment analysis, which helps businesses instantly gauge how customers feel about their products based on reviews or social media posts, or spam detection, which filters out unwanted emails before they hit your inbox. Intent classification in chatbots enables a virtual assistant to understand whether you're asking about your account balance or requesting technical help. And there's also topic categorization, where news articles or documents are automatically sorted by subject.

The second area is sequence labeling tasks. Instead of labeling a whole sentence, these systems label individual words or spans of text. A great example is named entity recognition, or NER, where models identify names of people, organizations, locations, addresses, and so on. For instance, given a sentence like 'Alice works at DeepLearning.AI in Mountain View,' an NER system can pick out 'Alice' as a person, 'DeepLearning.AI' as an organization, and 'Mountain View' as a location. This is critical for building search engines, virtual assistants, medical record analyzers, and countless other tools that rely on pulling structured information out of unstructured text. Other tasks, like part-of-speech tagging, where words are labeled as nouns, verbs, or adjectives, are foundational to everything from grammar checkers to speech recognition.

And then finally, there are generative tasks. These focus on producing new text from an input. This area includes text summarization, which condenses long reports or articles into a few key sentences. Dialogue generation powers chatbots and virtual assistants that can hold natural conversations. Machine translation has completely changed how we communicate across languages, enabling real-time translation in video conferences or travel apps. We even see models doing creative writing, generating poetry, scripts, or stories. These models predict one word at a time, carefully building out entire sentences and paragraphs, which is why they can adapt to so many uses.
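Before moving on, it may help to make a few of the challenges described earlier more concrete. The sketch below is only an illustration, not the exact code you'll write in this module: it shows one common way to assign a unique number to each word, route out-of-vocabulary words to a special token, and pad or truncate sequences to a fixed length so they can be batched as tensors. The token names and the max_len value are arbitrary choices for the example.

```python
import torch

# Toy corpus: three texts of different lengths.
corpus = [
    "a bat flew out of the cave",
    "he swung the baseball bat",
    "great product",
]

# 1) Build a word-level vocabulary: assign a unique integer to each word.
#    Reserve index 0 for padding and index 1 for unknown (out-of-vocabulary) words.
vocab = {"<pad>": 0, "<unk>": 1}
for sentence in corpus:
    for word in sentence.split():
        if word not in vocab:
            vocab[word] = len(vocab)

# 2) Numericalize: convert each sentence into a list of indices.
#    Words never seen during vocabulary building fall back to <unk>.
def encode(sentence, vocab):
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.split()]

encoded = [encode(s, vocab) for s in corpus]

# 3) Pad (or truncate) every sequence to the same length so the batch
#    can be stacked into a single tensor for training.
max_len = 8
def pad_or_truncate(indices, max_len, pad_idx=0):
    indices = indices[:max_len]                             # truncation
    return indices + [pad_idx] * (max_len - len(indices))   # padding

batch = torch.tensor([pad_or_truncate(seq, max_len) for seq in encoded])
print(batch.shape)                           # torch.Size([3, 8])
print(encode("a bat hit the ball", vocab))   # unseen words map to <unk> (index 1)
```

Notice how the unseen words in the last line fall back to the <unk> index. That is exactly the out-of-vocabulary problem that subword tokenization is designed to soften.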
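And as a quick taste of the sequence-labeling example above, here's a short, illustrative snippet using the Hugging Face Transformers library, an ecosystem you'll hear more about shortly. It assumes the transformers package is installed and downloads a default pretrained NER model on first use, so treat it as a sketch rather than part of this module's core workflow.

```python
from transformers import pipeline

# Token-classification (NER) pipeline; a default pretrained model is
# downloaded on first use. aggregation_strategy="simple" merges subword
# pieces back into whole entity spans.
ner = pipeline("ner", aggregation_strategy="simple")

for entity in ner("Alice works at DeepLearning.AI in Mountain View"):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
# Expected (roughly): Alice as a person, DeepLearning.AI as an organization,
# and Mountain View as a location.
```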
Taken together, these areas show just how versatile NLP has become, powering everything from how you get information, to how you communicate, to how businesses operate at scale.

So how does PyTorch fit into all of this? Well, over the last decade, NLP has evolved dramatically. Early systems were based on hand-engineered features and rules, but today's NLP is driven by deep learning. Modern models learn to represent and reason about language in ways that generalize to new inputs. PyTorch has emerged as one of the most popular frameworks for building these models.

With PyTorch, you can handle every stage of an NLP workflow. It provides utilities for preprocessing text, converting it into numerical tensors, and building models tailored for language tasks. PyTorch also makes it straightforward to experiment with different architectures, whether you're working with a simple embedding-based classifier or a large-scale transformer model.

Historically, PyTorch had a dedicated library called TorchText to simplify NLP pipelines. TorchText provided ready-to-use datasets, tokenizers, and iterators for text processing. However, TorchText is no longer actively maintained, so the current approach is to use PyTorch's core capabilities directly, or to integrate with specialized tokenization libraries and pretrained model hubs like Hugging Face. This shift gives you more flexibility and ensures that your code is aligned with where the NLP ecosystem is headed.

So whether you're building a sentiment analysis tool, designing a chatbot, or experimenting with text generation, PyTorch gives you a powerful foundation. In this module, you'll process text data, create embeddings, and train models using PyTorch. In the next videos, you're going to dive into those PyTorch tools, starting with one of the most fundamental tasks in NLP: tokenization, turning raw text into a numerical form that deep learning models can understand.
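To give you a rough preview of how these pieces fit together, here is a minimal, illustrative sketch of an embedding-based text classifier in PyTorch. The tiny vocabulary, the hand-encoded token IDs, and the hyperparameters are all stand-ins for what you'll build properly in the coming videos, not a definitive implementation.

```python
import torch
import torch.nn as nn

# Tiny illustrative vocabulary (index 0 reserved for padding).
vocab = {"<pad>": 0, "great": 1, "terrible": 2, "movie": 3, "plot": 4}

class BagOfEmbeddings(nn.Module):
    """Embed each token, average the embeddings over the sequence, and classify."""
    def __init__(self, vocab_size, embed_dim=16, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids) # (batch, seq_len, embed_dim)
        pooled = embedded.mean(dim=1)        # average over the sequence
        return self.classifier(pooled)       # (batch, num_classes) logits

model = BagOfEmbeddings(vocab_size=len(vocab))

# A padded batch of two "tokenized" sentences:
#   "great movie"   -> [1, 3, 0]
#   "terrible plot" -> [2, 4, 0]
batch = torch.tensor([[1, 3, 0], [2, 4, 0]])
logits = model(batch)
print(logits.shape)  # torch.Size([2, 2]): one score per class, per sentence

# Training would then use the usual PyTorch loop, for example:
# loss = nn.CrossEntropyLoss()(logits, labels); loss.backward(); optimizer.step()
```

PyTorch also provides nn.EmbeddingBag, which fuses the embedding lookup and the mean pooling shown here into a single operation, but the explicit version above makes each step easier to see.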