
💻   Accessing Utils File and Helper Functions

In each notebook on the top menu:

1:   Click on "File"

2:   Then, click on "Open"

You will then see, in the left sidebar, all the notebook files for the lesson, including any helper functions used in the notebook.


💻   Downloading Notebooks

In each notebook on the top menu:

1:   Click on "File"

2:   Then, click on "Download as"

3:   Then, click on "Notebook (.ipynb)"


💻   Uploading Your Files

After following the steps shown in the previous section ("File" => "Open"), click on the "Upload" button to upload your files.


📗   See Your Progress

Once you enroll in this course—or any other short course on the DeepLearning.AI platform—and open it, you can click on 'My Learning' at the top right corner of the desktop view. There, you will be able to see all the short courses you have enrolled in and your progress in each one.

Additionally, your progress in each short course is displayed at the bottom-left corner of the learning page for each course (desktop view).


📱   Features to Use

🎞   Adjust Video Speed: Click on the gear icon (⚙) on the video and then from the Speed option, choose your desired video speed.

🗣   Captions (English and Spanish): Click on the gear icon (⚙) on the video and then from the Captions option, choose to see the captions either in English or Spanish.

🔅   Video Quality: If you do not have access to high-speed internet, click on the gear icon (⚙) on the video and then from Quality, choose the quality that works best for your internet speed.

🖥   Picture in Picture (PiP): This feature allows you to continue watching the video when you switch to another browser tab or window. Click on the small rectangle shape on the video to go to PiP mode.

√   Hide and Unhide Lesson Navigation Menu: If you do not have a large screen, you may click on the small hamburger icon beside the title of the course to hide the left-side navigation menu. You can then unhide it by clicking on the same icon again.


🧑   Efficient Learning Tips

The following tips can help you have an efficient learning experience with this short course and other courses.

🧑   Create a Dedicated Study Space: Establish a quiet, organized workspace free from distractions. A dedicated learning environment can significantly improve concentration and overall learning efficiency.

📅   Develop a Consistent Learning Schedule: Consistency is key to learning. Set aside specific times in your day for study and make it a routine. Consistent study times help build a habit and improve information retention.

Tip: Set a recurring event and reminder in your calendar, with clear action items, to get regular notifications about your study plans and goals.

☕   Take Regular Breaks: Include short breaks in your study sessions. The Pomodoro Technique, which involves studying for 25 minutes followed by a 5-minute break, can be particularly effective.

💬   Engage with the Community: Participate in forums, discussions, and group activities. Engaging with peers can provide additional insights, create a sense of community, and make learning more enjoyable.

✍   Practice Active Learning: Don't just read the material, run the notebooks, or watch the videos. Engage actively by taking notes, summarizing what you learn, teaching the concept to someone else, or applying the knowledge in your own projects.


📚   Enroll in Other Short Courses

Keep learning by enrolling in other short courses. We add new short courses regularly. Visit the DeepLearning.AI Short Courses page to see our latest courses and begin learning new topics. 👇

👉👉 🔗 DeepLearning.AI – All Short Courses


🙂   Let Us Know What You Think

Your feedback helps us know what you liked and didn't like about the course. We read all your feedback and use it to improve this course and future courses. Please submit your feedback by clicking on the "Course Feedback" option at the bottom of the lessons list menu (desktop view).

Also, you are more than welcome to join our community 👉👉 🔗 DeepLearning.AI Forum


Welcome to this short course, "Quantization in Depth," built in partnership with Hugging Face. In this course, you'll dive deep into the core technical building blocks of quantization, a key part of the AI software stack for compressing large language models and other models. You'll implement from scratch the most common variants of linear quantization, called asymmetric and symmetric modes, which differ in whether the compression algorithm maps zero in the original representation to zero in the compressed representation, or whether it is allowed to shift the location of that zero. You'll also implement different granularities of quantization, such as per-tensor, per-channel, and per-group quantization, using PyTorch; these let you decide how big a chunk of your model you quantize at one time. You'll end up building a quantizer that can quantize any model in eight-bit precision using per-channel linear quantization. If some of the terms I use don't make sense yet, don't worry about it. These are all key technical concepts in quantization that you'll learn about in this course. And in addition to understanding all these quantization options, you'll also hone your intuition about when to apply which technique.

I'm delighted to introduce our instructors for this course. Younes Belkada, a machine learning engineer at Hugging Face, is part of the open-source team, where he works at the intersection of many open-source tools developed by Hugging Face, such as Transformers, PEFT, and TRL. Marc Sun is also a machine learning engineer at Hugging Face and a member of the open-source team, where he contributes to libraries such as Transformers and Accelerate. Marc and Younes are also deeply involved in quantization in order to make large models accessible to the community.

Thanks, Andrew. We are excited to work with you and your team on this. In this course, you will try your hand at implementing from scratch different variants of linear quantization: symmetric and asymmetric modes. You will also implement different quantization granularities, such as per-tensor, per-channel, and per-group quantization, in pure PyTorch, with each of these algorithms having its own advantages and drawbacks. After that, you'll build your own quantizer to quantize any model in eight-bit precision, using the per-channel quantization scheme you saw just before. You'll be able to apply this method to any model regardless of its modality, meaning you can apply it to a text, vision, audio, or even a multimodal model. Once you are happy with the quantizer, you will try your hand at addressing common challenges in quantization. At the time we speak, the most common way of storing low-precision weights, such as four-bit or two-bit, seems to be weight packing. With weight packing, you can pack 2-bit or 4-bit tensors together into a larger eight-bit tensor without allocating any extra memory. We will see together why this is important, and you will implement packing and unpacking algorithms from scratch. Finally, we will learn together about other challenges that come up when quantizing large models such as LLMs. We will review current state-of-the-art approaches for quantizing LLMs with no performance degradation and go through how to do that within the Hugging Face ecosystem.

Quantization is a really important part of the practical use of large models today, so having in-depth knowledge of it will help you build, deploy, and use models more effectively.
Many people have worked to create this course. On the Hugging Face side, I'd like to thank the entire Hugging Face team for reviewing the course content, as well as the Hugging Face community for their contributions to open-source models and quantization methods. From DeepLearning.AI, Eddy Shyu also contributed to this course. Quantization is a fairly technical topic. After this course, I hope you understand it deeply enough that you can say to others, "I now get it. I'm not worried about model compression." In other words, you can say: "I'm not sweating the small stuff." Let's go on to the next video and get started.
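As a preview of the techniques mentioned in this introduction, below is a minimal sketch in plain PyTorch, not the course's own helper code, of asymmetric per-tensor linear quantization to eight bits, followed by packing and unpacking 2-bit codes into a single 8-bit tensor. The function names and the int8/uint8 dtype choices are illustrative assumptions rather than the course's exact API.

```python
# Minimal sketch (illustrative only, not the course's helper code):
# asymmetric per-tensor linear quantization to int8, plus 2-bit packing into uint8.
import torch

def asymmetric_quantize(x: torch.Tensor, qmin: int = -128, qmax: int = 127):
    # Map the observed float range [x.min(), x.max()] onto the integer range [qmin, qmax].
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(torch.clamp(torch.round(qmin - x.min() / scale), qmin, qmax))
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.int8)
    return q, scale.item(), zero_point

def dequantize(q: torch.Tensor, scale: float, zero_point: int) -> torch.Tensor:
    # Recover an approximation of the original float tensor.
    return scale * (q.to(torch.float32) - zero_point)

def pack_2bit(codes: torch.Tensor) -> torch.Tensor:
    # codes: flat uint8 tensor of 2-bit values (0..3); length must be divisible by 4.
    c = codes.to(torch.uint8).reshape(-1, 4)
    return c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)

def unpack_2bit(packed: torch.Tensor) -> torch.Tensor:
    # Inverse of pack_2bit: extract four 2-bit values from each byte.
    return torch.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], dim=1).reshape(-1)

x = torch.randn(4, 8)
q, scale, zp = asymmetric_quantize(x)
x_hat = dequantize(q, scale, zp)
print("max abs quantization error:", (x - x_hat).abs().max().item())

codes = torch.randint(0, 4, (8,), dtype=torch.uint8)   # eight 2-bit codes
packed = pack_2bit(codes)                               # two bytes instead of eight
assert torch.equal(unpack_2bit(packed), codes)
```

Per-channel and per-group quantization follow the same recipe, except that a separate scale and zero point is computed for each output channel (or each fixed-size group of weights) instead of one pair for the whole tensor.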
Week 1: Quantization in Depth
  • Introduction (Video, 4 mins)
  • Overview (Video, 3 mins)
  • Quantize and De-quantize a Tensor (Video with Code Example, 11 mins)
  • Get the Scale and Zero Point (Video with Code Example, 12 mins)
  • Symmetric vs Asymmetric Mode (Video with Code Example, 7 mins)
  • Finer Granularity for more Precision (Video with Code Example, 2 mins)
  • Per Channel Quantization (Video with Code Example, 11 mins)
  • Per Group Quantization (Video with Code Example, 7 mins)
  • Quantizing Weights & Activations for Inference (Video with Code Example, 3 mins)
  • Custom Build an 8-Bit Quantizer (Video with Code Example, 13 mins)
  • Replace PyTorch layers with Quantized Layers (Video with Code Example, 5 mins)
  • Quantize any Open Source PyTorch Model (Video with Code Example, 8 mins)
  • Load your Quantized Weights from HuggingFace Hub (Video with Code Example, 7 mins)
  • Weights Packing (Video, 5 mins)
  • Packing 2-bit Weights (Video with Code Example, 8 mins)
  • Unpacking 2-Bit Weights (Video with Code Example, 8 mins)
  • Beyond Linear Quantization (Video, 7 mins)
  • Conclusion (Video, 1 min)
  • Course Feedback
  • Community