Instructors: Marc Sun, Younes Belkada
Try out different variants of Linear Quantization, including symmetric vs. asymmetric mode, and different granularities like per tensor, per channel, and per group quantization.
Build a general-purpose quantizer in PyTorch that can quantize the dense layers of any open source model, achieving up to 4x compression on those layers.
Implement weights packing to pack four 2-bit weights into a single 8-bit integer.
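As a taste of the packing lesson, here is a minimal sketch (not the course's exact implementation) of how four 2-bit weights can share a single 8-bit integer: each value is shifted into its own 2-bit slot and OR-ed into one byte.

```python
import torch

def pack_2bit(weights: torch.Tensor) -> torch.Tensor:
    """Pack a 1-D uint8 tensor of 2-bit values (0..3) into uint8 bytes."""
    assert weights.numel() % 4 == 0, "length must be a multiple of 4"
    w = weights.to(torch.uint8).view(-1, 4)
    # Each row holds 4 values; shift each into its 2-bit slot and OR together.
    return w[:, 0] | (w[:, 1] << 2) | (w[:, 2] << 4) | (w[:, 3] << 6)

def unpack_2bit(packed: torch.Tensor) -> torch.Tensor:
    """Reverse pack_2bit: expand each byte back into four 2-bit values."""
    shifts = torch.tensor([0, 2, 4, 6], dtype=torch.uint8)
    return ((packed.unsqueeze(1) >> shifts) & 0b11).flatten()

w = torch.tensor([1, 0, 3, 2], dtype=torch.uint8)
packed = pack_2bit(w)            # one byte instead of four
restored = unpack_2bit(packed)   # recovers [1, 0, 3, 2]
```

Storing weights this way cuts memory 4x versus int8; they are unpacked on the fly before being used in a matmul.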
In Quantization in Depth you will build model quantization methods to shrink model weights to ¼ their original size, and apply methods to maintain the compressed model’s performance. Your ability to quantize your models can make them more accessible, and also faster at inference time.
Implement and customize linear quantization from scratch so that you can study the tradeoff between space and performance, and then build a general-purpose quantizer in PyTorch that can quantize any open source model. You’ll implement techniques to compress model weights from 32 bits to 8 bits and even 2 bits.
Join this course to:
Quantization in Depth lets you build and customize your own linear quantizer from scratch, going beyond standard open source libraries such as PyTorch and Quanto, which are covered in the short course Quantization Fundamentals, also by Hugging Face.
This course gives you the foundation to study more advanced quantization methods, some of which are recommended at the end of the course.
Building on the concepts introduced in Quantization Fundamentals with Hugging Face, this course will help deepen your understanding of linear quantization methods. If you’re looking to go further into quantization, this course is the perfect next step.
Introduction
Overview
Quantize and De-quantize a Tensor
Get the Scale and Zero Point
Symmetric vs Asymmetric Mode
Finer Granularity for More Precision
Per Channel Quantization
Per Group Quantization
Quantizing Weights & Activations for Inference
Custom Build an 8-Bit Quantizer
Replace PyTorch Layers with Quantized Layers
Quantize any Open Source PyTorch Model
Load your Quantized Weights from the Hugging Face Hub
Weights Packing
Packing 2-bit Weights
Unpacking 2-Bit Weights
Beyond Linear Quantization
Conclusion
Course access is free for a limited time during the DeepLearning.AI learning platform beta!