
Introduction
The Main Ideas Behind Transformers and Attention
The Matrix Math for Calculating Self-Attention
Coding Self-Attention in PyTorch
Self-Attention vs Masked Self-Attention
The Matrix Math for Calculating Masked Self-Attention
Coding Masked Self-Attention in PyTorch
Encoder-Decoder Attention
Multi-Head Attention
Coding Encoder-Decoder Attention and Multi-Head Attention in PyTorch
Conclusion
Appendix – Tips and Help