DeepLearning.AI
AI is the new electricity and will transform and improve nearly all areas of human lives.

💻   Accessing Utils File and Helper Functions

In each notebook on the top menu:

1:   Click on "File"

2:   Then, click on "Open"

You will be able to see all the notebook files for the lesson, including any helper functions used in the notebook on the left sidebar. See the following image for the steps above.


💻   Downloading Notebooks

In each notebook on the top menu:

1:   Click on "File"

2:   Then, click on "Download as"

3:   Then, click on "Notebook (.ipynb)"


💻   Uploading Your Files

After following the steps shown in the previous section ("File" => "Open"), click on the "Upload" button to upload your files.


📗   See Your Progress

Once you enroll in this course (or any other short course on the DeepLearning.AI platform) and open it, you can click on 'My Learning' at the top right corner of the desktop view. There, you will be able to see all the short courses you have enrolled in and your progress in each one.

Additionally, your progress in each short course is displayed at the bottom-left corner of the learning page for each course (desktop view).


📱   Features to Use

🎞   Adjust Video Speed: Click on the gear icon (⚙) on the video and then, from the Speed option, choose your desired video speed.

🗣   Captions (English and Spanish): Click on the gear icon (⚙) on the video and then, from the Captions option, choose to see the captions either in English or Spanish.

🔅   Video Quality: If you do not have access to high-speed internet, click on the gear icon (⚙) on the video and then, from Quality, choose the quality that works best for your internet speed.

🖥   Picture in Picture (PiP): This feature allows you to continue watching the video when you switch to another browser tab or window. Click on the small rectangle shape on the video to go to PiP mode.

√   Hide and Unhide Lesson Navigation Menu: If you do not have a large screen, you may click on the small hamburger icon beside the title of the course to hide the left-side navigation menu. You can then unhide it by clicking on the same icon again.


🧑   Efficient Learning Tips

The following tips can help you have an efficient learning experience with this short course and other courses.

🧑   Create a Dedicated Study Space: Establish a quiet, organized workspace free from distractions. A dedicated learning environment can significantly improve concentration and overall learning efficiency.

📅   Develop a Consistent Learning Schedule: Consistency is key to learning. Set aside specific times in your day for study and make it a routine. Consistent study times help build a habit and improve information retention.

Tip: Set a recurring event and reminder in your calendar, with clear action items, to get regular notifications about your study plans and goals.

☕   Take Regular Breaks: Include short breaks in your study sessions. The Pomodoro Technique, which involves studying for 25 minutes followed by a 5-minute break, can be particularly effective.

💬   Engage with the Community: Participate in forums, discussions, and group activities. Engaging with peers can provide additional insights, create a sense of community, and make learning more enjoyable.

โœ ย  Practice Active Learning: Don't just read or run notebooks or watch the material. Engage actively by taking notes, summarizing what you learn, teaching the concept to someone else, or applying the knowledge in your practical projects.


📚   Enroll in Other Short Courses

Keep learning by enrolling in other short courses. We add new short courses regularly. Visit the DeepLearning.AI Short Courses page to see our latest courses and begin learning new topics. 👇

👉👉 🔗 DeepLearning.AI – All Short Courses


🙂   Let Us Know What You Think

Your feedback helps us know what you liked and didn't like about the course. We read all your feedback and use it to improve this course and future courses. Please submit your feedback by clicking on the "Course Feedback" option at the bottom of the lessons list menu (desktop view).

Also, you are more than welcome to join our community 👉👉 🔗 DeepLearning.AI Forum


Welcome to Post-training of LLMs, taught by Banghua Zhu, who is an Assistant Professor at the University of Washington as well as a co-founder of NexusFlow. Banghua has trained and post-trained many models, and I'm delighted that he is the instructor for this class. Thanks, Andrew. I'm excited to be here.

Training a large language model has two phases. In pre-training, a model learns to predict the next word or token. From a compute and cost point of view, this is the bulk of training and may require training on trillions or tens of trillions of tokens of text. For very large models, this could take months. Then, in post-training, the model is further trained to perform more specific tasks, such as answering questions. This phase usually uses much smaller datasets and is also much faster and cheaper.

In this course, you'll learn about three common ways to post-train and customize LLMs, and in fact, you're going to download a pre-trained model and post-train it yourself in a relatively computationally affordable way. You'll learn about three techniques: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and online reinforcement learning.

Supervised fine-tuning trains a model on labeled prompt-response pairs so that it learns to follow instructions or use tools by replicating the prompt-to-response relationship. Supervised fine-tuning is especially effective for introducing new behaviors or making major changes to the model. In one of the lessons, you fine-tune a small Qwen model to follow instructions.

Direct Preference Optimization, or DPO, teaches a model by showing it both good and bad answers. DPO gives the model two options for the same prompt, one preferred over the other. Through a contrastive loss, DPO pushes the model closer to the good responses and away from the bad ones. For example, if the model says "I'm your assistant" but you want it to say "I'm your AI assistant", you label "I'm your assistant" as the bad response and "I'm your AI assistant" as the good response. You will use DPO on a small Qwen instruct model to change its identity.

With online reinforcement learning, the third of the three techniques, you give the LLM prompts, it generates responses, and a reward function scores the quality of the answers. The model then gets updated based on these reward scores. One way to get a reward model to give reward scores is to start with human judgments of the quality of responses. Then you can train a function to assign scores to the responses in a way that's consistent with the human judgments. The most common algorithm for this is probably Proximal Policy Optimization, or PPO. Another way to come up with rewards is via verifiable rewards, which apply to tasks with objective correctness measures like math or coding. You can use math checkers, or for coding, unit tests, to measure in an objective way whether generated math solutions or code are actually correct. This measure of correctness then gives you the reward function. A powerful algorithm for using these reward functions is GRPO, or Group Relative Policy Optimization, which was introduced by DeepSeek. In this course, you use GRPO to train a small Qwen model to solve math problems.

Many people have helped in creating this course. I'd like to thank Oleksii Kuchaiev from NVIDIA and Jiantao Jiao from UC Berkeley. From DeepLearning.AI, Esmaeil Gargari also contributed to this course. The first lesson will be an overview of post-training methods.
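The three post-training techniques described in the introduction can be made concrete with a few lines of code. The sketch below is a minimal, illustrative example in PyTorch, not taken from the course notebooks: it shows the cross-entropy objective used in SFT, the contrastive DPO loss, and a simple verifiable reward of the kind an online RL algorithm such as GRPO can optimize. The function names, tensor shapes, and the exact-match reward check are assumptions made purely for illustration.

# Illustrative sketches of the three post-training objectives (assumes PyTorch).
import torch
import torch.nn.functional as F

def sft_loss(logits, response_ids):
    # Supervised fine-tuning: cross-entropy between the model's predicted
    # token distribution and the labeled response tokens.
    # logits: (batch, seq_len, vocab), response_ids: (batch, seq_len)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), response_ids.reshape(-1))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # DPO: a contrastive loss that widens the policy's margin for the
    # preferred ("chosen") response over the rejected one, measured relative
    # to a frozen reference model. Inputs are summed log-probabilities per response.
    chosen_margin = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_margin = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

def verifiable_math_reward(generated_answer: str, correct_answer: str) -> float:
    # Verifiable reward for online RL (e.g. GRPO): 1.0 if the generated final
    # answer exactly matches the known-correct answer, else 0.0.
    return 1.0 if generated_answer.strip() == correct_answer.strip() else 0.0

# Toy usage with random tensors, just to show the shapes involved.
logits = torch.randn(2, 8, 100)                 # (batch=2, seq_len=8, vocab=100)
targets = torch.randint(0, 100, (2, 8))
print("SFT loss:", sft_loss(logits, targets).item())

policy_chosen, policy_rejected = torch.tensor([-5.0]), torch.tensor([-7.0])
ref_chosen, ref_rejected = torch.tensor([-5.5]), torch.tensor([-6.5])
print("DPO loss:", dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected).item())

print("Reward:", verifiable_math_reward("42", "42"))

In practice, these objectives are usually applied through a training library rather than written by hand; the sketch is only meant to show what each technique optimizes.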
In this lesson, you learned when you should do post-training, as well as the menu of post-training options you can choose from. Let's go on to the next video to get started.
Course Detail
Post-training of LLMs
  • Introduction (Video ・ 3 mins)
  • Introduction to Post-training (Video ・ 9 mins)
  • Basics of SFT (Video ・ 8 mins)
  • SFT in Practice (Video with Code Example ・ 13 mins)
  • Basics of DPO (Video ・ 7 mins)
  • DPO in Practice (Video with Code Example ・ 9 mins)
  • Basics of Online RL (Video ・ 11 mins)
  • Online RL in Practice (Video with Code Example ・ 11 mins)
  • Conclusion (Video ・ 2 mins)
  • Course Feedback
  • Community