
💻   Accessing Utils File and Helper Functions

In each notebook, on the top menu:

1:   Click on "File"

2:   Then, click on "Open"

In the left sidebar, you will see all the notebook files for the lesson, including any helper functions used in the notebook.


💻   Downloading Notebooks

In each notebook, on the top menu:

1:   Click on "File"

2:   Then, click on "Download as"

3:   Then, click on "Notebook (.ipynb)"


💻   Uploading Your Files

After following the steps shown in the previous section ("File" => "Open"), click on the "Upload" button to upload your files.


📗   See Your Progress

Once you enroll in this course, or any other short course on the DeepLearning.AI platform, and open it, you can click on "My Learning" at the top right corner of the desktop view. There, you will be able to see all the short courses you have enrolled in and your progress in each one.

Additionally, your progress in each short course is displayed at the bottom-left corner of the learning page for each course (desktop view).


📱   Features to Use

🎞   Adjust Video Speed: Click on the gear icon (⚙) on the video, then from the Speed option, choose your desired video speed.

🗣   Captions (English and Spanish): Click on the gear icon (⚙) on the video, then from the Captions option, choose to see the captions in either English or Spanish.

🔅   Video Quality: If you do not have access to high-speed internet, click on the gear icon (⚙) on the video, then from the Quality option, choose the quality that works best for your internet speed.

🖥   Picture in Picture (PiP): This feature allows you to continue watching the video when you switch to another browser tab or window. Click on the small rectangle shape on the video to go to PiP mode.

√   Hide and Unhide Lesson Navigation Menu: If you do not have a large screen, you can click on the small hamburger icon beside the title of the course to hide the left-side navigation menu. Click the same icon again to unhide it.


🧑   Efficient Learning Tips

The following tips can help you have an efficient learning experience with this short course and others.

🧑   Create a Dedicated Study Space: Establish a quiet, organized workspace free from distractions. A dedicated learning environment can significantly improve concentration and overall learning efficiency.

📅   Develop a Consistent Learning Schedule: Consistency is key to learning. Set aside specific times in your day for study and make it a routine. Consistent study times help build a habit and improve information retention.

Tip: Set a recurring event and reminder in your calendar, with clear action items, to get regular notifications about your study plans and goals.

☕   Take Regular Breaks: Include short breaks in your study sessions. The Pomodoro Technique, which involves studying for 25 minutes followed by a 5-minute break, can be particularly effective.

💬   Engage with the Community: Participate in forums, discussions, and group activities. Engaging with peers can provide additional insights, create a sense of community, and make learning more enjoyable.

✍   Practice Active Learning: Don't just read the material, run the notebooks, or watch the videos. Engage actively by taking notes, summarizing what you learn, teaching the concepts to someone else, or applying the knowledge in your own projects.


📚   Enroll in Other Short Courses

Keep learning by enrolling in other short courses. We add new short courses regularly. Visit the DeepLearning.AI Short Courses page to see our latest courses and begin learning new topics. 👇

👉👉 🔗 DeepLearning.AI – All Short Courses


🙂   Let Us Know What You Think

Your feedback helps us know what you liked and didn't like about the course. We read all of your feedback and use it to improve this course and future ones. Please submit your feedback by clicking on the "Course Feedback" option at the bottom of the lessons list menu (desktop view).

Also, you are more than welcome to join our community 👉👉 🔗 DeepLearning.AI Forum


Welcome to Reinforcement Fine-Tuning LLMs with GRPO, built in partnership with Predibase. In this course, you take a deep technical dive into reinforcement fine-tuning, or RFT, a training technique that uses reinforcement learning to improve the performance of LLMs on tasks that require multi-step reasoning, such as math or code generation. By harnessing an LLM's ability to reason through problems, to think step by step, reinforcement fine-tuning guides the model to discover solutions to complex tasks on its own, rather than relying on preexisting examples as in traditional supervised learning. This approach lets you adapt models to complex tasks with much less training data, say just a couple dozen examples, than you would typically need for successful supervised fine-tuning.

I'm delighted to introduce your instructors for this course. Travis Addair is co-founder and CTO at Predibase, and Arnav Garg is Senior Machine Learning Engineer and Machine Learning Lead at the company. Both have worked closely with many customers to solve practical business problems using RFT.

Thanks, Andrew. We're excited to be here. In this course, you'll explore how RFT works using a fun example: training a small LLM to play Wordle, a popular word puzzle game in which the player has to guess a five-letter word in six tries or fewer. You'll start by prompting the Qwen-2.5-7B model to play the game, analyze its performance, and develop a reward function that can be used to help the model learn how to do better over time. This reward function is the key component of Group Relative Policy Optimization, or GRPO, the learning algorithm developed by DeepSeek to carry out reinforcement learning on reasoning tasks.

In GRPO, an LLM produces multiple responses to a single prompt, which are then scored using a reward function based on verifiable metrics like correct formatting or functioning code. This use of a reward function is a key difference between GRPO and other RL algorithms: if you've heard of RL algorithms like PPO or DPO, they rely on human feedback or complex multi-model systems to assign rewards. After developing a reward function for the Wordle example, you'll learn some other general principles for writing good reward functions that you can apply to a wide range of problems. You'll also explore ways to avoid reward hacking, which is where a model learns behaviors that maximize rewards without actually solving the problem at hand.

Next, you'll take a close look at the technical details of how loss is calculated during RFT. You'll see how the seemingly complex pieces of the GRPO algorithm, like clipping, KL divergence, and the loss function, are actually simpler than you might think once you implement them in code. Finally, you'll wrap up the course by seeing how you can carry out RFT using the Predibase API with your own data and your own custom reward functions.

Many people have worked to develop this course. From Predibase, I'd like to thank Michael Ortega, and from DeepLearning.AI, Tommy Nelson. LLMs that can reason well are critical components of many agentic systems, and RFT will let smaller models work well in agentic workflows. There's a lot of excitement around this capability of LLMs, and RL itself is, I think, a very powerful and important technique that is still very mysterious to many people. So this is a great time to learn how RL works and how to use it to tune your own custom reasoning models. I think you'll find learning these things really rewarding.
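To make the GRPO idea above concrete, here is a minimal Python sketch of a verifiable reward function for a Wordle-style guess, plus group-relative advantage scoring. The format checks, score values, and helper names are illustrative assumptions for this appendix, not the course's actual notebook code.

    import re
    import statistics

    # Hypothetical verifiable reward for a Wordle-style task: partial credit
    # for a correctly formatted five-letter guess, full credit if the guess
    # matches the secret word. (Illustrative, not the course's code.)
    def reward(response: str, secret_word: str) -> float:
        score = 0.0
        match = re.search(r"guess:\s*([a-z]{5})\b", response.lower())
        if match:
            score += 0.5                      # valid formatting
            if match.group(1) == secret_word:
                score += 1.0                  # correct guess
        return score

    # GRPO's core move: score a *group* of responses to the same prompt, then
    # normalize each reward against the group's mean and standard deviation.
    def group_relative_advantages(rewards: list[float]) -> list[float]:
        mean = statistics.mean(rewards)
        std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
        return [(r - mean) / std for r in rewards]

    # Example: four sampled responses to one prompt.
    responses = ["guess: crane", "I think... guess: slate", "CRANE!", "guess: zzzzz"]
    rewards = [reward(r, "crane") for r in responses]
    print(rewards)                             # [1.5, 0.5, 0.0, 0.5]
    print(group_relative_advantages(rewards))  # above-average answers score positive

Note how a reward like this could be hacked: because formatting alone earns 0.5, a model could learn to emit well-formatted but wrong guesses. The reward hacking lesson covers how to spot and close such loopholes.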
Let's go on to the next video, where you'll learn about the major differences between RFT and supervised fine-tuning.
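As a preview of the "Calculating loss in GRPO" lesson, here is a minimal per-token sketch of a GRPO-style objective in Python, assuming the clipped PPO-style surrogate plus a KL penalty from DeepSeek's GRPO formulation; the function name and coefficient values are illustrative, not the course's implementation.

    import math

    # One token's contribution to a GRPO-style loss (scalar sketch).
    def grpo_token_loss(logp_new: float, logp_old: float, logp_ref: float,
                        advantage: float, clip_eps: float = 0.2,
                        kl_coef: float = 0.04) -> float:
        ratio = math.exp(logp_new - logp_old)                    # policy probability ratio
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        surrogate = min(ratio * advantage, clipped * advantage)  # PPO-style clipping
        # Unbiased KL estimate used in GRPO: r - log(r) - 1, with r = pi_ref / pi_new
        ratio_ref = math.exp(logp_ref - logp_new)
        kl = ratio_ref - math.log(ratio_ref) - 1
        return -(surrogate - kl_coef * kl)                       # minimize negative objective

    # Example: the new policy up-weights a token that has positive advantage.
    print(grpo_token_loss(logp_new=-1.0, logp_old=-1.2, logp_ref=-1.1, advantage=1.6))

Clipping keeps each update close to the old policy, and the KL term keeps the model from drifting too far from the reference model; as the transcript notes, both turn out to be only a few lines of code.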
Reinforcement Fine-Tuning LLMs With GRPO

  • Introduction (Video, 3 mins)
  • Introduction to reinforcement learning (Video, 7 mins)
  • Benefits of reinforcement fine-tuning (Video, 4 mins)
  • Can a large language model master Wordle? (Video with Code Example, 10 mins)
  • Reward functions (Video with Code Example, 10 mins)
  • Reward functions with LLM as a judge (Video with Code Example, 12 mins)
  • Reward hacking (Video with Code Example, 7 mins)
  • Calculating loss in GRPO (Video with Code Example, 18 mins)
  • Putting it all together: Training Wordle (Video with Code Example, 8 mins)
  • Conclusion (Video, 1 min)
  • Appendix – Tips, Help, and Download (Code Example, 10 mins)
  • Course Feedback
  • Community