DeepLearning.AI
AI is the new electricity and will transform and improve nearly all areas of human lives.

💻   Accessing Utils File and Helper Functions

In each notebook on the top menu:

1:   Click on "File"

2:   Then, click on "Open"

You will see all the notebook files for the lesson, including any helper functions used in the notebook, in the left sidebar.


💻   Downloading Notebooks

In each notebook on the top menu:

1:   Click on "File"

2:   Then, click on "Download as"

3:   Then, click on "Notebook (.ipynb)"


💻   Uploading Your Files

After following the steps shown in the previous section ("File" => "Open"), click on the "Upload" button to upload your files.


📗   See Your Progress

Once you enroll in this course, or any other short course on the DeepLearning.AI platform, and open it, you can click on 'My Learning' at the top-right corner of the desktop view. There, you will be able to see all the short courses you have enrolled in and your progress in each one.

Additionally, your progress in each short course is displayed at the bottom-left corner of the learning page for each course (desktop view).


📱   Features to Use

🎞   Adjust Video Speed: Click on the gear icon (⚙) on the video and then, from the Speed option, choose your desired video speed.

🗣   Captions (English and Spanish): Click on the gear icon (⚙) on the video and then, from the Captions option, choose to see the captions in either English or Spanish.

🔅   Video Quality: If you do not have access to high-speed internet, click on the gear icon (⚙) on the video and then, from the Quality option, choose the quality that works best for your internet speed.

🖥   Picture in Picture (PiP): This feature allows you to continue watching the video when you switch to another browser tab or window. Click on the small rectangle shape on the video to go to PiP mode.

√   Hide and Unhide the Lesson Navigation Menu: If you do not have a large screen, you may click on the small hamburger icon beside the title of the course to hide the left-side navigation menu. You can then unhide it by clicking on the same icon again.


🧑   Efficient Learning Tips

The following tips can help you have an efficient learning experience with this short course and other courses.

🧑   Create a Dedicated Study Space: Establish a quiet, organized workspace free from distractions. A dedicated learning environment can significantly improve concentration and overall learning efficiency.

📅   Develop a Consistent Learning Schedule: Consistency is key to learning. Set aside specific times in your day for study and make it a routine. Consistent study times help build a habit and improve information retention.

Tip: Set a recurring event and reminder in your calendar, with clear action items, to get regular notifications about your study plans and goals.

☕   Take Regular Breaks: Include short breaks in your study sessions. The Pomodoro Technique, which involves studying for 25 minutes followed by a 5-minute break, can be particularly effective.

💬   Engage with the Community: Participate in forums, discussions, and group activities. Engaging with peers can provide additional insights, create a sense of community, and make learning more enjoyable.

โœ ย  Practice Active Learning: Don't just read or run notebooks or watch the material. Engage actively by taking notes, summarizing what you learn, teaching the concept to someone else, or applying the knowledge in your practical projects.


📚   Enroll in Other Short Courses

Keep learning by enrolling in other short courses. We add new short courses regularly. Visit the DeepLearning.AI Short Courses page to see our latest courses and begin learning new topics. 👇

👉👉 🔗 DeepLearning.AI – All Short Courses


🙂   Let Us Know What You Think

Your feedback helps us know what you liked and didn't like about the course. We read all your feedback and use it to improve this course and future courses. Please submit your feedback by clicking on the "Course Feedback" option at the bottom of the lessons list menu (desktop view).

Also, you are more than welcome to join our community 👉👉 🔗 DeepLearning.AI Forum


Welcome to Reinforcement Learning from Human Feedback, or RLHF, built in partnership with Google Cloud.

An LLM trained on public internet data will mirror the tone of the internet, so it can generate information that is harmful, false, or unhelpful. RLHF is an important tuning technique that has been critical to aligning an LLM's output with human preferences and values. This algorithm is, I think, a big deal and has been a central part of the rise of LLMs. And it turns out that RLHF can be useful to you even if you're not training an LLM from scratch, but instead building an application whose values you want to set. While fine-tuning could be one way to do this, as you'll learn in this course, for many cases RLHF can be more efficient.

For example, there are many valid ways in which an LLM can respond to a prompt such as "What is the capital of France?" It could reply with "Paris is the capital of France," or it could more simply reply "Paris." Some of these responses will feel more natural than others. RLHF is a method for gathering human feedback on which responses people prefer, in order to train the model to generate more of the responses that humans prefer. In this process, you start off with an LLM that has already been trained with instruction tuning, so it has already learned to follow instructions. You then gather a dataset that captures a human labeler's preferences between multiple completions of the same prompt, and use this dataset to create a reward signal to fine-tune the instruction-tuned LLM. The result is a tuned large language model that generates completions that better align with the preferences of the human labelers.

I am delighted to introduce the instructor, Nikita Namjoshi, who is a developer advocate for generative AI on Google Cloud. She is a regular speaker at generative AI developer events and has helped many people build generative AI applications. I look forward to her sharing her deep practical experience with generative AI and with RLHF with us here.

Thank you, Andrew. I'm really excited to work with you and your team on this. In this course, you learn about the RLHF process and also gain hands-on practice exploring sample datasets for RLHF, tuning the Llama 2 model using RLHF, and then evaluating the newly tuned model. Nikita will go through these concepts using Google Cloud's machine learning platform, Vertex AI.

What really excites me about RLHF is that it helps us improve an LLM's ability to solve tasks where the desired output is difficult to explain or describe; in other words, problems where there is no single correct answer. And for a lot of problems we naturally want to use LLMs for, there really is no one correct answer. It's such an interesting way of thinking about training machine learning models, and it's different from supervised fine-tuning, which you may already be familiar with. RLHF doesn't solve all of the problems of truthfulness and toxicity in large language models, but it has really been a key part of improving the quality of these models, and I think we're going to continue to see more techniques like this in the future as the field evolves. So I'm really excited to share with you just how it works. And I'm happy to say you don't need to know any reinforcement learning to get started.

Many people have worked to create this course. I'd like to thank, on the Google Cloud side, Bethany Wang, Mei Hu, and Jarek Kazmierczak. From DeepLearning.AI, Eddie Xu and Leslie Zerma also contributed to this course.

So with that, let's go on to the next video, where Nikita will present an overview of RLHF so you can see all the pieces of how it works and how they fit together. Let's go on to the next video.
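
To make the process described above a little more concrete, here is a minimal Python sketch of the kind of data RLHF starts from: a prompt, a preferred completion, and a rejected completion, which a reward model can turn into a reward signal. The class name, field names, and the Bradley-Terry style scoring function below are illustrative assumptions for this sketch, not the course's actual dataset format or the Vertex AI API.

    import math
    from dataclasses import dataclass

    @dataclass
    class PreferenceExample:
        """One human-preference record: a prompt plus a preferred and a rejected completion."""
        prompt: str
        chosen: str      # the completion the human labeler preferred
        rejected: str    # the completion the human labeler did not prefer

    # A tiny illustrative dataset, echoing the "capital of France" example above.
    preference_data = [
        PreferenceExample(
            prompt="What is the capital of France?",
            chosen="Paris is the capital of France.",
            rejected="Paris",
        ),
    ]

    def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
        """Bradley-Terry style probability that the preferred completion wins,
        given scalar scores from a (hypothetical) reward model."""
        return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

    # Toy scores only; a real reward model would score each (prompt, completion) pair.
    print(round(preference_probability(1.8, 0.6), 2))  # 0.77

Training a reward model so that this probability is high on the labeled preferences, and then tuning the LLM against that reward, is roughly the pipeline the lessons below walk through on Vertex AI.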
Reinforcement Learning From Human Feedback
  • Introduction: Video ・ 4 mins
  • How does RLHF work: Video ・ 11 mins
  • Datasets for RL training: Video with Code Example ・ 9 mins
  • Tune an LLM with RLHF: Video with Code Example ・ 24 mins
  • Evaluate the tuned model: Video with Code Example ・ 22 mins
  • Google Cloud Setup: Code Example ・ 10 mins
  • Conclusion: Video ・ 1 min
  • Course Feedback
  • Community