
Quick Guide & Tips

💻  Accessing the Utils File and Helper Functions

In each notebook, on the top menu:

1:  Click on "File"

2:  Then, click on "Open"

You will then see all of the notebook files for the lesson, including any helper functions used in the notebook (such as the utils file), in the left sidebar.


💻  Downloading Notebooks

In each notebook, on the top menu:

1:  Click on "File"

2:  Then, click on "Download as"

3:  Then, click on "Notebook (.ipynb)"


💻  Uploading Your Files

After following the steps shown in the previous section ("File" => "Open"), click on the "Upload" button to upload your files.


📗  See Your Progress

Once you enroll in this course, or any other short course on the DeepLearning.AI platform, and open it, you can click on "My Learning" at the top right corner of the desktop view. There, you will be able to see all the short courses you have enrolled in and your progress in each one.

Additionally, your progress in each short course is displayed at the bottom-left corner of the learning page for each course (desktop view).


📱  Features to Use

🎞  Adjust Video Speed: Click on the gear icon (⚙) on the video, then choose your desired video speed from the Speed option.

🗣  Captions (English and Spanish): Click on the gear icon (⚙) on the video, then choose to see the captions in either English or Spanish from the Captions option.

🔅  Video Quality: If you do not have access to high-speed internet, click on the gear icon (⚙) on the video and, from the Quality option, choose the quality that works best for your internet speed.

🖥  Picture in Picture (PiP): This feature allows you to continue watching the video when you switch to another browser tab or window. Click on the small rectangle shape on the video to go into PiP mode.

√  Hide and Unhide the Lesson Navigation Menu: If you do not have a large screen, you can click on the small hamburger icon beside the title of the course to hide the left-side navigation menu. You can then unhide it by clicking on the same icon again.


🧑  Efficient Learning Tips

The following tips can help you have an efficient learning experience with this short course and other courses.

🧑  Create a Dedicated Study Space: Establish a quiet, organized workspace free from distractions. A dedicated learning environment can significantly improve concentration and overall learning efficiency.

📅  Develop a Consistent Learning Schedule: Consistency is key to learning. Set aside specific times in your day for study and make it a routine. Consistent study times help build a habit and improve information retention.

Tip: Set a recurring event and reminder in your calendar, with clear action items, to get regular notifications about your study plans and goals.

☕  Take Regular Breaks: Include short breaks in your study sessions. The Pomodoro Technique, which involves studying for 25 minutes followed by a 5-minute break, can be particularly effective.

💬  Engage with the Community: Participate in forums, discussions, and group activities. Engaging with peers can provide additional insights, create a sense of community, and make learning more enjoyable.

✍  Practice Active Learning: Don't just read, run notebooks, or watch the material. Engage actively by taking notes, summarizing what you learn, teaching the concepts to someone else, or applying the knowledge in your own projects.


📚  Enroll in Other Short Courses

Keep learning by enrolling in other short courses. We add new short courses regularly. Visit the DeepLearning.AI Short Courses page to see our latest courses and begin learning new topics. 👇

👉👉 🔗 DeepLearning.AI – All Short Courses


🙂  Let Us Know What You Think

Your feedback helps us know what you liked and didn't like about the course. We read all of your feedback and use it to improve this course and future courses. Please submit your feedback by clicking on the "Course Feedback" option at the bottom of the lesson list menu (desktop view).

Also, you are more than welcome to join our community 👉👉 🔗 DeepLearning.AI Forum


🎞  Introduction (Video Transcript)

Welcome to Pre-training LLMs, built in partnership with Upstage and taught by Upstage's CEO Sung Kim, as well as Chief Scientific Officer Lucy Park. Welcome, Sung, Lucy. Thank you, Andrew. We are so excited to be here.

Pre-training a large language model is the process of taking a model, generally a transformer neural network, and training it on a large corpus of text using supervised learning, so that it learns to repeatedly predict the next token given an input prompt. This process is called pre-training because it is the first step of training an LLM, before any fine-tuning to have it follow instructions or further alignment to human preferences is carried out. The output of pre-training is known as a base model. This course will cover both training from scratch, meaning from randomly initialized weights, as well as taking a model that's already been pre-trained and continuing the pre-training process on your own data.

Training a large model from scratch is computationally expensive, requiring multiple state-of-the-art GPUs and training runs that can last weeks or months. For this reason, most developers won't pre-train models from scratch; instead, they take an existing model and use either prompting or sometimes fine-tuning to adapt it to their own tasks. However, there are still some situations where pre-training a model may be required or preferred, and that's what Upstage has been doing for its customers.

That's right, Andrew. Our customers are pre-training models for various reasons. Some are building models for tasks in specific domains like legal, healthcare, and e-commerce. Others need models with stronger abilities in specific languages such as Thai and Japanese. Further, new training methods are making more efficient pre-training possible, like Depth Upscaling, which uses two or more copies of existing models to build larger models. For example, we trained our Solar model using Depth Upscaling. Because of these technology improvements, we are seeing more and more interest in pre-training.

Depth Upscaling creates a new, larger model by duplicating layers of a smaller pre-trained model. The new model is then further pre-trained, resulting in a better, larger model than the original. Our team at Upstage has empirically found that models created in this way can be pre-trained with up to 70% less compute than traditional pre-training, representing a large cost saving.
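To make the layer-duplication idea concrete, here is a minimal sketch of Depth Upscaling using the Hugging Face transformers library. The layer counts and model sizes are illustrative assumptions, not Upstage's exact Solar recipe, and the upscaled model would still need continued pre-training afterward.

```python
import copy
import torch.nn as nn
from transformers import LlamaConfig, LlamaForCausalLM

# A tiny, randomly initialized Llama-style model so the sketch runs on a CPU.
# All sizes here are illustrative, not a real pre-trained checkpoint.
config = LlamaConfig(
    vocab_size=32000, hidden_size=256, intermediate_size=688,
    num_hidden_layers=12, num_attention_heads=8, num_key_value_heads=8,
)
small = LlamaForCausalLM(config)

# Depth Upscaling (sketch): build a 16-layer model by stacking copies of
# the bottom 8 and top 8 decoder layers of the 12-layer model, so that the
# middle layers appear twice.
layers = small.model.layers
new_layers = nn.ModuleList(
    [copy.deepcopy(layer) for layer in layers[:8]]
    + [copy.deepcopy(layer) for layer in layers[-8:]]
)

big_config = copy.deepcopy(config)
big_config.num_hidden_layers = len(new_layers)
big = LlamaForCausalLM(big_config)

# Reuse the duplicated layers plus the embeddings, final norm, and LM head.
big.model.layers = new_layers
big.model.embed_tokens = small.model.embed_tokens
big.model.norm = small.model.norm
big.lm_head = small.lm_head

print(big.config.num_hidden_layers)  # 16; now continue pre-training `big`
```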
Whether pre-training is the right solution for your work depends on several factors: whether a model is already available that might work for your task without pre-training, what data you have available, the compute resources you have access to, both for training and serving, and lastly, the privacy requirements you may have, which may also involve regulatory compliance requirements. So depending on the company or sector you work in, you might find yourself being asked to pre-train a model, or at least consider doing so, at some point in your work.

In this course, you'll learn all of the necessary steps to pre-train a model from scratch, from gathering and preparing training data to configuring a model and training it. You'll start by looking at some use cases where pre-training a model is the best option to get good performance, and discuss the difference between pre-training and fine-tuning. Next, you'll walk through the data preparation steps that are required to pre-train a model. You'll explore how you can gather data from the internet or existing repositories like HuggingFace, and then look at the steps to obtain high-quality training data, including deduplication, filtering on the length of text examples, and language cleaning.

After that, you'll explore some options for configuring your model's architecture. You'll see how you can modify Meta's Llama models to create larger or smaller models, and then look at a few options for initializing weights, either randomly or from other models. Lastly, you'll see how to train a model using the open source HuggingFace library and actually run a few steps of training to observe how the loss decreases as training progresses (see the sketch after this introduction). This course uses smaller models with just a few million parameters to keep things lightweight enough to run on a CPU, but you'll be able to use the code from the lessons to scale to both larger datasets and models, and also to train on GPUs.

Thanks, Lucy. This sounds like it would be very helpful for building intuition about when pre-training makes sense and what is required to carry it out. Many people have worked to create this course. From Upstage, I'd like to thank Chanjun Park, Sanghun Kim, Jerry Kim, Stan Lee, and Yungi Kim, as well as their collaborator Ian Park. From DeepLearning.AI, Tommy Nelson and Geoff Ladwig also contributed to this course.

I'd like to reiterate that pre-training large models on large datasets is an expensive activity, with a minimum cost of maybe about $1,000 for the smallest models, up to tens of thousands to hundreds of thousands of dollars for a billion-parameter-scale model. So be careful if you choose to try this out yourself. There are calculators, like one from HuggingFace that you'll see in the course, that can help you estimate the costs of your pre-training scenario before you get started. These can help you avoid unexpected large bills from your cloud provider.

But pre-training is a key part of the LLM stack, and whether you just want to build your intuition about LLMs, continue the pre-training of an existing model, or even try to pre-train something from scratch to compete on the LLM leaderboards, I hope you enjoy this course. So, let's go on to the next video and get started.
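As a taste of what the training lessons cover, here is a minimal sketch of running a few pre-training steps with the Hugging Face transformers Trainer on a tiny, randomly initialized Llama-style model. The toy corpus, model sizes, and step count are placeholder assumptions for illustration, not the course's exact setup; with real data the logged loss should trend downward in the same way.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer, DataCollatorForLanguageModeling, LlamaConfig,
    LlamaForCausalLM, Trainer, TrainingArguments,
)

# Tiny random model so a few steps run on a CPU; sizes are illustrative.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
config = LlamaConfig(
    vocab_size=len(tokenizer), hidden_size=128, intermediate_size=344,
    num_hidden_layers=4, num_attention_heads=4, num_key_value_heads=4,
)
model = LlamaForCausalLM(config)

# A toy corpus standing in for real pre-training data.
texts = ["The quick brown fox jumps over the lazy dog."] * 64
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=32),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="tiny_pretrain", max_steps=10,
        per_device_train_batch_size=8, logging_steps=1, report_to=[],
    ),
    train_dataset=dataset,
    # mlm=False yields causal (next-token prediction) labels from input_ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the logged loss should decrease over the 10 steps
```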
Pretraining LLMs

  • Introduction ・ Video ・ 6 mins
  • Why Pre-training ・ Video with Code Example ・ 12 mins
  • Data Preparation ・ Video with Code Example ・ 16 mins
  • Packaging Data for Pretraining ・ Video with Code Example ・ 8 mins
  • Model Initialization ・ Video with Code Example ・ 16 mins
  • Training in Action ・ Video with Code Example ・ 11 mins
  • Evaluation ・ Video with Code Example ・ 7 mins
  • Conclusion ・ Video ・ 1 min
  • Course Feedback
  • Community