DeepLearning.AI
Welcome to Preprocessing Unstructured Data for LLM Applications. Retrieval Augmented Generation, or RAG, has been widely adopted in many enterprises. The typical RAG pipeline has key components like data loading, chunking, embedding, storing in a vector database, and then retrieval. In this course, you'll learn techniques for representing all sorts of unstructured data, like text, images, and tables, from many different sources, like PDF, PowerPoint, and Word, in a way that lets your LLM RAG pipeline access all of this information.
A particularly challenging task in RAG is data loading and chunking, because data is stored in many different file types and data formats. For example, you may have numeric data in Excel spreadsheets, text reports in PDF or Markdown, presentations in PowerPoint, Slides, or Keynote, or communications in Outlook, Slack, or Teams, and so on. Each of these file types might in turn support data stored inside it in different formats. A PDF or PowerPoint file, for example, may itself contain tables, images, or bulleted lists.
So a data loader must first be able to parse many different file formats. But once it's parsed that data, then what? It turns out that it's very useful to normalize the data from these different sources. When you normalize tables from, say, a PDF, a PowerPoint, or another format, they can all be represented in a similar way. Likewise, a bulleted list, whether from a PDF or from an email, can be represented in a similar way. It is also useful to maintain some of the structure of the original documents by preserving that structural information in metadata, for example by recording that a paragraph has a parent, which is the title of the chapter.
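As a rough illustration of the normalization idea described above, the sketch below maps content from two hypothetical parsers (one for a PDF, one for an email) into a single shared element shape, and records each element's parent title in its metadata. The `Element` class, its field names, the category labels, and the sample inputs are all assumptions made for illustration; they are not the API of any particular library.

```python
from dataclasses import dataclass, field
import itertools

# Simple counter used to give every element a unique id (illustrative only).
_ids = itertools.count(1)


@dataclass
class Element:
    """A normalized piece of document content (hypothetical schema)."""
    category: str  # e.g. "Title", "NarrativeText", "ListItem"
    text: str
    metadata: dict = field(default_factory=dict)
    element_id: int = field(default_factory=lambda: next(_ids))


def normalize(raw_items, source):
    """Convert (category, text) pairs from any parser into Elements.

    Each non-title element records the id of the most recent title as
    its parent, so the document hierarchy survives in the metadata.
    """
    elements = []
    parent_id = None
    for category, text in raw_items:
        element = Element(
            category=category,
            text=text.strip(),
            metadata={"source": source, "parent_id": None},
        )
        if category == "Title":
            parent_id = element.element_id
        else:
            element.metadata["parent_id"] = parent_id
        elements.append(element)
    return elements


# Made-up parser output for two different file types.
pdf_items = [
    ("Title", "Chapter 1: Results"),
    ("NarrativeText", "  Revenue grew.  "),
    ("ListItem", "Item A"),
]
email_items = [("ListItem", "Item B")]

pdf_elements = normalize(pdf_items, "report.pdf")
email_elements = normalize(email_items, "message.eml")

for element in pdf_elements + email_elements:
    print(element.category, "|", element.text, "|", element.metadata)
```

In this sketch a list item looks the same whether it came from the PDF or the email, and only the `source` and `parent_id` metadata differ, which is the property that lets downstream chunking and retrieval code ignore the original file type.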
With your data organized in, say, a hierarchical tree structure like this, a query that matches that chapter can be expanded to return its child text as well.
With us to explain how all this is done is Matt Robinson, who's Head of Product at Unstructured. Matt's team has been responsible for Unstructured's tools for ingesting data for LLMs to use, and he's helped many developers build LLM applications that use and combine data from diverse sources.
Thanks, Andrew. I'm excited to work with you and your team on this. This course tackles a critical yet often overlooked aspect of LLM app development: data preprocessing. You'll learn how to extract and normalize content from a wide variety of document types, including PDFs, PowerPoints, Word, and HTML, enabling your LLM to access a broad range of information. You'll also learn how to enrich this content with metadata, enhancing RAG results and supporting nuanced search capabilities. This course covers document image analysis techniques like ... Finally, you'll apply these techniques to build a RAG bot using documents like PDFs, ...
Many people have worked to create this course. I'd like to thank, from Unstructured, Brian Raymond and Ronny Hoesada there, ...
In the first lesson, you'll learn how you can extract and normalize content from a diverse range of document types, so your LLM can reference information from PDFs, PowerPoints, Word docs, HTML, and more. Data engineering is a key aspect of getting the context you need to your LLM to let it do well on your application. I hope you enjoy learning these leading-edge techniques, which I think you'll find useful for building many applications. Let's go on to the next video and get started.
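The query expansion mentioned in this introduction, where a match on a chapter title also returns that chapter's child text, could be sketched roughly as follows. The element dictionaries, their keys, and the `expand_to_children` helper are hypothetical, and a plain substring match stands in for the similarity search a real RAG pipeline would use.

```python
def expand_to_children(elements, query):
    """Return titles matching the query plus every element parented by them."""
    query = query.lower()
    # Ids of the titles that match the query text.
    matched_ids = {
        e["id"]
        for e in elements
        if e["category"] == "Title" and query in e["text"].lower()
    }
    # Keep the matching titles and any element whose parent is one of them.
    return [
        e
        for e in elements
        if e["id"] in matched_ids or e["parent_id"] in matched_ids
    ]


# Made-up elements with parent links stored as metadata.
elements = [
    {"id": 1, "category": "Title", "text": "Data Loading", "parent_id": None},
    {"id": 2, "category": "NarrativeText",
     "text": "Parsers read many file formats.", "parent_id": 1},
    {"id": 3, "category": "Title", "text": "Chunking", "parent_id": None},
    {"id": 4, "category": "NarrativeText",
     "text": "Chunks follow section boundaries.", "parent_id": 3},
]

results = expand_to_children(elements, "chunking")
for e in results:
    print(e["category"], "-", e["text"])
```

The point of the sketch is that the expansion only needs the `parent_id` link: once a title matches, its child text comes along without having to match the query itself.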
Preprocessing Unstructured Data for LLM Applications
  • Introduction (Video, 4 mins)
  • Overview of LLM Data Preprocessing (Video, 3 mins)
  • Normalizing the Content (Video with Code Example, 14 mins)
  • Metadata Extraction and Chunking (Video with Code Example, 21 mins)
  • Preprocessing PDFs and Images (Video with Code Example, 10 mins)
  • Extracting Tables (Video with Code Example, 8 mins)
  • Build Your Own RAG Bot (Video with Code Example, 9 mins)
  • Conclusion (Video, 1 min)