Document AI: From OCR to Agentic Doc Extraction

Short Course・Intermediate・3h 1m

Instructors: David Park, Andrea Kropp

15 Video Lessons
6 Code Examples
1 Graded Assignment (PRO)
LandingAI

What you'll learn

Build agentic document processing pipelines that convert PDFs into structured Markdown and JSON by extracting text, tables, charts, and forms without losing context from layout.
Explore LandingAI’s Agentic Document Extraction (ADE) framework to parse complex files reliably with visual grounding and extract fields accurately through user-defined schemas.
Learn to deploy serverless RAG applications on AWS with event-driven document processing powered by LandingAI’s ADE framework.

About this course

Join this new short course on Document AI, built with LandingAI and taught by David Park, Senior Director of Applied AI, and Andrea Kropp, Applied AI Engineer at LandingAI.

Much of the world’s data is locked in PDFs, JPEGs, and other documents. Traditional OCR extracts text but loses critical information: the layout of tables with merged cells, the relationship between charts and captions, and the reading order of multi-column layouts. This course shows you how to build agentic workflows that process documents the way humans do, by breaking them into parts, examining each piece carefully, and extracting information over multiple iterations.
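
To make the reading-order problem concrete, here is a minimal sketch in plain Python. It is not taken from the course labs: the `Region` class, the fixed two-column assumption, and the sample coordinates are all hypothetical. It compares a naive top-to-bottom sort with a simple column-aware sort on a two-column page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A text region with a bounding box in page pixels (origin at top-left).

    Hypothetical structure for illustration only.
    """
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def naive_order(regions):
    """Sort by top edge first, which interleaves the two columns."""
    return sorted(regions, key=lambda r: (r.y0, r.x0))


def column_aware_order(regions, page_width, n_columns=2):
    """Assign each region to a column by its horizontal center, then read
    each column top to bottom, leftmost column first.

    Assumes equally wide columns, which real layouts often violate.
    """
    col_width = page_width / n_columns

    def key(r):
        center = (r.x0 + r.x1) / 2
        column = min(int(center // col_width), n_columns - 1)
        return (column, r.y0)

    return sorted(regions, key=key)


# Made-up regions on a 600-pixel-wide, two-column page.
page = [
    Region("Left column, paragraph 1", 50, 100, 280, 200),
    Region("Right column, paragraph 1", 320, 100, 550, 200),
    Region("Left column, paragraph 2", 50, 220, 280, 320),
    Region("Right column, paragraph 2", 320, 220, 550, 320),
]

naive = [r.text for r in naive_order(page)]
aware = [r.text for r in column_aware_order(page, page_width=600)]
print(naive)  # left 1, right 1, left 2, right 2 (columns interleaved)
print(aware)  # left 1, left 2, right 1, right 2 (columns kept together)
```

A fixed column split like this breaks on full-width headings, figures, and uneven columns, which is one reason to use a learned reading-order model instead.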

You’ll start by exploring traditional OCR. After understanding its limitations, you’ll build agents equipped with additional tools for document processing like layout detection, reading order, and multimodal reasoning models. Next, you’ll learn to use the Agentic Document Extraction (ADE) framework from LandingAI to automate this workflow. ADE treats documents as visual objects. It uses custom models to parse complex elements and ground extracted fields to precise locations on the page. You’ll integrate ADE into RAG applications and deploy them as production-ready pipelines on AWS.
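
As a rough illustration of schema-driven extraction with visual grounding, the sketch below checks extracted fields against a user-defined schema and converts a field's location into pixel coordinates for cropping. Everything here is an assumption made for illustration: `GroundedValue`, `INVOICE_SCHEMA`, the fractional box convention, and the sample values are hypothetical and do not represent ADE's actual request or response format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedValue:
    """An extracted value plus where it was found (hypothetical structure)."""
    value: str
    page: int
    box: tuple  # (x0, y0, x1, y1) as fractions of page width and height


# Hypothetical user-defined schema: field name -> expected Python type.
INVOICE_SCHEMA = {"invoice_number": str, "total": float}


def validate(fields, schema):
    """Coerce each extracted value to its schema type and collect problems."""
    record, errors = {}, []
    for name, expected in schema.items():
        grounded = fields.get(name)
        if grounded is None:
            errors.append(f"missing field: {name}")
            continue
        try:
            record[name] = expected(grounded.value)
        except ValueError:
            errors.append(f"bad value for {name}: {grounded.value!r}")
    return record, errors


def crop_pixels(grounded, page_width, page_height):
    """Convert a fractional box to integer pixel coordinates for cropping."""
    x0, y0, x1, y1 = grounded.box
    return (round(x0 * page_width), round(y0 * page_height),
            round(x1 * page_width), round(y1 * page_height))


# Made-up extraction output, for illustration only.
fields = {
    "invoice_number": GroundedValue("INV-0042", page=0, box=(0.70, 0.05, 0.95, 0.08)),
    "total": GroundedValue("1280.50", page=0, box=(0.75, 0.90, 0.95, 0.93)),
}

record, errors = validate(fields, INVOICE_SCHEMA)
print(json.dumps(record))
print(crop_pixels(fields["total"], page_width=1000, page_height=2000))
```

Keeping the page and box next to each value lets a reviewer trace any extracted field back to the exact region it came from.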

In detail, you’ll:

Explore traditional OCR methods to extract text from documents. Understand their limitations when handling tables, handwriting, or scanned images.
Learn how OCR evolved from early shape-based classifiers to modern deep learning systems. Apply models for layout detection to chunk documents into regions of interest with bounding boxes.
Use models for reading order to sort information into logical sequences. Employ vision language models to capture both text and images from documents.
Process challenging features like attestations, formulas, or barcodes with LandingAI’s Agentic Document Extraction (ADE). Parse documents as Markdown and extract key-value pairs as JSON without losing context in layout.
Build a RAG application that preprocesses documents with ADE, stores parsed chunks in a vector database, and retrieves text for queries along with cropped images from the respective source documents.
Implement an event-driven pipeline on AWS that automatically triggers ADE on uploads to S3, loads parsed documents into a Bedrock Knowledge Base, and interrogates them with Strands Agents.
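
The event-driven step in the last item can be sketched as a Lambda-style handler that reads an S3 upload notification and hands each PDF to a parser. This is a minimal sketch, not the course's implementation: `parse_document` is a hypothetical placeholder for the document-parsing call, and the bucket and key names in the sample event are made up. The `Records` / `s3` / `bucket` / `object` layout follows the standard S3 event notification structure, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def parse_document(bucket, key):
    """Hypothetical placeholder for the parsing call.

    A real pipeline would fetch the object and send it to the parser here.
    """
    return {"bucket": bucket, "key": key, "status": "queued"}


def handler(event, context=None):
    """Lambda-style entry point: queue every uploaded PDF in an S3 event."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(".pdf"):
            results.append(parse_document(bucket, key))
    return results


# Made-up event with one PDF and one non-PDF upload.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-docs"},
                "object": {"key": "papers/attention+is+all+you+need.pdf"}}},
        {"s3": {"bucket": {"name": "example-docs"},
                "object": {"key": "notes/readme.txt"}}},
    ]
}

results = handler(sample_event)
print(results)
```

Filtering by file extension inside the handler is a design choice of this sketch; an S3 notification can also be configured with a suffix filter so the function is only invoked for matching keys.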

Document AI is transforming how organizations unlock value from unstructured data. Whether you’re handling financial invoices, medical records, or academic papers, this course gives you the tools and techniques to build systems for intelligent document processing.

Who should join?

AI builders and developers who want to automate the extraction of information from documents. Basic familiarity with Python is recommended to make the most of this course.

Course Outline

15 Lessons・6 Code Examples

Introduction・Video・3m
Document Processing Basics・Video・8m
Lab 1: Document Processing with OCR・Video with Code Example・13m
Four Decades of OCR Evolution・Video・8m
Lab 2: Document Processing with PaddleOCR・Video with Code Example・18m
Layout Detection and Reading Order・Video・12m
Lab 3: Building Agentic Document Understanding・Video with Code Example・14m
A Single API for Agentic Document Understanding・Video・6m
Lab 4: Document Understanding with Agentic Document Extraction・Video with Code Example・15m
Lab 4: Document Understanding with Agentic Document Extraction II・Video with Code Example・12m
Agentic Document Extraction for RAG・Video・14m
Lab 5: Agentic Document Extraction for RAG・Video with Code Example・9m
Building RAG Pipelines with Agentic Document Extraction on AWS・Video・14m
Lab 6: Building a Research Paper Chatbot with Strands Agents・Video・16m
Conclusion・Video・1m
Quiz・Graded Quiz・10m
Links & Resources・Reading・10m

Instructors

David Park

Senior Director of Applied AI at LandingAI

Andrea Kropp

Applied AI Engineer at LandingAI
