Welcome to the final lab in this course. In this lab, you'll build a complete document intelligence pipeline with LandingAI ADE on AWS that combines automated document parsing with a conversational AI agent. Just a quick note: this lab assumes basic familiarity with AWS. However, if you're new to AWS, you can follow along with the explanations to understand how production-ready document pipelines are built. All lab files are available in the reading item that follows this lab.

Before diving into the implementation, let's review the architecture we'll be building. You've already seen this diagram in the previous video, but here's how the data flows through the system. First, a user uploads a PDF to the S3 bucket's input folder. Second, S3 automatically triggers the Lambda function when a new file arrives. Third, Lambda uses LandingAI ADE to parse the PDF into structured markdown. Fourth, the parsed markdown, visual grounding data, and individual chunks are saved to the S3 output folder. Fifth, the Bedrock Knowledge Base indexes the documents for semantic search. And last but not least, users ask questions to the Strands agent, which has memory to maintain context.

There are some prerequisites for the lab. This lab assumes you have an S3 bucket with input and output folders, and a Bedrock Knowledge Base connected to the output medical_chunks folder of the S3 bucket. We'll include links on how to create an AWS account, set up these resources, and learn the basics of AWS if you'd like to try the lab yourself.

Let's start by installing the packages we'll need throughout the lab. boto3 is the official AWS SDK for Python. It lets you interact with AWS services programmatically, such as creating a Lambda function or uploading and downloading files from S3 buckets, from a notebook environment instead of the AWS console. python-dotenv loads secrets from a .env file. Pillow annotates PDF pages with highlights for visual grounding. PyMuPDF renders PDF pages as images. And for AWS agent services, bedrock-agentcore provides memory management for agents, and strands-agents is a framework for building the AI agent.

With the packages installed, let's load our environment variables from the .env file. This file contains sensitive credentials, so we don't want to hardcode them in our notebook. Here is an example .env file you can use as a template.

Now that our environment is configured, we need to set up connections to the AWS services we'll be using. The boto3 library provides this capability through clients. Here's how it works. First, we create a boto3 session that manages the AWS credentials. From the session, we create a client for each AWS service we want to interact with. We'll create clients for the following services: s3_client uploads PDFs, downloads outputs, and manages buckets. lambda_client deploys the Lambda function, updates its code, and configures triggers. The IAM client creates roles with the proper permissions for the Lambda function. The logs client uses CloudWatch Logs to monitor Lambda execution and debug. The bedrock_agent_runtime client queries knowledge bases for document search, and the bedrock_runtime client calls Claude models directly.
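As a rough sketch, here's what that client setup might look like in the notebook, assuming credentials and the region are supplied through the .env file; the variable names simply mirror the clients described above and aren't the lab's exact code.

```python
# Minimal sketch: create a boto3 session and the service clients used in this lab.
# Assumes AWS credentials and region come from the environment (loaded from .env).
import os

import boto3
from dotenv import load_dotenv

load_dotenv()  # loads AWS credentials and region variables from the .env file

session = boto3.Session(region_name=os.getenv("AWS_DEFAULT_REGION", "us-east-1"))

s3_client = session.client("s3")                    # upload PDFs, download outputs, manage buckets
lambda_client = session.client("lambda")            # deploy the function, update code, configure triggers
iam_client = session.client("iam")                  # create the Lambda execution role
logs_client = session.client("logs")                # read CloudWatch logs for monitoring and debugging
bedrock_agent_runtime_client = session.client("bedrock-agent-runtime")  # query knowledge bases
bedrock_runtime_client = session.client("bedrock-runtime")              # call Claude models directly
```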
With our AWS clients ready, we can now build the complete pipeline. Here's the roadmap for what we will accomplish. Part one, setting up the Lambda function, steps three through five. We'll set up the Lambda function in three steps. First, package the code by bundling the Python file and its dependencies into a zip file. Second, create a role by defining permissions that allow Lambda to access S3 for downloading and uploading files. Third, deploy the function by uploading the package to AWS Lambda. Part two, setting up the trigger, step six. Next, we'll configure S3 to automatically invoke our Lambda function whenever new files are uploaded to the input folder. Part three, building the agent, steps 7 through 12. Finally, we'll upload medical research papers, ingest the parsed documents into the Bedrock Knowledge Base, and build an intelligent agent using Strands Agents.

To keep this notebook focused on the core concepts, we've created helper functions in lambda_helpers.py. These functions handle the lower-level AWS operations. I'll explain the logic of each helper function as we use them throughout the lab.

Step three, creating the deployment package. So, what is a Lambda deployment package? To create an AWS Lambda function, you need to bundle your source code and all its dependencies into a zip file. This package contains everything Lambda needs to run your code: your source code in ade_s3_handler.py, which contains the ADE parsing logic that executes when the Lambda function is invoked, and the installed dependencies, all the pip packages your source code imports. Here's what the package structure looks like.

To create the package, we'll use the create_deployment_package helper function. It takes four parameters: source_files, the Python files containing our code; requirements, the pip packages our code depends on; output_zip, the name for the output zip file; and the package directory, a temporary directory for building the package. Behind the scenes, this helper function creates a temporary directory, installs packages with pip into that directory, copies your source code files into it, creates the zip file from everything in that directory, and cleans up the temporary directory.

Before we move to the next step, let's understand the code that runs inside Lambda. This diagram shows the complete flow. Feel free to explore the source code if you want more details. Here's what happens step by step. First, an event is received: when a PDF file is uploaded to the S3 input folder, an S3 event triggers the Lambda function. Second, the ADE handler function extracts the file key from the event. Third, the handler checks that it's a PDF file, skips folders, and verifies the output doesn't already exist. Fourth, the PDF is downloaded to Lambda's temporary directory. Fifth, the PDF is sent to the ADE API, which parses it and returns markdown text as well as the chunks. Sixth, the results are uploaded to the S3 output folder in three formats: a markdown file for the parsed content, a JSON file containing chunk information for visual grounding, and individual chunk JSON files for optimized Knowledge Base indexing.

Let me help you understand what each output file contains. For example, if you upload a file like research_paper.pdf to the input medical folder, here's what you'll have in the S3 bucket's output folder after ADE processes it. The markdown is the complete document in readable format, containing anchor tags linking text to chunk IDs. The grounding JSON is a single file containing all chunks with their bounding box coordinates, as well as other metadata like chunk type and page numbers. The individual chunk JSONs are one file per chunk, optimized for vector database ingestion. Each file is self-contained with text, location, and source metadata.
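To make that flow concrete, here's a simplified sketch of what a handler like ade_s3_handler.py does. It is not the lab's actual code: the output prefixes and file names are illustrative assumptions, and the ADE call is reduced to a hypothetical parse_pdf_with_ade placeholder.

```python
# Simplified sketch of the logic inside an ADE S3 handler (not the lab's actual code).
import json
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")
OUTPUT_PREFIX = "output/"  # assumed output prefix; the lab uses its own folder layout


def parse_pdf_with_ade(pdf_path):
    """Hypothetical placeholder: the real handler calls the LandingAI ADE API here."""
    raise NotImplementedError("Call the ADE parse API and return (markdown, chunks).")


def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Skip folder markers and anything that isn't a PDF.
        if key.endswith("/") or not key.lower().endswith(".pdf"):
            continue

        # Skip files that already have output (check for the markdown file).
        stem = os.path.splitext(os.path.basename(key))[0]
        md_key = f"{OUTPUT_PREFIX}{stem}.md"
        try:
            s3.head_object(Bucket=bucket, Key=md_key)
            continue  # output already exists, nothing to do
        except s3.exceptions.ClientError:
            pass  # no existing output found, proceed

        # Download the PDF into Lambda's temporary directory.
        local_pdf = f"/tmp/{os.path.basename(key)}"
        s3.download_file(bucket, key, local_pdf)

        # Parse with LandingAI ADE (placeholder returning markdown text and chunk dicts).
        markdown, chunks = parse_pdf_with_ade(local_pdf)

        # Upload the three output formats: markdown, grounding JSON, individual chunks.
        s3.put_object(Bucket=bucket, Key=md_key, Body=markdown.encode("utf-8"))
        s3.put_object(Bucket=bucket, Key=f"{OUTPUT_PREFIX}{stem}_grounding.json",
                      Body=json.dumps(chunks).encode("utf-8"))
        for i, chunk in enumerate(chunks):
            s3.put_object(Bucket=bucket,
                          Key=f"{OUTPUT_PREFIX}medical_chunks/{stem}_chunk_{i}.json",
                          Body=json.dumps(chunk).encode("utf-8"))

    return {"statusCode": 200}
```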
Similar to the previous lab, since we're implementing a RAG pipeline, we'll focus on using only the output medical_chunks folder for Bedrock Knowledge Base indexing and for generating the annotated images. Also, we'll generate the embeddings from each individual chunk. The other folders can be used for different experiments and downstream use cases.

Step four, creating the IAM role. What is an IAM role? Lambda functions run in isolated containers that have no inherent permissions; by default, they can't access any AWS resources. An IAM role grants the function permission to access specific AWS services using temporary credentials. To create the role, we'll use the create_or_update_lambda_role helper function, which creates a role with the permissions our Lambda function needs. The role includes these permissions. For S3: s3:GetObject, s3:PutObject, and s3:HeadObject, which let the function read PDFs from the input folder, write markdown files to the output folder, and check whether the output already exists, respectively. For logs: logs:CreateLogGroup creates a CloudWatch log group for this function, logs:CreateLogStream creates a log stream for each execution, and logs:PutLogEvents writes log entries for debugging.

Step five, deploying the Lambda function. Now we have both pieces we need: the deployment package, which is our code, and the IAM role, our permissions. Let's deploy the Lambda function using the deploy_lambda_function helper, which expects the function name, the location of the zip file, and the IAM role for Lambda. The deployment includes additional important runtime configuration options. The environment variables are configuration values our code can access at runtime. Timeout is the maximum execution time, set to 900 seconds, or 15 minutes, for processing larger PDFs. Memory size is the amount of RAM allocated; we'll set it to 1,024 megabytes.

Step six, setting up the S3 trigger. Our Lambda function is deployed, but it won't run automatically yet. We need to tell S3 to trigger the Lambda function whenever new files are uploaded. S3 can send events to Lambda when objects are created, modified, or deleted. We'll configure it to invoke our function specifically when files are uploaded to the input folder. The setup_s3_trigger helper handles this configuration.

Step seven, uploading documents for processing. The infrastructure is now ready, so let's upload our medical PDF documents and watch the pipeline in action. This diagram shows how PDF files flow from your local medical folder to the S3 input folder. The Lambda function, which is triggered automatically, processes each PDF using ADE and produces three types of output files. We'll use the upload_folder_to_s3 helper to upload our local documents. While the Lambda function processes our documents, we can monitor its progress in real time. This helper function watches CloudWatch logs to display the processing status; Lambda automatically writes the logs, we just need to read them. To stop monitoring, press the Escape key and then press I twice to interrupt the kernel. You have the option to show all output files, but for the video, we'll just enter no.

Step eight, connecting to the Bedrock Knowledge Base. Our documents are now parsed and stored in S3. The next step is to make them searchable by ingesting them into the Bedrock Knowledge Base, a vector database that enables semantic search. First, let's verify that our knowledge base is available and properly configured. We'll use the Bedrock agent client to list all knowledge bases and their data sources.
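As a minimal sketch, this verification can be done with the boto3 bedrock-agent control-plane client (separate from the bedrock-agent-runtime client we created earlier). The loop below simply lists the knowledge bases and, for each one, its data sources.

```python
# Minimal sketch: verify the pre-configured knowledge base and its data source.
import boto3

bedrock_agent = boto3.client("bedrock-agent")

for kb in bedrock_agent.list_knowledge_bases()["knowledgeBaseSummaries"]:
    kb_id = kb["knowledgeBaseId"]
    print(f"Knowledge base: {kb['name']} ({kb_id}), status: {kb['status']}")

    # Each knowledge base can have one or more data sources (here, the S3 medical_chunks folder).
    for ds in bedrock_agent.list_data_sources(knowledgeBaseId=kb_id)["dataSourceSummaries"]:
        print(f"  Data source: {ds['name']} ({ds['dataSourceId']}), status: {ds['status']}")
```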
Note that the knowledge base was pre-configured in the AWS console to point to our S3 output medical_chunks folder as the data source, to use Amazon Titan for creating vector embeddings, and to store the vectors in OpenSearch Serverless for fast similarity search. This information is not printed here, but I wanted to let you know about the configuration we use in this lab.

Step nine, ingesting documents into the Knowledge Base. Now let's sync the parsed documents from S3 into the Knowledge Base. This process is called ingestion. So what happens during ingestion? First, the Knowledge Base reads all new or modified JSON files from the S3 output medical_chunks folder. Second, it creates vector embeddings for each chunk. Third, these vectors are stored in the database for fast similarity search. Once the ingestion is complete, we can query the Knowledge Base with natural language questions, and it will find the most relevant document sections. The start_ingestion_job API kicks off asynchronous processing: it immediately returns a job ID while the actual work happens in the background.

Step 10, creating the search tool with visual grounding. With our documents indexed in the Knowledge Base, we can now create a search tool for our agent. But we're going to add something special: LandingAI's visual grounding. The next cell shows the code for our search tool; a simplified sketch also follows at the end of this step. Notice that it's decorated with @strands.tool, which makes it callable by our agent. The logic in this code is illustrated in this visual diagram; let me walk you through it. The tool follows this pattern. When a user submits a query, like "what helps with cold symptoms," first, Bedrock retrieves results by querying the Knowledge Base using hybrid search, combining keyword matching and semantic similarity. Second, for each result, it checks whether this is a chunk JSON file from the medical_chunks folder. Third, it parses the chunk JSON to get the metadata, like chunk_id, chunk_type, page, and bounding boxes. Then it dynamically generates a cropped chunk image, uploads the cropped image to S3, and returns a presigned URL. Last but not least, the agent formats the response, giving you the source, the chunk ID, page, chunk type, cropped chunk image URL, and the content.

Before creating the agent, let's verify that our search tool works correctly. This simple test queries the Knowledge Base and shows the first result. Let's search for common cold symptoms. You can see that the Knowledge Base search is working correctly, and we can now print these test results. When you click on the presigned URL, you can see the chunk image that was dynamically created for traceability and auditability. This could be incorporated into your RPA system or any downstream application for heavily regulated organizations and high-risk operations.
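For reference, here's a simplified sketch of what such a search tool can look like. It is not the lab's exact implementation: the knowledge base ID is a placeholder, the chunk field names are assumptions based on the description above, and the image cropping and presigned-URL step is reduced to a comment.

```python
# Simplified sketch of a knowledge base search tool with hybrid retrieval.
import json

import boto3
from strands import tool

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")
KNOWLEDGE_BASE_ID = "YOUR_KB_ID"  # placeholder, not the lab's actual ID


@tool
def search_knowledge_base(query: str) -> str:
    """Search the medical knowledge base and return grounded results."""
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": 5,
                "overrideSearchType": "HYBRID",  # keyword matching plus semantic similarity
            }
        },
    )

    results = []
    for item in response["retrievalResults"]:
        # Each indexed file is an individual chunk JSON; fall back to plain text if not.
        try:
            chunk = json.loads(item["content"]["text"])
        except (json.JSONDecodeError, TypeError):
            chunk = {"text": item["content"]["text"]}

        # In the lab, the chunk's bounding box is used here to crop the source page,
        # upload the annotated image to S3, and attach a presigned URL for visual grounding.
        results.append(
            f"Source: {chunk.get('source', 'unknown')} | "
            f"Page: {chunk.get('page')} | Type: {chunk.get('chunk_type')}\n"
            f"{chunk.get('text', '')}"
        )
    return "\n\n".join(results)
```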
Step 11, creating memory for the agent. Our search tool is working, so now let's create memory for our agent so it can remember conversations and learn user preferences over time. Amazon Bedrock AgentCore provides three memory strategies, each serving a different purpose: summary, which summarizes past sessions; user preference, which learns user preferences over time; and semantic, which extracts and stores facts. We'll configure all three strategies to give our agent comprehensive memory capabilities. We'll first check whether memories have already been created for the agent; otherwise, we'll create a new memory. After creating the memory, we need to set up a session manager that organizes information for each conversation. Each conversation needs two identifiers: the actor ID, which identifies who is using the agent and enables personalization across sessions, and the session ID, a unique identifier for this specific conversation.

Step 12, creating the Strands agent. We now have all the components ready: a search tool with visual grounding and memory for maintaining context across conversations. Let's bring everything together into a Strands agent. The agent is configured with the model, Claude via Bedrock, as the underlying LLM; the system_prompt, the instructions that define the agent's personality and behavior; the session_manager, the memory for remembering preferences, historical summaries, and facts; and the tools, the search knowledge base function we created earlier. Notice how the system prompt explicitly instructs the agent to include visual grounding information, like page numbers, location coordinates, and annotated images, in its responses. A compact sketch of this assembly appears at the end of the lab.

Step 13, the interactive chat. Your medical document agent is now ready, so let's start an interactive chat session. Let's begin with "how effective is vitamin C for treating colds?" Here it shows that it used the search_knowledge_base tool, and it returned a list of symptoms and the source of information where we can find the annotated image. I'll now tell it that I prefer short answers, then exit. Let's run it again and ask the same question. You can see that it returned a more concise answer based on my previous preference. To exit, type exit, quit, or bye.

Congratulations on completing this lab. You've built a complete, production-ready document intelligence pipeline that includes these key components: automated document processing, a Lambda function that automatically parses PDFs when they're uploaded to S3; semantic search, a Bedrock Knowledge Base for intelligent document retrieval; visual grounding, traceable answers with exact page locations and highlighted images; conversational memory, an agent that remembers preferences and conversation history; and individual chunk storage, optimized chunk files for better Knowledge Base indexing. You can now extend this pipeline to handle other document types like Excel, PowerPoint, and even images, add more tools to your agent, or integrate with other AWS services as your needs grow. Thank you all for your time, curiosity, and engagement. We're so excited to see what you will create next. We'll see you next time.
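As a closing recap, here's a compact sketch of how the pieces from steps 10 through 12 could be assembled into the Strands agent and chat loop. The model ID, the system prompt wording, and the session_manager wiring are assumptions; the lab's actual configuration lives in the notebook and uses the AgentCore-backed memory created in step 11.

```python
# Recap sketch: assembling the Strands agent from the earlier pieces (not the lab's exact code).
# Assumes search_knowledge_base (step 10 sketch) and session_manager (step 11) are already defined.
from strands import Agent

SYSTEM_PROMPT = (
    "You are a medical document assistant. Answer using the search_knowledge_base tool "
    "and always include visual grounding details: page numbers, chunk IDs, and the "
    "annotated image URL for each source."
)

agent = Agent(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",  # Claude via Bedrock (assumed model ID)
    system_prompt=SYSTEM_PROMPT,
    tools=[search_knowledge_base],      # the visual-grounding search tool from step 10
    session_manager=session_manager,    # memory-backed session manager from steps 11 and 12
)

# Simplified interactive loop: type "exit", "quit", or "bye" to stop.
while True:
    user_input = input("You: ")
    if user_input.strip().lower() in {"exit", "quit", "bye"}:
        break
    response = agent(user_input)  # output may also stream, depending on the callback handler
    print(f"Agent: {response}")
```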