Welcome to Semantic Caching for AI Agents, built in partnership with Redis and taught by Tyler Hutcherson, who is applied AI engineering lead, and Iliya Zhechev, who is senior research engineer at Redis. In this course, you'll learn how to make your AI agents faster and more cost-effective by adding a semantic cache. In many projects, inference costs and latency limit your ability to scale an application. Traditional input-output caches, which work only when the input text is exactly the same, can help in some cases. But if one person asks, "How can I get a refund?" and another says, "I want my money back," a normal exact-match cache sees them as completely different. Semantic caching, on the other hand, looks at meaning. It uses embeddings to measure how similar two questions are in this meaning space. If they're semantically similar, we can reuse the model's response to the earlier question instead of calling the model again.
Thanks, Andrew. You'll first build a cache from scratch to learn how semantic caching works under the hood, piece by piece. You'll create embeddings, compare distances, and set a threshold to decide when two queries are semantically similar enough. Then we'll move on to implementing our semantic cache using Redis's open source SDK. This will bring your cache closer to a production deployment, with features like time to live to keep your cache fresh and small, and even separate caches for different users, teams, or tenants. We'll also use our open-weight embedding model, fine-tuned for cache accuracy.
Once you have a working cache, we'll measure how well it performs. We'll look at hit rate, precision, and recall. These metrics show how often your cache helps and how often it is correct. You'll visualize them in a confusion matrix, and you'll see how changing the similarity threshold shifts the balance between precision and recall. We'll also look at latency and see how a few hits quickly add up to big time savings.
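The from-scratch idea described above (embed each query, compare distances, apply a threshold) can be sketched roughly as follows. This is only an illustration, not the course's code or the Redis SDK: `ToySemanticCache`, `toy_embed`, the hand-written vectors in `TOY_VECTORS`, and the 0.2 distance threshold are all assumptions made up for this example, and a real cache would call an actual embedding model instead of a lookup table.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical, hand-written "embeddings" standing in for a real model.
TOY_VECTORS = {
    "How can I get a refund?": [0.9, 0.1, 0.0],
    "I want my money back.": [0.8, 0.2, 0.1],
    "What are your opening hours?": [0.0, 0.1, 0.9],
}


def toy_embed(text: str) -> Vector:
    """Look up a made-up vector; a real system would call an embedding model."""
    return TOY_VECTORS[text]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class ToySemanticCache:
    """In-memory cache that reuses a response when a query is close enough."""

    def __init__(self, embed: Callable[[str], Vector], distance_threshold: float = 0.2):
        self.embed = embed
        self.distance_threshold = distance_threshold  # assumed value, tune per dataset
        self.entries: List[Tuple[Vector, str]] = []

    def store(self, query: str, response: str) -> None:
        self.entries.append((self.embed(query), response))

    def lookup(self, query: str) -> Optional[str]:
        if not self.entries:
            return None
        query_vec = self.embed(query)
        # Find the nearest cached query by cosine distance.
        best_dist, best_response = min(
            ((cosine_distance(query_vec, vec), resp) for vec, resp in self.entries),
            key=lambda pair: pair[0],
        )
        # Only a match at or under the threshold counts as a cache hit.
        return best_response if best_dist <= self.distance_threshold else None


cache = ToySemanticCache(toy_embed)
cache.store("How can I get a refund?", "Refunds are issued within 5 days.")
print(cache.lookup("I want my money back."))         # close in meaning: reuses the answer
print(cache.lookup("What are your opening hours?"))  # far away: miss, call the model
```

Lowering the threshold makes the cache stricter (fewer but safer hits), while raising it trades correctness for a higher hit rate, which is the precision and recall balance mentioned above.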
After measuring the effectiveness of your cache, you'll learn four methods for enhancing it. You'll optimize the threshold, use a cross-encoder for better re-ranking, and even add a small LLM check that can confirm whether two questions mean the same thing. We'll also add fuzzy matching to handle the simple typos that frequently happen when users ask questions. Finally, we'll connect it all inside an AI agent. The agent breaks a big question into smaller parts, checks the cache for each one, and calls the LLM only when it needs to. This means every new user or slightly different phrasing benefits from what the system already knows. Over time, as the cache warms up, model calls drop, and responses feel just as good but arrive much faster.
Many people have worked to create this course. From Redis, I'd like to thank the applied AI, AI research, product, and education teams. From DeepLearning.AI, Esmaeil Gargari also contributed to this course. The first lesson will be an overview of semantic caching. You will also learn about a real use case in which Walmart published techniques for improving its production caching system. So, let's go on to the next video and get started.
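As a rough illustration of the fuzzy-matching idea mentioned above, here is a minimal sketch that catches simple typos with plain string similarity before any embedding lookup. It uses Python's standard-library `difflib.SequenceMatcher`, which is a stand-in chosen for this example and not necessarily what the course uses; the `fuzzy_lookup` helper and the 0.9 ratio cutoff are likewise assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def fuzzy_lookup(query: str, cached: Dict[str, str], min_ratio: float = 0.9) -> Optional[str]:
    """Return the response of the most similar cached query, or None.

    `cached` maps earlier queries to their responses. `min_ratio` is an
    assumed cutoff on SequenceMatcher's 0..1 similarity ratio.
    """
    normalized_query = normalize(query)
    best_ratio, best_response = 0.0, None
    for cached_query, response in cached.items():
        ratio = SequenceMatcher(None, normalized_query, normalize(cached_query)).ratio()
        if ratio > best_ratio:
            best_ratio, best_response = ratio, response
    return best_response if best_ratio >= min_ratio else None


cached_answers = {"How can I get a refund?": "Refunds are issued within 5 days."}
print(fuzzy_lookup("how can i get a refnud?", cached_answers))  # typo: still matches
print(fuzzy_lookup("I want my money back.", cached_answers))    # paraphrase: no match
```

Character-level matching like this handles typos but not paraphrases, which is why it complements the embedding-based lookup instead of replacing it.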

Semantic Caching for AI Agents

Introduction (Video, 3 mins)
Overview of Semantic Caching (Video, 9 mins)
Build Your First Semantic Cache (Video with Code Example, 10 mins)
Measuring Cache Effectiveness (Video with Code Example, 13 mins)
Enhancing Cache Effectiveness (Video with Code Example, 12 mins)
Fast AI Agent with Semantic Cache (Video with Code Example, 16 mins)
Conclusion (Video, 1 min)
Quiz (Graded Quiz, 9 mins)
OPTIONAL: Project (Code Example, 10 mins)