Short Course・Intermediate

Semantic Caching for AI Agents

Instructors: Tyler Hutcherson, Iliya Zhechev

  • Intermediate
  • 7 Video Lessons
  • 5 Code Examples

What you'll learn

  • Understand how semantic caching reduces inference cost and latency by reusing model responses based on meaning instead of exact text.

  • Evaluate cache performance with metrics like hit rate, precision, and latency, and enhance it using techniques such as cross-encoders and LLM validation.

  • Build an AI agent that integrates semantic caching, minimizes redundant model calls, and responds faster as the cache warms up.

About this course

Join our new short course, Semantic Caching for AI Agents! Learn from Tyler Hutcherson, Applied AI Engineering Lead, and Iliya Zhechev, Senior Research Engineer at Redis.

In this course, you’ll build a semantic cache that makes your AI agents faster and more cost-effective by recognizing when different questions mean the same thing. For example, when one user asks “How do I get a refund?” and another says “I want my money back,” your cache will reuse the stored answer instead of making a redundant model call.
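
As a rough illustration of the idea, the sketch below shows a minimal in-memory semantic cache in Python. It is not the course's implementation or the Redis tooling: `toy_embed`, `SemanticCache`, `answer`, `fake_model`, and the 0.7 threshold are hypothetical names and values chosen for illustration. The stand-in embedder only measures word overlap, so matching a true paraphrase such as “I want my money back” would require swapping in a real sentence-embedding model.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in embedder (assumption): bag-of-words counts, not a semantic model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache: stores (embedding, response) pairs in a list."""

    def __init__(self, embed=toy_embed, threshold=0.8):
        self.embed = embed
        self.threshold = threshold  # minimum similarity to count as a hit
        self.entries = []

    def get(self, query):
        """Return the most similar cached response, or None below the threshold."""
        q = self.embed(query)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query, response):
        self.entries.append((self.embed(query), response))


def answer(query, cache, call_model):
    """Check the cache first; call the model only on a miss."""
    cached = cache.get(query)
    if cached is not None:
        return cached, True
    response = call_model(query)
    cache.put(query, response)
    return response, False


def fake_model(query):
    """Placeholder for an expensive LLM call (assumption)."""
    return f"model answer to: {query}"


cache = SemanticCache(threshold=0.7)
print(answer("How do I get a refund?", cache, fake_model))        # miss: model is called
print(answer("how do I get a refund please", cache, fake_model))  # hit: answer is reused
print(answer("What are your opening hours?", cache, fake_model))  # miss: unrelated question
```

A linear scan over a Python list is fine for a demonstration; a production cache would typically keep the vectors in a store that supports nearest-neighbor search.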

In detail, you’ll learn to:

  • Build your first semantic cache from scratch – Build a working cache to see how each component works, then implement it using Redis’ open-source tools.
  • Measure cache effectiveness with key metrics – Track cache hit rate, precision, recall, and latency to understand your cache’s real impact.
  • Enhance cache accuracy with advanced techniques – Use threshold tuning, cross-encoders, LLM validation, and fuzzy matching to make your cache more effective.
  • Build a fast AI agent with semantic caching – Integrate semantic caching into an AI agent that reuses results, skips redundant work, and gets faster over time.
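
The metrics and threshold tuning mentioned above can be sketched as follows. This is a minimal example under assumed definitions, not the course's evaluation code: the `Lookup` record, `cache_metrics`, `precision_recall_at`, and all sample numbers are hypothetical, and the labels (`correct`, `should_hit`) are assumed to come from a hand-labeled evaluation set.

```python
from dataclasses import dataclass


@dataclass
class Lookup:
    """One evaluated cache lookup (hypothetical record layout)."""
    hit: bool          # the cache returned an answer
    correct: bool      # the returned answer was right (only meaningful on a hit)
    should_hit: bool   # a valid cached answer existed for this query
    latency_ms: float  # end-to-end response time


def cache_metrics(lookups):
    """Compute hit rate, precision, recall, and mean latency."""
    total = len(lookups)
    hits = [r for r in lookups if r.hit]
    true_hits = sum(1 for r in hits if r.correct)
    answerable = sum(1 for r in lookups if r.should_hit)
    return {
        "hit_rate": len(hits) / total if total else 0.0,
        "precision": true_hits / len(hits) if hits else 0.0,
        "recall": true_hits / answerable if answerable else 0.0,
        "mean_latency_ms": sum(r.latency_ms for r in lookups) / total if total else 0.0,
    }


def precision_recall_at(scored_pairs, threshold):
    """scored_pairs: (similarity, is_true_match) tuples from labeled query pairs.

    Returns (precision, recall) if every pair at or above `threshold`
    were served from the cache. Precision is reported as 0.0 when
    nothing is accepted.
    """
    accepted = [match for score, match in scored_pairs if score >= threshold]
    true_positives = sum(accepted)
    positives = sum(match for _, match in scored_pairs)
    precision = true_positives / len(accepted) if accepted else 0.0
    recall = true_positives / positives if positives else 0.0
    return precision, recall


# Made-up sample data, for illustration only.
lookups = [
    Lookup(hit=True, correct=True, should_hit=True, latency_ms=5.0),
    Lookup(hit=True, correct=False, should_hit=False, latency_ms=5.0),
    Lookup(hit=False, correct=False, should_hit=True, latency_ms=400.0),
    Lookup(hit=False, correct=False, should_hit=False, latency_ms=390.0),
]
print(cache_metrics(lookups))

pairs = [(0.95, True), (0.85, True), (0.80, False), (0.60, True), (0.40, False)]
for t in (0.9, 0.7, 0.5):
    print(t, precision_recall_at(pairs, t))
```

Sweeping the threshold this way makes the trade-off visible: a higher threshold tends to raise precision (fewer wrong answers served) at the cost of recall (more reusable answers missed).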

Start building AI agents that respond faster and cost less to run.

Who should join?

Developers and ML engineers familiar with Python, embeddings, and basic LLM applications who want to optimize their AI systems’ latency and cost. Experience with basic caching concepts is helpful but not required.

Course Outline

7 Lessons・5 Code Examples

Instructors

Tyler Hutcherson

Applied AI Engineering, Manager at Redis

Iliya Zhechev

Senior Research Engineer at Redis

Course access is free for a limited time during the DeepLearning.AI learning platform beta!
