Short Course・Intermediate

Semantic Caching for AI Agents

Instructors: Tyler Hutcherson, Iliya Zhechev

  • Intermediate
  • 7 Video Lessons
  • 4 Code Examples

What you'll learn

  • Understand how semantic caching reduces inference cost and latency by reusing model responses based on meaning instead of exact text.

  • Evaluate cache performance with metrics like hit rate, precision, and latency, and enhance it using techniques such as cross-encoders and LLM validation.

  • Build a faster AI agent by integrating semantic caching to minimize redundant model calls and deliver quicker responses as the cache warms up.

About this course

Join our new short course, Semantic Caching for AI Agents! Learn from Tyler Hutcherson, Applied AI Engineering Lead, and Iliya Zhechev, Senior Research Engineer at Redis.

In this course, you’ll build a semantic cache that makes your AI agents faster and more cost-effective by recognizing when different questions mean the same thing. For example, when one user asks “How do I get a refund?” and another says “I want my money back,” your cache will reuse the stored answer instead of making another model API call.
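To make the idea concrete, here is a minimal sketch of a semantic cache lookup, not the course's implementation. It assumes a caller-supplied embedding function; `SemanticCache`, `toy_embed`, `answer`, and the 0.8 threshold are hypothetical names and values chosen for illustration. The toy embedder is a keyword stand-in so the example runs without a model; a real cache would use an embedding model and a vector store.

```python
import math
import re
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Stores (embedding, response) pairs and matches queries by similarity."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.8):
        self.embed = embed          # assumption: any text -> vector function
        self.threshold = threshold  # assumption: illustrative cutoff, tune per dataset
        self.entries: List[Tuple[Vector, str]] = []

    def get(self, query: str) -> Optional[str]:
        """Return the closest cached response, or None if below the threshold."""
        query_vec = self.embed(query)
        best_score, best_response = -1.0, None
        for vec, response in self.entries:
            score = cosine_similarity(query_vec, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query: str, response: str) -> None:
        self.entries.append((self.embed(query), response))


# Hypothetical stand-in for an embedding model: maps keywords to two "concepts".
CONCEPTS = {"refund": 0, "money": 0, "password": 1, "reset": 1}


def toy_embed(text: str) -> Vector:
    vec = [0.0, 0.0]
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in CONCEPTS:
            vec[CONCEPTS[word]] += 1.0
    return vec


def answer(query: str, cache: SemanticCache, call_llm: Callable[[str], str]):
    """Check the cache first; call the model only on a miss."""
    cached = cache.get(query)
    if cached is not None:
        return cached, True
    response = call_llm(query)
    cache.put(query, response)
    return response, False


if __name__ == "__main__":
    cache = SemanticCache(toy_embed)
    stub_llm = lambda q: f"(model answer for: {q})"  # placeholder for a real model call
    print(answer("How do I get a refund?", cache, stub_llm)[1])  # False: first call misses
    print(answer("I want my money back", cache, stub_llm)[1])    # True: reuses cached answer
```

The threshold controls the trade-off: a lower value produces more hits but raises the risk of returning an answer to a question that only looks similar.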

In detail, you’ll learn to:

  • Build your first semantic cache from scratch – Build a working cache to see how each component works, then implement it using Redis’ open source tools.
  • Measure cache effectiveness with key metrics – Track cache hit rate, precision, recall, and latency to understand your cache’s real impact.
  • Enhance cache accuracy with advanced techniques – Use threshold tuning, cross-encoders, LLM validation, and fuzzy matching to make your cache more effective.
  • Build a fast AI agent with semantic caching – Integrate semantic caching into an AI agent that reuses results, skips redundant work, and gets faster over time.
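The metrics named above can be computed from a labeled evaluation run. The sketch below uses common definitions, which may differ from the ones used in the course: hit rate is hits over all queries, precision is correct hits over all hits, and recall is correct hits over the queries that should have hit. `LookupResult` and `summarize` are hypothetical names, and the `should_hit` and `correct` labels are assumed to come from a hand-labeled test set.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LookupResult:
    hit: bool          # the cache returned a response
    correct: bool      # the returned response was acceptable (only meaningful on a hit)
    should_hit: bool   # ground truth: an equivalent query was already cached
    latency_ms: float  # end-to-end time for this query


def summarize(results: List[LookupResult]) -> Dict[str, float]:
    """Aggregate hit rate, precision, recall, and mean latency."""
    total = len(results)
    hits = sum(r.hit for r in results)
    correct_hits = sum(r.hit and r.correct for r in results)
    expected_hits = sum(r.should_hit for r in results)
    return {
        "hit_rate": hits / total if total else 0.0,
        "precision": correct_hits / hits if hits else 0.0,
        "recall": correct_hits / expected_hits if expected_hits else 0.0,
        "mean_latency_ms": sum(r.latency_ms for r in results) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Illustrative records only, not measurements from a real system.
    run = [
        LookupResult(hit=True, correct=True, should_hit=True, latency_ms=10.0),
        LookupResult(hit=True, correct=False, should_hit=False, latency_ms=10.0),
        LookupResult(hit=False, correct=False, should_hit=True, latency_ms=500.0),
        LookupResult(hit=False, correct=False, should_hit=False, latency_ms=500.0),
    ]
    print(summarize(run))
```

Running `summarize` over the same test set at several similarity thresholds is one simple way to approach threshold tuning: pick the threshold that keeps precision acceptable while raising the hit rate.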

Start building AI agents that respond faster and cost less to run.

Who should join?

Developers and ML engineers familiar with Python, embeddings, and basic LLM applications who want to optimize their AI systems’ latency and cost. Experience with basic caching concepts is helpful but not required.

Course Outline

7 Lessons・4 Code Examples
  • Introduction

    Video・3 mins
  • Overview of Semantic Caching

    Video・9 mins
  • Build Your First Semantic Cache

    Video with Code Example・10 mins
  • Measuring Cache Effectiveness

    Video with Code Example・13 mins
  • Enhancing Cache Effectiveness

    Video with Code Example・12 mins
  • Fast AI Agent with Semantic Cache

    Video with Code Example・16 mins
  • Conclusion

    Video・1 min
  • Quiz

    Graded Quiz・9 mins

Instructors

Tyler Hutcherson

Applied AI Engineering, Manager at Redis

Iliya Zhechev

Senior Research Engineer at Redis

Course access is free for a limited time during the DeepLearning.AI learning platform beta!
