TTL-based eviction is a solid pattern for that. We hit the same repeated-query problem, especially in tool-calling loops where the LLM retries the same MCP call 3-4 times in a row. One thing worth watching: if you are caching results from tools that touch external state (DB writes, API calls with side effects), a cache hit can mask whether the operation actually executed. We ended up keying our cache on both the query hash and a read-only flag to avoid that. What TTL range are you using? Curious whether shorter windows work better for security-sensitive queries, where the risk from stale data is higher.
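
For anyone who wants something concrete, here is a rough Python sketch of the keying idea, not our actual code. The names (`ToolCallCache`, `query_hash`, `call`, the `execute` callback) are made up for illustration. It also assumes that calls flagged as not read-only bypass the cache entirely, which is one way to use the flag, not necessarily the only one.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


class ToolCallCache:
    """Hypothetical TTL cache keyed on (query hash, read-only flag)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        # key -> (expiry timestamp, cached result)
        self._entries: Dict[Tuple[str, bool], Tuple[float, Any]] = {}

    @staticmethod
    def query_hash(tool: str, args: dict) -> str:
        # Sort keys so equivalent argument dicts produce the same hash.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, tool: str, args: dict, read_only: bool,
             execute: Callable[[], Any]) -> Any:
        if not read_only:
            # Assumption: side-effecting calls always run and are never
            # cached, so a hit cannot hide whether the operation executed.
            return execute()

        key = (self.query_hash(tool, args), read_only)
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the repeated tool call

        result = execute()
        self._entries[key] = (now + self.ttl_seconds, result)
        return result
```

With this shape, a retry loop that repeats the same read-only call inside the TTL window runs the tool once, while a repeated write runs every time. Expired entries are only replaced on the next lookup here, so a real version would also want a size cap or periodic sweep.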