Response Caching
Response caching lets AISIX reuse an earlier non-streaming chat-completions response when a later request has the same cache key. This is gateway-side response caching. It is separate from provider prompt caching and from the managed configuration snapshot cache.
In this guide, you will create a cache policy, verify cache miss and hit behavior with the x-aisix-cache response header, and review how cache scope and backend settings work.
Prerequisites
Before starting, prepare the following:
- A self-hosted AISIX gateway with the admin listener (127.0.0.1:3001 in the examples) and the proxy listener (127.0.0.1:3000 in the examples) reachable.
- The admin key from the gateway config.yaml.
- A working model alias and caller API key that can send non-streaming chat-completions requests.
- jq, to print the cache policy create response and capture the returned ID.
How Response Caching Works
Response caching uses two layers:
| Layer | Configured By | What It Controls |
|---|---|---|
| Cache backend availability | Startup configuration | Which storage backends the gateway process can use. |
| Cache policy | Dynamic cache policy resource | Which non-streaming chat-completions requests may use caching, and which available backend they use. |
A response is cached only when both layers line up: startup configuration makes a backend available, and the cache policy that matches the request selects that backend.
AISIX caches only non-streaming chat-completions responses, and it serves a cached copy only when a later request produces exactly the same cache key. Streaming responses and other proxy API families do not use this response-cache path.
Configure Cache Backends
AISIX always builds the in-process memory cache. Add Redis startup configuration when one or more cache policies should use shared cache entries across gateway instances.
Configure Redis through startup configuration:
cache:
  redis:
    mode: single
    url: redis://127.0.0.1:6379/
See Configuration Files for how AISIX loads config.yaml and applies environment-variable overrides.
The cache.backend startup field is a legacy compatibility setting. It no longer selects one global cache for all requests. If you set cache.backend: redis, AISIX still requires a cache.redis block and fails to start when that block is missing.
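If an existing deployment still sets the legacy field, the sketch below shows the minimum shape that lets it start: cache.backend: redis alongside a cache.redis block. It writes to a hypothetical file name, config.cache-legacy.yaml, so it does not overwrite a real config.yaml; merge the keys into your own file instead.

```shell
# Write an example startup fragment to a hypothetical file name so an
# existing config.yaml is not overwritten. Merge these keys into your real file.
cat > config.cache-legacy.yaml <<'EOF'
cache:
  # Legacy compatibility field: it no longer selects one global cache.
  backend: redis
  # Still required when backend is redis; startup fails without it.
  redis:
    mode: single
    url: redis://127.0.0.1:6379/
EOF
```

Dropping the redis block from this fragment would make the gateway fail at startup, as described above.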
Configure a Cache Policy
Set the values used by the example requests:
export AISIX_ADMIN_KEY="admin-local-only-change-me"
export AISIX_API_KEY="sk-demo-caller"
export AISIX_MODEL="gpt-4o-mini"
export CACHE_PROMPT="cache-check-$(date +%s)"
Create a cache policy for the example model alias and save the response:
CACHE_POLICY_RESPONSE=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/cache_policies" \
-H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
-H "Content-Type: application/json" \
-d '{
"name": "default-chat-cache",
"enabled": true,
"backend": "memory",
"applies_to": "model:'"${AISIX_MODEL}"'",
"ttl_seconds": 3600
}')
Print the response, then capture the returned ID in CACHE_POLICY_ID:
printf '%s\n' "${CACHE_POLICY_RESPONSE}" | jq .
CACHE_POLICY_ID=$(printf '%s\n' "${CACHE_POLICY_RESPONSE}" | jq -r '.id // empty')
You should see a response similar to the following:
{
"id": "8d86d1cf-4a46-4a71-b4ef-c01e51f77f21",
"value": {
"name": "default-chat-cache",
"enabled": true,
"backend": "memory",
"ttl_seconds": 3600,
"applies_to": "model:gpt-4o-mini"
},
"revision": 1
}
The returned ID is used later to delete the policy.
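Because the later DELETE call builds its URL from CACHE_POLICY_ID, an empty ID (for example, after a rejected admin key) would send a request to the collection path instead of a specific policy. A small guard catches that early. require_policy_id is a hypothetical helper name, not part of AISIX.

```shell
# Fail early if the create call did not return an ID.
# require_policy_id is a hypothetical client-side helper.
require_policy_id() {
  if [ -z "$1" ]; then
    echo "cache policy was not created; check the response printed above" >&2
    return 1
  fi
  return 0
}

# Usage in the walkthrough, right after capturing the ID:
# require_policy_id "${CACHE_POLICY_ID}" || exit 1
```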
Verify Cache Behavior
After the cache policy is configured, send repeated requests to compare cache miss and cache hit behavior.
First, send a request with the cache prompt:
curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
-H "Authorization: Bearer ${AISIX_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"model": "'"${AISIX_MODEL}"'",
"messages": [{"role": "user", "content": "'"${CACHE_PROMPT}"'"}]
}'
The first matching request should include this response header:
x-aisix-cache: miss
The miss value means AISIX called the upstream provider and wrote the response into the cache.
Repeat the request with the same body and model alias:
curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
-H "Authorization: Bearer ${AISIX_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"model": "'"${AISIX_MODEL}"'",
"messages": [{"role": "user", "content": "'"${CACHE_PROMPT}"'"}]
}'
The repeated request should include this response header:
x-aisix-cache: hit
The hit value means AISIX served the cached copy without calling the upstream provider.
Change the prompt to confirm that the cache key is tied to the request body:
curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
-H "Authorization: Bearer ${AISIX_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"model": "'"${AISIX_MODEL}"'",
"messages": [{"role": "user", "content": "'"${CACHE_PROMPT}"' different"}]
}'
The changed request should include x-aisix-cache: miss, because its body no longer matches the cached request.
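When you repeat these checks often, it helps to print only the cache header instead of reading the full curl -i output. The helper below is a sketch; cache_status is a hypothetical name. It matches the header name case-insensitively, because header casing can differ between HTTP versions, and it strips the carriage return that ends each HTTP header line.

```shell
# Print the x-aisix-cache value from `curl -i` output read on stdin.
# The header name is matched case-insensitively; CR characters are removed.
cache_status() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "x-aisix-cache" { print $2; exit }'
}
```

For example, append `| cache_status` to each of the three curl requests above. In order, they should print miss, hit, and miss.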
Delete the cache policy when you finish the test:
curl -sS -X DELETE "http://127.0.0.1:3001/admin/v1/cache_policies/${CACHE_POLICY_ID}" \
-H "Authorization: Bearer ${AISIX_ADMIN_KEY}"
Deleting the policy disables caching for that scope. In-memory cache entries are dropped when the gateway restarts.
Scope Cache Policies
The applies_to field controls which requests match a cache policy:
| Value | Scope |
|---|---|
| all | Every eligible non-streaming chat-completions request. |
| model:<alias> | Requests that use the caller-visible model alias. |
| api_key:<id> | Requests authenticated with the caller API key resource ID. |
For multi-target models, the cache key uses the alias the caller requested, not the target model that served the miss.
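This guide does not document AISIX's actual cache-key format. The sketch below only illustrates the two properties described here: the key depends on the exact request body, and it is derived from the alias the caller requested, with no input for the target that served the miss. illustrative_cache_key is a hypothetical helper, and hashing alias plus body with SHA-256 is an assumption, not the gateway's algorithm.

```shell
# Hypothetical illustration only: derive a key from the requested alias and
# the exact request body. The serving target is deliberately not an input,
# so every target behind one alias shares the same key.
# Uses GNU coreutils sha256sum (on macOS, `shasum -a 256` is the equivalent).
illustrative_cache_key() {
  model_alias="$1"
  body="$2"
  printf '%s\n%s' "$model_alias" "$body" | sha256sum | cut -d' ' -f1
}
```

Under this model, resending an identical body for the same alias reproduces the key (a hit), while appending " different" to the prompt changes it (a miss).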
Start with a narrow policy, such as a model-scoped or caller-key-scoped policy. Use a global policy only when every eligible chat-completions request in the environment should participate in response caching.
Avoid unsupported matcher prefixes. The gateway treats unknown forms as global, so a typo can make a policy broader than intended.
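Since the gateway widens unknown matcher forms to global scope, one option is to check the value on the client side before creating the policy. The function below is a hypothetical helper, not an AISIX API. It accepts only the three documented forms and, as an added assumption, rejects an empty alias or ID.

```shell
# Accept only the documented applies_to forms: all, model:<alias>,
# api_key:<id>. Requiring a non-empty alias or ID is this sketch's choice.
# validate_applies_to is a hypothetical client-side helper.
validate_applies_to() {
  case "$1" in
    all) return 0 ;;
    model:?*|api_key:?*) return 0 ;;
    *)
      echo "unsupported applies_to value: $1" >&2
      return 1
      ;;
  esac
}
```

For example, run `validate_applies_to "model:${AISIX_MODEL}" || exit 1` before the create request, so a typo such as modle:gpt-4o-mini stops the script instead of producing a global policy.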
Select a Cache Backend
The backend field on each cache policy selects where matching responses are stored:
| Backend | Behavior |
|---|---|
| memory | Uses the in-process cache on the gateway instance that handled the miss. |
| redis | Uses the shared Redis cache when cache.redis is configured at startup. |
Use memory for single-instance deployments or for policies where node-local cache entries are acceptable. Use Redis when several gateway instances should share cached responses for the same policy.
If a matching policy selects Redis but the gateway process did not start with cache.redis, AISIX disables caching for that policy's matching requests. It does not silently fall back to memory.
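The backend rule above can be summarized as a small decision function. This is a sketch that models the documented behavior, not gateway code; effective_cache_backend is a hypothetical name. How the gateway handles a backend value other than memory or redis is not covered here, so the sketch simply reports it as unknown.

```shell
# Model which backend a matching policy effectively uses.
#   $1 = the policy's backend field (memory or redis)
#   $2 = "yes" if cache.redis was present at startup, otherwise anything else
# effective_cache_backend is a hypothetical helper.
effective_cache_backend() {
  case "$1" in
    memory) echo memory ;;          # the in-process cache is always built
    redis)
      if [ "$2" = "yes" ]; then
        echo redis
      else
        echo disabled               # no silent fallback to memory
      fi
      ;;
    *) echo unknown; return 1 ;;    # undocumented case; this sketch rejects it
  esac
}
```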
The cache.redis.mode field supports the same Redis connection modes as the rate-limit backend:
cache:
  redis:
    mode: cluster
    nodes:
      - redis://10.0.0.1:6379/
      - redis://10.0.0.2:6379/
Use single for one Redis endpoint, cluster for Redis Cluster seed nodes, or sentinel with sentinels and master_name for a Sentinel-managed master. For full mode examples, see Redis Connection Modes.
Review Cache Behavior
If cache headers do not appear, check that all three caching requirements are true: the selected backend is available, an enabled cache policy matches the request, and the request is a non-streaming chat-completions request.
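The three requirements can be turned into a quick checklist that names every one that fails. You supply the answers as yes/no flags after inspecting your startup configuration, policies, and request. diagnose_cache is a hypothetical helper that does not query the gateway.

```shell
# Report which of the three caching requirements fail.
#   $1 = selected backend is available (yes/no)
#   $2 = an enabled cache policy matches the request (yes/no)
#   $3 = request is a non-streaming chat-completions request (yes/no)
# diagnose_cache is a hypothetical helper; it returns 1 if any check fails.
diagnose_cache() {
  failed=0
  [ "$1" = "yes" ] || { echo "backend unavailable (check cache.redis at startup)"; failed=1; }
  [ "$2" = "yes" ] || { echo "no enabled policy matches (check applies_to and enabled)"; failed=1; }
  [ "$3" = "yes" ] || { echo "not a non-streaming chat-completions request"; failed=1; }
  [ "$failed" -eq 0 ] && echo "all requirements met"
  return "$failed"
}
```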
If a policy applies more broadly than expected, check its scope. Unknown matcher prefixes fall back to global scope, so use only the supported matcher values listed above.
Next Steps
You have now configured a response cache policy and verified miss and hit behavior. Next, continue with Guardrails to add request and response checks before and after provider calls.