Response Caching

Response caching lets AISIX reuse an earlier non-streaming chat-completions response when a later request has the same cache key. It is gateway-side response caching, not provider prompt caching and not the managed configuration snapshot cache.

In this guide, you will create a cache policy, verify cache miss and hit behavior with the x-aisix-cache response header, and review how cache scope and backend settings work.

Prerequisites

Before starting, prepare the following:

  • A self-hosted AISIX gateway with the admin and proxy listeners available.
  • The admin key from the gateway config.yaml.
  • A working model alias and caller API key that can send non-streaming chat-completions requests.
  • jq to print the cache policy create response and capture the returned ID.

How Response Caching Works

Response caching uses two layers:

Layer | Configured By | What It Controls
Cache backend availability | Startup configuration | Which storage backends the gateway process can use.
Cache policy | Dynamic cache policy resource | Which non-streaming chat-completions requests may use caching, and which available backend they use.

Both layers must line up before a response can be cached. Startup configuration makes a backend available. The matching cache policy selects the backend for that request.

AISIX caches exact non-streaming chat-completions responses. Streaming responses and other proxy API families do not use this response-cache path.

Configure Cache Backends

AISIX always builds the in-process memory cache. Add Redis startup configuration when one or more cache policies should use shared cache entries across gateway instances.

Configure Redis through startup configuration:

config.yaml
cache:
  redis:
    mode: single
    url: redis://127.0.0.1:6379/

See Configuration Files for how AISIX loads config.yaml and applies environment-variable overrides.

The cache.backend startup field is a legacy compatibility setting. It no longer selects one global cache for all requests. If you set cache.backend: redis, AISIX still requires a cache.redis block and fails startup when Redis is not configured.

Configure a Cache Policy

Set the values used by the example requests:

export AISIX_ADMIN_KEY="admin-local-only-change-me"
export AISIX_API_KEY="sk-demo-caller"
export AISIX_MODEL="gpt-4o-mini"
export CACHE_PROMPT="cache-check-$(date +%s)"

Create a cache policy for the example model alias and save the response:

CACHE_POLICY_RESPONSE=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/cache_policies" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "default-chat-cache",
    "enabled": true,
    "backend": "memory",
    "applies_to": "model:'"${AISIX_MODEL}"'",
    "ttl_seconds": 3600
  }')

Print the response and capture the returned ID in CACHE_POLICY_ID:

printf '%s\n' "${CACHE_POLICY_RESPONSE}" | jq .
CACHE_POLICY_ID=$(printf '%s\n' "${CACHE_POLICY_RESPONSE}" | jq -r '.id // empty')

You should see a response similar to the following:

{
  "id": "8d86d1cf-4a46-4a71-b4ef-c01e51f77f21",
  "value": {
    "name": "default-chat-cache",
    "enabled": true,
    "backend": "memory",
    "ttl_seconds": 3600,
    "applies_to": "model:gpt-4o-mini"
  },
  "revision": 1
}

The returned ID is used later to delete the policy.
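
Because the jq filter above yields an empty string when the response has no id field (for example, when the admin API returns an error body), it can help to confirm the variable is set before continuing. The following is a minimal sketch; require_policy_id is a hypothetical helper name, not part of AISIX.

```shell
# Hypothetical helper: succeed only when a non-empty policy ID was captured.
require_policy_id() {
  if [ -z "${1:-}" ]; then
    echo "error: no cache policy ID captured; inspect the create response" >&2
    return 1
  fi
}

# Example use after the capture step above:
#   require_policy_id "${CACHE_POLICY_ID}" || exit 1
```

Running this check early avoids sending the later DELETE request to a URL with an empty ID segment.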

Verify Cache Behavior

After the cache policy is configured, send repeated requests to compare cache miss and cache hit behavior.

First, send a request with the cache prompt:

curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${AISIX_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${AISIX_MODEL}"'",
    "messages": [{"role": "user", "content": "'"${CACHE_PROMPT}"'"}]
  }'

The first matching request should include this response header:

x-aisix-cache: miss

The miss value means AISIX called the upstream provider and wrote the response into the cache.

Repeat the request with the same body and model alias:

curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${AISIX_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${AISIX_MODEL}"'",
    "messages": [{"role": "user", "content": "'"${CACHE_PROMPT}"'"}]
  }'

The repeated request should include this response header:

x-aisix-cache: hit

The hit value means AISIX served the cached copy without calling the upstream provider.

Change the prompt to confirm that the cache key is tied to the request body:

curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${AISIX_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${AISIX_MODEL}"'",
    "messages": [{"role": "user", "content": "'"${CACHE_PROMPT}"' different"}]
  }'

The changed request should return a cache miss.
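
The header checks above can also be scripted instead of read by eye. The sketch below assumes only the x-aisix-cache header name from this guide; cache_status is a hypothetical helper that reads raw response headers on stdin and prints the header value, matching the header name case-insensitively.

```shell
# Hypothetical helper: print the x-aisix-cache value from HTTP response
# headers read on stdin. Prints nothing when the header is absent.
cache_status() {
  tr -d '\r' | awk -F': *' '
    !found && tolower($1) == "x-aisix-cache" { print $2; found = 1 }
  '
}

# Example use (requires a running gateway and the variables set earlier).
# -D - writes the response headers to stdout; -o /dev/null discards the body.
#   curl -sS -o /dev/null -D - -X POST "http://127.0.0.1:3000/v1/chat/completions" \
#     -H "Authorization: Bearer ${AISIX_API_KEY}" \
#     -H "Content-Type: application/json" \
#     -d '{"model": "'"${AISIX_MODEL}"'", "messages": [{"role": "user", "content": "'"${CACHE_PROMPT}"'"}]}' \
#     | cache_status
```

An empty result means the response carried no cache header at all, which is a different condition from a miss.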

Delete the cache policy when you finish the test:

curl -sS -X DELETE "http://127.0.0.1:3001/admin/v1/cache_policies/${CACHE_POLICY_ID}" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"

Deleting the policy disables caching for that scope. In-memory cache entries are dropped when the gateway restarts.

Scope Cache Policies

The applies_to field controls which requests match a cache policy:

Value | Scope
all | Every eligible non-streaming chat-completions request.
model:<alias> | Requests that use the caller-visible model alias.
api_key:<id> | Requests authenticated with the caller API key resource ID.

For multi-target models, the cache key uses the alias the caller requested, not the target model that served the miss.

Start with a narrow policy, such as a model-scoped or caller-key-scoped policy. Use a global policy only when every eligible chat-completions request in the environment should participate in response caching.

Avoid unsupported matcher prefixes. The gateway treats unknown forms as global, so a typo can make a policy broader than intended.
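
Since a mistyped prefix silently widens a policy, a client-side check before creating the policy can catch typos early. The following is a minimal sketch that accepts only the three forms listed in this section; valid_applies_to is a hypothetical helper, and the gateway's own parsing rules may differ in details such as allowed characters.

```shell
# Hypothetical client-side guard: accept only the documented applies_to forms.
valid_applies_to() {
  case "$1" in
    all) return 0 ;;
    # Require at least one character after the prefix.
    model:?* | api_key:?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use before sending the create request:
#   valid_applies_to "model:${AISIX_MODEL}" || { echo "bad applies_to" >&2; exit 1; }
```

With this guard, a value such as modle:gpt-4o-mini is rejected locally instead of being sent to the gateway, where it would be treated as a global policy.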

Select a Cache Backend

The backend field on each cache policy selects where matching responses are stored:

Backend | Behavior
memory | Uses the in-process cache on the gateway instance that handled the miss.
redis | Uses the shared Redis cache when cache.redis is configured at startup.

Use memory for single-instance deployments or for policies where node-local cache entries are acceptable. Use Redis when several gateway instances should share cached responses for the same policy.

If a matching policy selects Redis but the gateway process did not start with cache.redis, AISIX disables caching for that policy's matching requests. It does not silently fall back to memory.
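
A policy that selects Redis uses the same fields as the memory example earlier in this guide, with only the backend value changed. The sketch below builds that request body from shell arguments; cache_policy_payload is a hypothetical helper, the policy name and TTL are illustrative, and the function assumes its inputs contain no characters that need JSON escaping.

```shell
# Hypothetical helper: print a cache policy JSON body.
# Arguments: name, backend, applies_to, ttl_seconds.
# Assumes the values contain no quotes or backslashes.
cache_policy_payload() {
  printf '{"name":"%s","enabled":true,"backend":"%s","applies_to":"%s","ttl_seconds":%s}\n' \
    "$1" "$2" "$3" "$4"
}

# Example use (requires a gateway started with a cache.redis block):
#   curl -sS -X POST "http://127.0.0.1:3001/admin/v1/cache_policies" \
#     -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
#     -H "Content-Type: application/json" \
#     -d "$(cache_policy_payload shared-chat-cache redis "model:${AISIX_MODEL}" 600)"
```

After creating such a policy, repeating the miss and hit check from this guide is a quick way to confirm that Redis is available to the gateway, because an unavailable backend disables caching for the matching requests.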

The cache.redis.mode field supports the same Redis connection modes as the rate-limit backend:

config.yaml
cache:
  redis:
    mode: cluster
    nodes:
      - redis://10.0.0.1:6379/
      - redis://10.0.0.2:6379/

Use single for one Redis endpoint, cluster for Redis Cluster seed nodes, or sentinel with sentinels and master_name for a Sentinel-managed master. For full mode examples, see Redis Connection Modes.

Review Cache Behavior

If cache headers do not appear, check that all three caching requirements are true: the selected backend is available, an enabled cache policy matches the request, and the request is a non-streaming chat-completions request.

If a policy applies more broadly than expected, check its scope. Unknown matcher prefixes fall back to global scope, so use only the supported matcher values listed above.

Next Steps

You have now configured a response cache policy and verified miss and hit behavior. Next, continue with Guardrails to add request and response checks before and after provider calls.


Copyright © APISEVEN PTE. LTD 2019 – 2026. Apache, Apache APISIX, APISIX, and associated open source project names are trademarks of the Apache Software Foundation