Response Caching

Response caching lets AISIX reuse an earlier non-streaming chat-completions response when a later request has the same cache key. It is gateway-side response caching, not provider prompt caching and not the managed configuration snapshot cache.

In this guide, you will create a cache policy, verify cache miss and hit behavior with the x-aisix-cache response header, and review how cache scope and backend settings work.

Prerequisites

Before starting, prepare the following:

A self-hosted AISIX gateway with the admin and proxy listeners available.
The admin key from the gateway config.yaml.
A working model alias and caller API key that can send non-streaming chat-completions requests.
jq, to print the response from the cache policy create request and capture the returned ID.

How Response Caching Works

Response caching uses two layers:

Layer	Configured By	What It Controls
Cache backend availability	Startup configuration	Which storage backends the gateway process can use.
Cache policy	Dynamic cache policy resource	Which non-streaming chat-completions requests may use caching, and which available backend they use.

Both layers must line up before a response can be cached. Startup configuration makes a backend available. The matching cache policy selects the backend for that request.

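AISIX caches non-streaming chat-completions responses on an exact-match basis: a cached response is reused only when a later request produces the same cache key. Streaming responses and other proxy API families do not use this response-cache path.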
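The two layers and the request-shape rule above can be modeled as one small shell predicate. This is an illustration of the documented behavior, not AISIX source code. `would_cache` and its yes/no arguments are hypothetical, and the sketch assumes streaming is requested with the OpenAI-style `"stream": true` body field.

```shell
# Illustrative model of the documented caching rules, not AISIX source.
# would_cache succeeds only when every documented condition holds.
# Assumption: streaming is requested with the OpenAI-style
# "stream": true body field, passed here as the string "true".
would_cache() {
  path="$1"          # proxy path, for example /v1/chat/completions
  stream="$2"        # "true" when the request asks for streaming
  policy_match="$3"  # "yes" when an enabled cache policy matches
  backend_ok="$4"    # "yes" when that policy's backend is available
  [ "$path" = "/v1/chat/completions" ] || return 1  # other API families bypass the cache
  [ "$stream" != "true" ] || return 1               # streaming responses are not cached
  [ "$policy_match" = "yes" ] || return 1           # dynamic layer: a policy must match
  [ "$backend_ok" = "yes" ]                         # startup layer: backend must be available
}
```

For example, `would_cache /v1/chat/completions false yes no` fails: the request itself is eligible, but the startup layer does not provide the backend the policy selects.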

Configure Cache Backends

AISIX always builds the in-process memory cache. Add Redis startup configuration when one or more cache policies should use shared cache entries across gateway instances.

Configure Redis through startup configuration:

config.yaml
cache:
  redis:
    mode: single
    url: redis://127.0.0.1:6379/

See Configuration Files for how AISIX loads config.yaml and applies environment-variable overrides.
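Before pointing a cache policy at Redis, it can help to confirm that the URL from config.yaml answers at all. The following is a minimal preflight sketch. It assumes `redis-cli` is installed on the host, and `redis_ready` is a hypothetical helper name.

```shell
# Preflight sketch: succeed only if the Redis URL answers PING.
# Assumes redis-cli is installed; its -u option accepts a redis:// URI.
redis_ready() {
  url="$1"  # the cache.redis.url value from config.yaml
  # On success redis-cli prints PONG. Connection errors, or a missing
  # redis-cli binary, leave reply without PONG, so the check fails.
  reply=$(redis-cli -u "$url" ping 2>/dev/null)
  [ "$reply" = "PONG" ]
}
```

For example: `redis_ready redis://127.0.0.1:6379/ || echo "Redis is not reachable"`. A successful PING only shows the endpoint is up from this host. It does not prove that the gateway process can reach it with the same network path.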

The cache.backend startup field is a legacy compatibility setting and no longer selects one global cache for all requests. If you set cache.backend: redis, AISIX still requires a cache.redis block and fails to start when that block is missing.

Configure a Cache Policy

Set the values used by the example requests:

export AISIX_ADMIN_KEY="admin-local-only-change-me"
export AISIX_API_KEY="sk-demo-caller"
export AISIX_MODEL="gpt-4o-mini"
export CACHE_PROMPT="cache-check-$(date +%s)"

Create a cache policy for the example model alias and save the response:

CACHE_POLICY_RESPONSE=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/cache_policies" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "default-chat-cache",
    "enabled": true,
    "backend": "memory",
    "applies_to": "model:'"${AISIX_MODEL}"'",
    "ttl_seconds": 3600
  }')
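The inline `'"${AISIX_MODEL}"'` splice above works, but it is easy to break when editing the body. The following alternative sketch builds the same body with `printf`. `cache_policy_body` is a hypothetical helper. It assumes the values contain no double quotes or backslashes, because `printf` does not JSON-escape them.

```shell
# Sketch: build the cache policy JSON body from positional arguments.
# Field names match the create example; the helper name is hypothetical.
# Assumption: values contain no double quotes or backslashes, because
# printf does not JSON-escape them.
cache_policy_body() {
  name="$1"        # policy name, for example default-chat-cache
  backend="$2"     # memory or redis
  applies_to="$3"  # all, model:<alias>, or api_key:<id>
  ttl="$4"         # ttl_seconds, an integer
  printf '{"name":"%s","enabled":true,"backend":"%s","applies_to":"%s","ttl_seconds":%d}\n' \
    "$name" "$backend" "$applies_to" "$ttl"
}
```

To use it, pass `-d "$(cache_policy_body default-chat-cache memory "model:${AISIX_MODEL}" 3600)"` in place of the inline body. When values might contain special characters, `jq -n --arg` is the safer choice because jq escapes strings.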

Print the response and save the returned ID in CACHE_POLICY_ID:

printf '%s\n' "${CACHE_POLICY_RESPONSE}" | jq .
CACHE_POLICY_ID=$(printf '%s\n' "${CACHE_POLICY_RESPONSE}" | jq -r '.id // empty')

You should see a response similar to the following:

{
  "id": "8d86d1cf-4a46-4a71-b4ef-c01e51f77f21",
  "value": {
    "name": "default-chat-cache",
    "enabled": true,
    "backend": "memory",
    "ttl_seconds": 3600,
    "applies_to": "model:gpt-4o-mini"
  },
  "revision": 1
}

The returned ID is used later to delete the policy.

Verify Cache Behavior

After the cache policy is configured, send repeated requests to compare cache miss and cache hit behavior.

First, send a request with the cache prompt:

curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${AISIX_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${AISIX_MODEL}"'",
    "messages": [{"role": "user", "content": "'"${CACHE_PROMPT}"'"}]
  }'

The first matching request should include this response header:

x-aisix-cache: miss

The miss value means AISIX called the upstream provider and wrote the response into the cache.

Repeat the request with the same body and model alias:

curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${AISIX_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${AISIX_MODEL}"'",
    "messages": [{"role": "user", "content": "'"${CACHE_PROMPT}"'"}]
  }'

The repeated request should include this response header:

x-aisix-cache: hit

The hit value means AISIX served the cached copy without calling the upstream provider.
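To check the header in a script rather than by eye, you can filter the `curl -sSi` output for `x-aisix-cache`. The following is a minimal sketch, and `cache_status` is a hypothetical helper. It lowercases header names because HTTP header names are case-insensitive. It also strips the carriage return at the end of each header line and stops at the blank line that ends the headers.

```shell
# Sketch: print the x-aisix-cache value from a raw HTTP response on stdin.
# Prints nothing when the header is absent from the header section.
cache_status() {
  tr -d '\r' | awk -F': *' '
    NF == 0 { exit }                                    # blank line: end of headers
    tolower($1) == "x-aisix-cache" { print $2; exit }   # case-insensitive name match
  '
}
```

For example, piping the repeated request above through `cache_status` (`curl -sSi ... | cache_status`) should print `hit`.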

Change the prompt to confirm that the cache key is tied to the request body:

curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${AISIX_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${AISIX_MODEL}"'",
    "messages": [{"role": "user", "content": "'"${CACHE_PROMPT}"' different"}]
  }'

The changed request should return a cache miss.
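The following sketch illustrates why an identical body hits and a changed prompt misses. It is hypothetical, because this guide does not document how AISIX derives its cache key. Here, `cache_key` simply hashes the caller-visible model alias together with the raw request body. The sketch assumes `sha256sum` from GNU coreutils is available.

```shell
# Hypothetical illustration only; not AISIX's documented key derivation.
# Hashes the requested model alias plus the raw request body, so any
# change to either input produces a different key.
# Assumes sha256sum (GNU coreutils) is installed.
cache_key() {
  alias="$1"  # alias the caller requested, not the target that served the miss
  body="$2"   # request body exactly as sent
  printf '%s\n%s' "$alias" "$body" | sha256sum | cut -d' ' -f1
}
```

A real implementation may normalize the body or include other request fields. The sketch only shows that any change to the hashed input, such as appending ` different` to the prompt, yields a new key and therefore a miss.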

Delete the cache policy when you finish the test:

curl -sS -X DELETE "http://127.0.0.1:3001/admin/v1/cache_policies/${CACHE_POLICY_ID}" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"

Deleting the policy disables caching for that scope. In-memory cache entries are dropped when the gateway restarts.
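If the create call failed, `CACHE_POLICY_ID` is empty, and the DELETE above is sent to `/admin/v1/cache_policies/` instead of to a single policy. The following small guard is a sketch. `policy_url` is a hypothetical helper that refuses to build the URL without an ID.

```shell
# Sketch: build the admin URL for one cache policy, or fail when the ID
# is empty (for example, when jq printed nothing after a failed create).
policy_url() {
  base="$1"  # admin listener, for example http://127.0.0.1:3001
  id="$2"    # the CACHE_POLICY_ID captured earlier
  if [ -z "$id" ]; then
    echo "cache policy ID is empty; create the policy first" >&2
    return 1
  fi
  printf '%s/admin/v1/cache_policies/%s\n' "$base" "$id"
}
```

For example: `url=$(policy_url http://127.0.0.1:3001 "${CACHE_POLICY_ID}") && curl -sS -X DELETE "$url" -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"`. The `&&` keeps curl from running when the guard fails.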

Scope Cache Policies

The applies_to field controls which requests match a cache policy:

Value	Scope
`all`	Every eligible non-streaming chat-completions request.
`model:<alias>`	Requests that use the caller-visible model alias.
`api_key:<id>`	Requests authenticated with the caller API key resource ID.

For multi-target models, the cache key uses the alias the caller requested, not the target model that served the miss.

Start with a narrow policy, such as a model-scoped or caller-key-scoped policy. Use a global policy only when every eligible chat-completions request in the environment should participate in response caching.

Avoid unsupported matcher prefixes. The gateway treats unknown forms as global, so a typo can make a policy broader than intended.
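One way to catch matcher typos before creating a policy is to classify the applies_to value locally. This sketch mirrors the documented matcher forms and the documented global fallback. `policy_scope` and its output format are hypothetical. Treating an empty alias or ID as unrecognized is also an assumption.

```shell
# Sketch of the documented applies_to forms: all, model:<alias>, api_key:<id>.
# Anything else is reported as global, mirroring the documented fallback,
# so a typo such as "modle:gpt-4o-mini" is flagged before it ships.
# Assumption: an empty alias or ID counts as unrecognized.
policy_scope() {
  case "$1" in
    all)        echo "global" ;;
    model:?*)   echo "model ${1#model:}" ;;
    api_key:?*) echo "api_key ${1#api_key:}" ;;
    *)          echo "global (unrecognized matcher: $1)" ;;
  esac
}
```

Running `policy_scope "model:${AISIX_MODEL}"` before the create call should print `model` followed by the alias. Any output starting with `global (unrecognized` signals a matcher that would widen the policy.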

Select a Cache Backend

The backend field on each cache policy selects where matching responses are stored:

Backend	Behavior
`memory`	Uses the in-process cache on the gateway instance that handled the miss.
`redis`	Uses the shared Redis cache when `cache.redis` is configured at startup.

Use memory for single-instance deployments or for policies where node-local cache entries are acceptable. Use Redis when several gateway instances should share cached responses for the same policy.

If a matching policy selects Redis but the gateway process did not start with cache.redis, AISIX disables caching for that policy's matching requests. It does not silently fall back to memory.
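The backend rules above can be summarized as a lookup. This is an illustrative model of the documented behavior rather than gateway source. `effective_backend` is a hypothetical name, and the handling of values other than memory and redis is an assumption.

```shell
# Illustrative model of documented backend selection, not gateway source:
# memory is always built, redis needs cache.redis at startup, and a redis
# policy without it disables caching instead of falling back to memory.
effective_backend() {
  backend="$1"           # the policy's backend field
  redis_configured="$2"  # "yes" if the gateway started with cache.redis
  case "$backend" in
    memory) echo "memory" ;;
    redis)
      if [ "$redis_configured" = "yes" ]; then echo "redis"; else echo "disabled"; fi ;;
    # Assumption: other values are treated as unusable here; the gateway
    # may instead reject such a policy when it is created.
    *) echo "disabled" ;;
  esac
}
```

The key design point is the `redis` branch. A missing Redis configuration yields `disabled`, never `memory`, so a shared-cache policy cannot quietly become node-local.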

The cache.redis.mode field supports the same Redis connection modes as the rate-limit backend:

config.yaml
cache:
  redis:
    mode: cluster
    nodes:
      - redis://10.0.0.1:6379/
      - redis://10.0.0.2:6379/

Use single for one Redis endpoint, cluster for Redis Cluster seed nodes, or sentinel with sentinels and master_name for a Sentinel-managed master. For full mode examples, see Redis Connection Modes.

Review Cache Behavior

If the x-aisix-cache header does not appear, check that all three caching requirements hold: the selected backend is available, an enabled cache policy matches the request, and the request is a non-streaming chat-completions request.

If a policy applies more broadly than expected, check its scope. Unknown matcher prefixes fall back to global scope, so use only the supported matcher values listed above.

Next Steps

You have now configured a response cache policy and verified miss and hit behavior. Next, continue with Guardrails to add request and response checks before and after provider calls.
