Rate Limits

Rate limits protect upstream providers and keep any single caller, model, team, or member from exhausting shared gateway capacity. They are most useful when several applications share the same model alias, upstream credential, or provider quota.

In this guide, you will add a caller-specific request limit, send traffic through the gateway, and confirm that AISIX rejects requests after the quota is exceeded. You can also use similar limits on a model alias. In AISIX Cloud, you can configure wider team and member policies.

Prerequisites

Before starting, prepare the following:

  • A self-hosted AISIX gateway with the admin and proxy listeners available.
  • The admin key from the gateway config.yaml.
  • A working model alias that can serve proxy requests. If you have not created one yet, configure Provider Keys and Models first.

Choose Where to Apply the Limit

AISIX can apply rate limits in three places. Choose the smallest scope that matches the quota you want to protect:

  • A caller API key, when one application or tenant should have its own quota.
  • A model, when several caller API keys share the same expensive or fragile model alias.
  • A shared rate limit policy, when AISIX Cloud needs a wider quota bucket: one bucket for a whole team, one bucket for a single member, or a separate bucket for each member of a team.

Before AISIX sends a request to the upstream provider, it checks every configured limit that matches the request. If any one of those limits has no remaining capacity, AISIX rejects the request with HTTP 429 (Too Many Requests).
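The "reject if any matching limit is exhausted" rule can be sketched as a small shell function. This is only an illustration of the decision logic, not AISIX's implementation: the function name admit and the idea of passing each limit's remaining capacity as an argument are assumptions made for the example.

```shell
# Illustrative only: each argument is the remaining capacity of one matching
# limit (for example caller key, model, shared policy).
# Hypothetical helper, not an AISIX API.
admit() {
  for remaining in "$@"; do
    if [ "$remaining" -le 0 ]; then
      echo "429"       # one exhausted limit is enough to reject
      return 0
    fi
  done
  echo "forward"       # every matching limit still has capacity
}

admit 5 120 3   # all three limits have capacity
admit 5 0 3     # the second limit is exhausted
```

The first call prints forward and the second prints 429, which is why a tight model-level limit can reject a caller whose own key still has quota left.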

The example below protects one application, so it applies the limit to that application's caller API key.

Configure a Caller Limit

The most direct way to rate limit an application is to attach a limit to the caller API key. In this example, the key can send one request per minute to the configured model alias.

Choose a plaintext caller API key, set the model alias, and hash the key before creating the admin resource:

export AISIX_ADMIN_KEY="admin-local-only-change-me"
export AISIX_API_KEY="sk-rate-limit-caller"
export AISIX_MODEL="gpt-4o-prod"

AISIX_API_KEY_HASH=$(printf '%s' "${AISIX_API_KEY}" | shasum -a 256 | awk '{print $1}')

Create a caller API key resource with a one-request-per-minute limit. The request goes to the caller API key admin route, and the rate_limit field in the request body defines the limit. The rpm field means requests per minute:

curl -sS -X POST "http://127.0.0.1:3001/admin/v1/apikeys" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "key_hash": "'"${AISIX_API_KEY_HASH}"'",
    "allowed_models": ["'"${AISIX_MODEL}"'"],
    "rate_limit": {
      "rpm": 1
    }
  }'

You should see a response similar to the following:

{
  "id": "4ae2b1b8-5e2c-4f44-8d8a-2f6a6f5ef7f8",
  "value": {
    "key_hash": "4b4f91305bd7f14a04ef6c850b3f4d0a8ce9ac67bc63f8b342ccdfd0d2f5b8f8",
    "allowed_models": [
      "gpt-4o-prod"
    ],
    "rate_limit": {
      "rpm": 1
    }
  },
  "revision": 1
}

Verify Rate Limiting

Send three requests with the rate-limited caller API key:

for i in 1 2 3; do
  printf "request %s: " "${i}"
  curl -sS -o /dev/null -w "%{http_code}\n" -X POST "http://127.0.0.1:3000/v1/chat/completions" \
    -H "Authorization: Bearer ${AISIX_API_KEY}" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "'"${AISIX_MODEL}"'",
      "messages": [
        {"role": "user", "content": "Hello from AISIX."}
      ]
    }'
done

The first request should reach the upstream model. The second and third requests arrive within the same minute, so they exceed the one-request-per-minute limit and are rejected:

request 1: 200
request 2: 429
request 3: 429

When AISIX rejects a request, the response uses the proxy error format. It also includes a Retry-After header when the limiter can calculate a retry window:

HTTP/1.1 429 Too Many Requests
retry-after: 60

{
  "error": {
    "message": "request limit exceeded (requests)",
    "type": "rate_limit_exceeded"
  }
}
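A client can use the Retry-After value to decide how long to wait before retrying. The following is a minimal sketch, assuming the header carries a whole number of seconds as in the example above. The helper name retry_after_seconds and its fallback argument are hypothetical and not part of AISIX.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# Retry-After value in seconds, or a fallback when the header is missing
# or is not a plain number of seconds.
retry_after_seconds() {
  fallback="${1:-5}"
  # Strip carriage returns, then match the header name case-insensitively.
  value=$(tr -d '\r' | awk -F': *' 'tolower($1) == "retry-after" {print $2; exit}')
  case "$value" in
    ''|*[!0-9]*) echo "$fallback" ;;
    *) echo "$value" ;;
  esac
}

printf 'HTTP/1.1 429 Too Many Requests\r\nretry-after: 60\r\n\r\n' | retry_after_seconds
```

The last line prints 60. To feed it a real response, curl can write the response headers to standard output with -D - while discarding the body with -o /dev/null, and the result can be passed to sleep before the next attempt.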

Other Limit Types

The example above limits request count. AISIX can also limit token usage and in-flight requests:

  • Request limits cap how many requests can be sent in a time window, such as requests per minute.
  • Token limits cap prompt and completion tokens in a time window, such as tokens per minute. AISIX records token usage only after the upstream response returns provider-reported usage. A single large response can therefore use up the remaining token capacity, and the rejection falls on a later request rather than on the one that consumed the tokens.
  • Concurrency limits cap how many requests can be in progress at the same time.
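The effect of recording token usage after the response can be illustrated with a small simulation. This is a sketch of the accounting idea only, not AISIX code: the function simulate_tpm is hypothetical, and it assumes a request is admitted whenever recorded usage is still below the limit at the time it arrives.

```shell
# Illustrative simulation of post-response token accounting.
# Usage: simulate_tpm LIMIT TOKENS_PER_REQUEST...
simulate_tpm() {
  limit="$1"; shift
  used=0
  for tokens in "$@"; do
    if [ "$used" -ge "$limit" ]; then
      echo "rejected (used ${used}/${limit})"
      continue
    fi
    # The cost of a request is only known once the upstream has responded.
    used=$(( used + tokens ))
    echo "served ${tokens} tokens (used ${used}/${limit})"
  done
}

simulate_tpm 1000 400 900 50
```

With a limit of 1000, the 900-token request is still served because only 400 tokens had been recorded when it arrived. It pushes recorded usage to 1300, and the small 50-token request that follows is the one rejected.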

In inline rate_limit objects, request-count fields use rps, rpm, rph, or rpd; token fields use tpm or tpd; concurrency uses concurrency.

Inline rate-limit counters are local to each gateway process and are not shared between instances. In a multi-instance self-hosted deployment, the effective cap can therefore reach the configured value multiplied by the number of instances. Account for the instance count when you set caps, or route the same tenant, caller API key, or model alias to a consistent gateway group when quotas must be tighter.
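One way to account for the instance count is to divide the intended global cap by the number of gateway instances. The helper below is a hypothetical sizing aid, not an AISIX feature, and it assumes traffic is spread roughly evenly across instances.

```shell
# Hypothetical sizing helper: split a desired global cap across N gateway
# instances, rounding down so the combined worst case never exceeds the target.
# Usage: per_instance_cap GLOBAL_CAP INSTANCES
per_instance_cap() {
  global_cap="$1"
  instances="$2"
  echo $(( global_cap / instances ))
}

per_instance_cap 300 3   # prints 100
per_instance_cap 100 3   # prints 33
```

Rounding down keeps the combined worst case at or below the target, at the cost of rejecting some traffic early when load is uneven. The result is 0 when the global cap is smaller than the instance count, so very small quotas need separate handling, such as routing that traffic to a single instance.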

You can attach the same rate_limit object to a model when the limit should be shared by every caller of that model alias. Add it when you create or update the model through /admin/v1/models. The following example allows up to 300 requests per minute and 20 concurrent requests:

{
  "display_name": "gpt-4o-prod",
  "provider": "openai",
  "model_name": "gpt-4o",
  "provider_key_id": "YOUR_PROVIDER_KEY_ID",
  "rate_limit": {
    "rpm": 300,
    "concurrency": 20
  }
}

For team or member limits, use a shared policy.

Note: Shared policies are configured in AISIX Cloud. The Admin API does not support creating shared policies.

The following example shows a shared policy that limits one team bucket to 1,000,000 tokens per minute:

{
  "name": "team-acme-tpm",
  "scope": "team",
  "scope_ref": "team-uuid-acme",
  "window": "minute",
  "max_tokens": 1000000
}

For shared policies, token limits use a minute window. Request-count limits can use second, minute, or hour windows.

Next Steps

You have now configured a caller-specific rate limit and seen how AISIX rejects traffic after the quota is exceeded. Next, continue with Budgets to understand Cloud-managed spend controls.
