Parameters
See plugin common configurations for configuration options available to all plugins.
In API7 Enterprise (from 3.8.17) and in APISIX (from 3.16.0), you should configure one of the following parameter sets, but not both:
- rules
- Any combination of limit and time_window, and/or instances
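As a minimal sketch of the constraint above, the following Python helper checks which parameter set a configuration dictionary uses. The function name parameter_set and the sample values are hypothetical, and the plugin's own schema validation may behave differently.

```python
def parameter_set(conf: dict) -> str:
    """Return which parameter set a configuration uses, or raise ValueError.

    Hypothetical helper: it only mirrors the documented "one set, not both" rule.
    """
    has_rules = "rules" in conf
    has_limit_pair = "limit" in conf and "time_window" in conf
    has_instances = "instances" in conf
    uses_limit_set = "limit" in conf or "time_window" in conf or has_instances

    if has_rules and uses_limit_set:
        raise ValueError("configure either rules or limit/time_window/instances, not both")
    if has_rules:
        return "rules"
    if has_limit_pair or has_instances:
        return "limit/time_window/instances"
    raise ValueError("configure rules, or limit with time_window and/or instances")


# Sample values are illustrative only.
print(parameter_set({"limit": 300, "time_window": 60}))
print(parameter_set({"rules": [{"count": 300, "time_window": 60, "key": "remote_addr"}]}))
```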
limit
integer | string
valid value:
greater than 0
The maximum number of tokens that can be consumed within a given time interval.
In API7 Enterprise (from 3.8.17) and in APISIX (from 3.16.0), this parameter also supports the string data type and allows the use of built-in variables prefixed with a dollar sign ($). In earlier APISIX versions, only the integer type is supported.
time_window
integer | string
valid value:
greater than 0
The time interval in seconds corresponding to the rate limiting limit.
In API7 Enterprise (from 3.8.17) and in APISIX (from 3.16.0), this parameter also supports the string data type and allows the use of built-in variables prefixed with a dollar sign ($). In earlier APISIX versions, only the integer type is supported.
show_limit_quota_header
boolean
default:
true
If true, includes the rate limiting response headers. Specifically, when using limit/time_window or instances, the headers include the instance name as a suffix:
- X-AI-RateLimit-Limit-{name} shows the total quota.
- X-AI-RateLimit-Remaining-{name} shows the remaining quota.
- X-AI-RateLimit-Reset-{name} shows the number of seconds until the counter resets.
When rules is set, the headers use a prefix instead. See rules.header_prefix for details.
limit_strategy
string
default:
total_tokens
valid value:
total_tokens, prompt_tokens, completion_tokens, or expression
The type of token to which rate limiting applies.
total_tokens, prompt_tokens, and completion_tokens values are returned in each model response, where total_tokens is the sum of prompt_tokens and completion_tokens.
When set to expression, the rate limiting cost is calculated using a custom Lua arithmetic expression defined in cost_expr. Available in API7 Enterprise from version 3.9.8. Not available in APISIX yet.
cost_expr
string
valid value:
any non-empty string (must be a valid Lua arithmetic expression)
Lua arithmetic expression for dynamic token cost calculation. Variables are injected from the LLM provider's raw usage response fields (e.g., input_tokens, output_tokens, cache_creation_input_tokens). Missing variables default to 0. Only math functions (abs, ceil, floor, max, min) and arithmetic operators are allowed. Expression syntax is validated at configuration time. Required when limit_strategy is expression, and must not be set otherwise.
Example: input_tokens + cache_creation_input_tokens computes cost from Anthropic Claude's cache-aware token usage.
Available in API7 Enterprise from version 3.9.8. Not available in APISIX yet.
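To illustrate the evaluation rules described for cost_expr (usage fields as variables, missing variables treated as 0, only a small set of math functions and arithmetic operators), here is a rough Python sketch. It is not the plugin's Lua implementation: the function eval_cost is hypothetical, and it parses Python expression syntax, which only overlaps with Lua for basic arithmetic such as +, -, *, /, and %.

```python
import ast
import math
import operator

# Whitelists mirroring the documented restrictions (assumed mapping to Python).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
_FUNCS = {"abs": abs, "ceil": math.ceil, "floor": math.floor, "max": max, "min": min}


def eval_cost(expr: str, usage: dict):
    """Evaluate an arithmetic cost expression against raw usage fields."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
        elif isinstance(node, ast.Name):
            # Missing variables default to 0, as the parameter description states.
            return usage.get(node.id, 0)
        elif isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS
            and not node.keywords
        ):
            return _FUNCS[node.func.id](*[walk(arg) for arg in node.args])
        raise ValueError("unsupported element in cost expression")

    return walk(ast.parse(expr, mode="eval"))


usage = {"input_tokens": 100, "cache_creation_input_tokens": 20}
print(eval_cost("input_tokens + cache_creation_input_tokens", usage))  # 120
```

Walking a whitelisted syntax tree, rather than calling eval, is one way to keep such an expression limited to arithmetic; the plugin may enforce its restrictions differently.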
instances
array[object]
LLM instance rate limiting configurations.
name
string
required
Name of the LLM service instance.
limit
integer | string
required
valid value:
greater than 0
The maximum number of tokens that can be consumed within a given time interval.
In API7 Enterprise (from 3.8.17) and in APISIX (from 3.16.0), this parameter also supports the string data type and allows the use of built-in variables prefixed with a dollar sign ($). In earlier APISIX versions, only the integer type is supported.
time_window
integer | string
required
valid value:
greater than 0
The time interval in seconds corresponding to the rate limiting limit.
In API7 Enterprise (from 3.8.17) and in APISIX (from 3.16.0), this parameter also supports the string data type and allows the use of built-in variables prefixed with a dollar sign ($). In earlier APISIX versions, only the integer type is supported.
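The instances fields above can be sketched as a small Python structure, together with the per-instance header names described under show_limit_quota_header. The instance names, the variable $http_x_token_limit, and both helper functions are hypothetical and only illustrate the documented field shapes; they are not part of the plugin.

```python
# Illustrative instances array; names and values are assumptions.
instances = [
    {"name": "instance-a", "limit": 100, "time_window": 30},
    # A string value may reference a built-in variable (variable name assumed).
    {"name": "instance-b", "limit": "$http_x_token_limit", "time_window": 60},
]


def check_instance(instance: dict) -> None:
    """Raise ValueError if a required field is missing or an integer is not > 0."""
    for field in ("name", "limit", "time_window"):
        if field not in instance:
            raise ValueError(f"{field} is required")
    for field in ("limit", "time_window"):
        value = instance[field]
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise ValueError(f"{field} must be an integer or a string")
        if isinstance(value, int) and value <= 0:
            raise ValueError(f"{field} must be greater than 0")


def instance_headers(name: str) -> list:
    """Header names that carry the instance name as a suffix."""
    return [f"X-AI-RateLimit-{kind}-{name}" for kind in ("Limit", "Remaining", "Reset")]


for instance in instances:
    check_instance(instance)
    print(instance_headers(instance["name"]))
```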
rejected_code
integer
default:
503
valid value:
between 200 and 599 inclusive
The HTTP status code returned when a request exceeding the quota is rejected.
rejected_msg
string
valid value:
any non-empty string
The response body returned when a request exceeding the quota is rejected.
policy
string
required
default:
local
valid value:
local, redis, redis-cluster, or redis-sentinel
The policy for the rate limiting counter. Available in API7 Enterprise from version 3.8.19 and in APISIX from 3.16.0. When upgrading from an earlier version, you must explicitly set this to local to preserve the previous behavior.
Set to local to store the counter in memory locally.
Set to redis to store the counter on a Redis instance.
Set to redis-cluster to store the counter in a Redis cluster.
Set to redis-sentinel to store the counter on the Redis primary node managed by Redis Sentinel, which ensures high availability by automatically promoting a replica to primary in case of failure. Redis Sentinel provides high availability for Redis when not using Redis Cluster.
redis_host
string
The address of the Redis node. Required when policy is redis.
redis_port
integer
default:
6379
valid value:
greater than or equal to 1
The port of the Redis node when policy is redis.
redis_username
string
The username for Redis if Redis ACL is used. If you use the legacy authentication method requirepass, configure only redis_password. Used when policy is redis.
redis_password
string
The password of the Redis node when policy is redis or redis-cluster.
redis_database
integer
default:
0
valid value:
greater than or equal to 0
The database number in Redis when policy is redis or redis-sentinel.
redis_ssl
boolean
default:
false
If true, use SSL to connect to Redis when policy is redis.
redis_ssl_verify
boolean
default:
false
If true, verify the server SSL certificate when policy is redis.
redis_timeout
integer
default:
1000
valid value:
greater than or equal to 1
The Redis timeout value in milliseconds when policy is redis or redis-cluster.
redis_cluster_nodes
array[string]
The list of Redis cluster nodes with at least two addresses. Required when policy is redis-cluster.
redis_cluster_name
string
The name of the Redis cluster. Required when policy is redis-cluster.
redis_cluster_ssl
boolean
default:
false
If true, use SSL to connect to the Redis cluster when policy is redis-cluster.
redis_cluster_ssl_verify
boolean
default:
false
If true, verify the server SSL certificate when policy is redis-cluster.
redis_sentinels
array[object]
An array of Redis Sentinel nodes (host and port). Required when policy is redis-sentinel.
redis_master_name
string
The name of the Redis master group that Sentinels are monitoring. Required when policy is redis-sentinel.
redis_role
string
default:
master
valid value:
master or slave
The Redis node role to connect to. Configurable when policy is redis-sentinel. Set to master to connect to the current Redis master, and set to slave to connect to a Redis replica.
redis_connect_timeout
integer
default:
1000
valid value:
greater than or equal to 1
Timeout in milliseconds for establishing a connection to a Redis node. Configurable when policy is redis-sentinel.
redis_read_timeout
integer
default:
1000
valid value:
greater than or equal to 1
Timeout in milliseconds for reading data from a Redis node. Configurable when policy is redis-sentinel.
redis_keepalive_timeout
integer
default:
60000
valid value:
greater than or equal to 1
Time in milliseconds that an idle Redis connection is kept alive in the connection pool before being closed. Configurable when policy is redis-sentinel.
sentinel_username
string
Username used to authenticate with the Redis Sentinel instance. Configurable when policy is redis-sentinel.
sentinel_password
string
Password used to authenticate with the Redis Sentinel instance. Configurable when policy is redis-sentinel.
allow_degradation
boolean
default:
false
If true, allow the gateway to continue handling requests without the plugin when the plugin or its dependencies become unavailable.
Available in API7 Enterprise from version 3.8.19 and in APISIX from 3.16.0.
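The policy-specific requirements listed for the Redis parameters can be summarized in a short Python sketch. The helper policy_problems and the table below are hypothetical; they only restate which fields the parameter descriptions mark as required for each policy value, and the plugin's actual validation may differ.

```python
# Fields marked "Required when policy is ..." in the parameter descriptions.
REQUIRED_BY_POLICY = {
    "local": [],
    "redis": ["redis_host"],
    "redis-cluster": ["redis_cluster_nodes", "redis_cluster_name"],
    "redis-sentinel": ["redis_sentinels", "redis_master_name"],
}


def policy_problems(conf: dict) -> list:
    """Return a list of human-readable problems for the chosen policy."""
    policy = conf.get("policy", "local")  # documented default
    if policy not in REQUIRED_BY_POLICY:
        raise ValueError(f"unknown policy: {policy!r}")

    problems = [
        f"{field} is required when policy is {policy}"
        for field in REQUIRED_BY_POLICY[policy]
        if field not in conf
    ]
    nodes = conf.get("redis_cluster_nodes")
    if policy == "redis-cluster" and nodes is not None and len(nodes) < 2:
        problems.append("redis_cluster_nodes needs at least two addresses")
    return problems


# Addresses below are placeholders.
print(policy_problems({"policy": "redis"}))
print(policy_problems({
    "policy": "redis-cluster",
    "redis_cluster_name": "example-cluster",
    "redis_cluster_nodes": ["127.0.0.1:7000", "127.0.0.1:7001"],
}))
```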
rules
array[object]
An array of rate-limiting rules that are applied sequentially.
Available in API7 Enterprise from 3.8.17 and in APISIX from 3.16.0.
count
integer | string
required
valid value:
greater than 0
The maximum number of tokens that can be consumed within a given time interval.
This parameter also supports the string data type and allows the use of built-in variables prefixed with a dollar sign ($).
time_window
integer | string
required
valid value:
greater than 0
The time interval in seconds corresponding to the rate limiting count.
This parameter also supports the string data type and allows the use of built-in variables prefixed with a dollar sign ($).
key
string
required
The key to count requests by. If the configured key does not exist, the rule will not be executed.
The key is interpreted as a variable. The variable does not need to be prefixed by a dollar sign ($). See built-in variables for available variables.
header_prefix
string
Prefix for all rate limiting response headers. Available in API7 Enterprise from version 3.8.19. Not yet available in APISIX.
When configured, the prefix is inserted after X-AI- in the header name. For example, with header_prefix set to test, the headers become X-AI-Test-RateLimit-Limit, X-AI-Test-RateLimit-Remaining, and X-AI-Test-RateLimit-Reset.
When not configured, the index of the rule in the rules array is used as the prefix. For example, headers for the first rule will be X-AI-1-RateLimit-Limit, X-AI-1-RateLimit-Remaining, and X-AI-1-RateLimit-Reset.
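The header naming described for header_prefix can be sketched in Python as follows. The function rule_header_names is hypothetical, and upper-casing the first letter of the prefix is an assumption inferred from the test to Test example; the gateway may normalize header names differently.

```python
HEADER_KINDS = ("Limit", "Remaining", "Reset")


def rule_header_names(rules: list) -> list:
    """Return the three response header names for each rule, in order."""
    names = []
    # Rule indexes start at 1, matching the X-AI-1-... example for the first rule.
    for index, rule in enumerate(rules, start=1):
        prefix = rule.get("header_prefix")
        # Assumption: only the first letter is upper-cased, as in test -> Test.
        label = (prefix[:1].upper() + prefix[1:]) if prefix else str(index)
        names.append([f"X-AI-{label}-RateLimit-{kind}" for kind in HEADER_KINDS])
    return names


for headers in rule_header_names([{"header_prefix": "test"}, {}]):
    print(headers)
```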