AI Rate Limiting | APISIX & API7 API Gateway Docs

Parameters

See plugin common configurations for configuration options available to all plugins.

limit
integer
valid values:
greater than 0
The maximum number of tokens that may be consumed within a given time interval. At least one of limit and instances.limit should be configured.
time_window
integer
valid values:
greater than 0
The time interval, in seconds, over which the rate limiting limit applies. At least one of time_window and instances.time_window should be configured.
show_limit_quota_header
boolean
default: true
If true, include X-AI-RateLimit-Limit-* to show the total quota, X-AI-RateLimit-Remaining-* to show the remaining quota in the response header, and X-AI-RateLimit-Reset-* to show the number of seconds left for the counter to reset, where * is the instance name.
limit_strategy
string
default: total_tokens
valid values:
total_tokens, prompt_tokens, or completion_tokens
Type of token count to which rate limiting is applied. The total_tokens, prompt_tokens, and completion_tokens values are returned in each model response, where total_tokens is the sum of prompt_tokens and completion_tokens.
instances
array[object]
LLM instance rate limiting configurations.
- name
  string
  required
  Name of the LLM service instance.
- limit
  integer
  required
  valid values:
  greater than 0
  The maximum number of tokens that may be consumed within a given time interval.
- time_window
  integer
  required
  valid values:
  greater than 0
  The time interval, in seconds, over which the rate limiting limit applies.
rejected_code
integer
default: 503
valid values:
between 200 and 599 inclusive
The HTTP status code returned when a request exceeding the quota is rejected.
rejected_msg
string
valid values:
any non-empty string
The response body returned when a request exceeding the quota is rejected.
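
The constraints listed above can be checked before a configuration is submitted. The following is a minimal Python sketch, not the plugin's own schema validation: the function name validate_ai_rate_limiting, the example values, and the instance name my-llm-instance are hypothetical, and the "at least one of" rule is interpreted as "set at the top level or provide instances".

```python
ALLOWED_STRATEGIES = ("total_tokens", "prompt_tokens", "completion_tokens")


def _is_positive_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def validate_ai_rate_limiting(conf: dict) -> list:
    """Return a list of problems; an empty list means the documented
    constraints are satisfied. Hypothetical helper, not part of APISIX."""
    errors = []
    instances = conf.get("instances", [])

    # limit / time_window: integer > 0, set here or on every instance.
    for key in ("limit", "time_window"):
        if key in conf:
            if not _is_positive_int(conf[key]):
                errors.append(f"{key} must be an integer greater than 0")
        elif not instances:
            errors.append(f"configure {key} or instances.{key}")

    # Each instance requires name, limit, and time_window.
    for i, inst in enumerate(instances):
        if not isinstance(inst.get("name"), str):
            errors.append(f"instances[{i}].name is required")
        for key in ("limit", "time_window"):
            if not _is_positive_int(inst.get(key)):
                errors.append(f"instances[{i}].{key} must be an integer greater than 0")

    if conf.get("limit_strategy", "total_tokens") not in ALLOWED_STRATEGIES:
        errors.append("limit_strategy must be one of " + ", ".join(ALLOWED_STRATEGIES))

    if not isinstance(conf.get("show_limit_quota_header", True), bool):
        errors.append("show_limit_quota_header must be a boolean")

    code = conf.get("rejected_code", 503)
    if not (isinstance(code, int) and not isinstance(code, bool) and 200 <= code <= 599):
        errors.append("rejected_code must be between 200 and 599 inclusive")

    if "rejected_msg" in conf:
        msg = conf["rejected_msg"]
        if not (isinstance(msg, str) and msg):
            errors.append("rejected_msg must be a non-empty string")

    return errors


# Route-level quota: 300 prompt tokens per 30 seconds (illustrative values).
route_level = {
    "limit": 300,
    "time_window": 30,
    "limit_strategy": "prompt_tokens",
    "rejected_code": 429,
    "rejected_msg": "Token quota exceeded",
}

# Per-instance quota; the instance name is a placeholder.
per_instance = {
    "instances": [{"name": "my-llm-instance", "limit": 500, "time_window": 60}],
}

print(validate_ai_rate_limiting(route_level))   # no problems reported
print(validate_ai_rate_limiting(per_instance))  # no problems reported
```

Either dictionary could then be serialized as the value of the plugin's entry in a route's plugins object.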

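To illustrate how limit, time_window, limit_strategy, and the X-AI-RateLimit-* headers relate to one another, here is a small Python model of a fixed-window token quota. It is a sketch under stated assumptions, not the plugin's implementation: the class FixedWindowQuota, the helper tokens_to_count, and the fixed-window algorithm itself are assumptions made for illustration. Because token usage is only known once the model has responded, the sketch checks the quota before a request and records usage afterwards.

```python
def tokens_to_count(usage: dict, limit_strategy: str = "total_tokens") -> int:
    """Pick the token count named by limit_strategy from a model
    response's usage object (prompt_tokens, completion_tokens, total_tokens)."""
    return int(usage[limit_strategy])


class FixedWindowQuota:
    """Hypothetical fixed-window counter for one LLM instance."""

    def __init__(self, limit: int, time_window: int):
        self.limit = limit              # tokens allowed per window
        self.time_window = time_window  # window length in seconds
        self.window_start = None
        self.used = 0

    def _roll(self, now: float) -> None:
        # Start a fresh window on first use or once the current one has elapsed.
        if self.window_start is None or now - self.window_start >= self.time_window:
            self.window_start = now
            self.used = 0

    def allow(self, now: float) -> bool:
        """True while quota remains in the current window."""
        self._roll(now)
        return self.used < self.limit

    def record(self, usage: dict, limit_strategy: str, now: float) -> None:
        """Deduct the tokens reported by the model response."""
        self._roll(now)
        self.used += tokens_to_count(usage, limit_strategy)

    def headers(self, instance: str, now: float) -> dict:
        """Quota headers as described for show_limit_quota_header."""
        self._roll(now)
        remaining = max(self.limit - self.used, 0)
        reset = int(self.window_start + self.time_window - now)
        return {
            f"X-AI-RateLimit-Limit-{instance}": str(self.limit),
            f"X-AI-RateLimit-Remaining-{instance}": str(remaining),
            f"X-AI-RateLimit-Reset-{instance}": str(reset),
        }


quota = FixedWindowQuota(limit=100, time_window=30)
usage = {"prompt_tokens": 40, "completion_tokens": 80, "total_tokens": 120}

if quota.allow(now=0.0):
    quota.record(usage, "total_tokens", now=1.0)  # 120 tokens consumed

print(quota.allow(now=2.0))                  # False: the window's quota is used up
print(quota.headers("my-llm", now=10.0))     # remaining 0, reset in 20 seconds
print(quota.allow(now=30.0))                 # True: a new window has started
```

With limit_strategy set to prompt_tokens, the same response would have consumed only 40 tokens and later requests in the window would still be allowed.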