Parameters
See plugin common configurations for configuration options available to all plugins.
limit
integer
valid value:
greater than 0
The maximum number of tokens allowed to be consumed within a given time interval. At least one of limit and instances.limit should be configured.
time_window
integer
valid value:
greater than 0
The time interval, in seconds, to which the rate limiting limit applies. At least one of time_window and instances.time_window should be configured.
show_limit_quota_header
boolean
default:
true
If true, include the following response headers, where * is the instance name: X-AI-RateLimit-Limit-* to show the total quota, X-AI-RateLimit-Remaining-* to show the remaining quota, and X-AI-RateLimit-Reset-* to show the number of seconds left for the counter to reset.
limit_strategy
string
default:
total_tokens
valid value:
total_tokens, prompt_tokens, or completion_tokens
Type of token to which rate limiting applies. The total_tokens, prompt_tokens, and completion_tokens values are returned in each model response, where total_tokens is the sum of prompt_tokens and completion_tokens.
instances
array[object]
LLM instance rate limiting configurations. Each object takes the name, limit, and time_window fields described below.
name
string
required
Name of the LLM service instance.
limit
integer
required
valid value:
greater than 0
The maximum number of tokens allowed to be consumed within a given time interval.
time_window
integer
required
valid value:
greater than 0
The time interval, in seconds, to which the rate limiting limit applies.
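The interaction between limit, time_window, limit_strategy, and the quota headers can be illustrated with a minimal Python sketch. It assumes a simple fixed-window counter, which may not match the plugin's actual algorithm; the TokenWindow class, its methods, and the instance name used in the usage note are hypothetical and exist only for illustration. The usage dictionary mirrors the token counts that the parameter descriptions say are returned in each model response.

```python
from dataclasses import dataclass


@dataclass
class TokenWindow:
    """Hypothetical fixed-window token counter for one LLM instance."""

    name: str           # instance name, used as the header suffix
    limit: int          # maximum tokens per window
    time_window: int    # window length in seconds
    window_start: float = 0.0
    used: int = 0

    def _roll(self, now: float) -> None:
        # Start a fresh window once the current one has expired.
        if now - self.window_start >= self.time_window:
            self.window_start = now
            self.used = 0

    def allows_request(self, now: float) -> bool:
        # A request is admitted while the window still has quota left.
        self._roll(now)
        return self.used < self.limit

    def record(self, usage: dict, limit_strategy: str, now: float) -> None:
        # Count only the token type selected by limit_strategy.
        self._roll(now)
        self.used += usage[limit_strategy]

    def headers(self, now: float) -> dict:
        # Quota headers as described for show_limit_quota_header.
        self._roll(now)
        remaining = max(self.limit - self.used, 0)
        reset = max(int(self.window_start + self.time_window - now), 0)
        return {
            f"X-AI-RateLimit-Limit-{self.name}": self.limit,
            f"X-AI-RateLimit-Remaining-{self.name}": remaining,
            f"X-AI-RateLimit-Reset-{self.name}": reset,
        }
```

In this sketch, a window created as TokenWindow("my-llm", limit=100, time_window=60) admits requests until the recorded token count reaches 100, then rejects them until 60 seconds have passed since the window started. Because token counts are only known after the model responds, a single response can push usage past the limit; the sketch clamps the remaining quota to 0 in that case.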
rejected_code
integer
default:
503
valid value:
between 200 and 599 inclusive
The HTTP status code returned when a request exceeding the quota is rejected.
rejected_msg
string
valid value:
any non-empty string
The response body returned when a request exceeding the quota is rejected.
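As a rough aid for checking a configuration against the constraints listed above, the following Python sketch validates a plugin configuration expressed as a dictionary. The validate_config function is hypothetical and not part of the plugin; the instance name and the values in the example are illustrative assumptions, and the plugin's own schema validation remains authoritative.

```python
from __future__ import annotations

STRATEGIES = ("total_tokens", "prompt_tokens", "completion_tokens")


def _positive_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def validate_config(conf: dict) -> list[str]:
    """Return a list of constraint violations (empty if none were found)."""
    errors = []
    instances = conf.get("instances", [])

    for key in ("limit", "time_window"):
        if key in conf and not _positive_int(conf[key]):
            errors.append(f"{key} must be an integer greater than 0")
        if key not in conf and not any(key in inst for inst in instances):
            errors.append(f"at least one of {key} and instances.{key} should be configured")

    if conf.get("limit_strategy", "total_tokens") not in STRATEGIES:
        errors.append("limit_strategy must be one of " + ", ".join(STRATEGIES))

    for i, inst in enumerate(instances):
        if not isinstance(inst.get("name"), str):
            errors.append(f"instances[{i}].name is required and must be a string")
        for key in ("limit", "time_window"):
            if not _positive_int(inst.get(key)):
                errors.append(f"instances[{i}].{key} is required and must be an integer greater than 0")

    code = conf.get("rejected_code", 503)
    if not (isinstance(code, int) and not isinstance(code, bool) and 200 <= code <= 599):
        errors.append("rejected_code must be between 200 and 599 inclusive")

    if "rejected_msg" in conf:
        msg = conf["rejected_msg"]
        if not isinstance(msg, str) or msg == "":
            errors.append("rejected_msg must be a non-empty string")

    return errors


# Illustrative configuration; the instance name and numbers are made up.
example = {
    "limit": 300,
    "time_window": 60,
    "limit_strategy": "total_tokens",
    "instances": [{"name": "example-instance", "limit": 200, "time_window": 30}],
    "rejected_code": 429,
    "rejected_msg": "Token quota exceeded",
}
```

Calling validate_config(example) returns an empty list, while an empty configuration is reported as missing both limit and time_window.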