Parameters
See plugin common configurations for configuration options available to all plugins.
limit
integer
valid value:
greater than 0
The maximum number of tokens allowed to be consumed within a given time interval. At least one of limit and instances.limit should be configured.
time_window
integer
valid value:
greater than 0
The time interval in seconds corresponding to the rate limiting limit. At least one of time_window and instances.time_window should be configured.
show_limit_quota_header
boolean
default:
true
If true, include X-AI-RateLimit-Limit-* to show the total quota, X-AI-RateLimit-Remaining-* to show the remaining quota in the response header, and X-AI-RateLimit-Reset-* to show the number of seconds left for the counter to reset, where * is the instance name.
limit_strategy
string
default:
total_tokens
valid value:
total_tokens, prompt_tokens, or completion_tokens
Type of token to apply rate limiting to. total_tokens, prompt_tokens, and completion_tokens values are returned in each model response, where total_tokens is the sum of prompt_tokens and completion_tokens.
instances
array[object]
LLM instance rate limiting configurations.
name
string
required
Name of the LLM service instance.
limit
integer
required
valid value:
greater than 0
The maximum number of tokens allowed to be consumed within a given time interval.
time_window
integer
required
valid value:
greater than 0
The time interval in seconds corresponding to the rate limiting limit.
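The constraints listed so far (positive integers, the allowed limit_strategy values, the required instance fields, and the "at least one of" rules) can be sketched as a small check. This is only an illustration of the documented rules, not the plugin's own schema validation; the function name check_rate_limit_conf and the instance names instance-a and instance-b are hypothetical.

```python
# Illustrative check of the documented constraints; not the plugin's real validator.
STRATEGIES = ("total_tokens", "prompt_tokens", "completion_tokens")


def _positive_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def check_rate_limit_conf(conf):
    """Return a list of human-readable problems found in a configuration dict."""
    errors = []
    instances = conf.get("instances", [])

    for key in ("limit", "time_window"):
        if key in conf and not _positive_int(conf[key]):
            errors.append(f"{key} must be an integer greater than 0")
        # At least one of the top-level field and instances.<field> should be set.
        if key not in conf and not any(key in inst for inst in instances):
            errors.append(
                f"at least one of {key} and instances.{key} should be configured"
            )

    if conf.get("limit_strategy", "total_tokens") not in STRATEGIES:
        errors.append("limit_strategy must be one of " + ", ".join(STRATEGIES))

    for i, inst in enumerate(instances):
        if not isinstance(inst.get("name"), str):
            errors.append(f"instances[{i}].name is required")
        for key in ("limit", "time_window"):
            if not _positive_int(inst.get(key)):
                errors.append(
                    f"instances[{i}].{key} is required and must be greater than 0"
                )
    return errors


# Hypothetical configuration with per-instance limits only.
example_conf = {
    "limit_strategy": "total_tokens",
    "instances": [
        {"name": "instance-a", "limit": 300, "time_window": 60},
        {"name": "instance-b", "limit": 500, "time_window": 60},
    ],
}
```

Calling check_rate_limit_conf(example_conf) returns an empty list, while an empty configuration yields one message for limit and one for time_window.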
rejected_code
integer
default:
503
valid value:
between 200 and 599 inclusive
The HTTP status code returned when a request exceeding the quota is rejected.
rejected_msg
string
valid value:
any non-empty string
The response body returned when a request exceeding the quota is rejected.
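To see how the parameters fit together, the following is a simplified fixed-window model written for this page. It assumes the quota is checked before a request is forwarded and deducted once the model response reports its token usage. The plugin's actual counting algorithm may differ; the class TokenWindow, its methods, and the injected clock are hypothetical.

```python
import time


class TokenWindow:
    """Toy fixed-window token counter for one instance (illustrative only)."""

    def __init__(self, name, limit, time_window, limit_strategy="total_tokens",
                 rejected_code=503, rejected_msg=None,
                 show_limit_quota_header=True, clock=time.monotonic):
        self.name = name
        self.limit = limit
        self.time_window = time_window
        self.limit_strategy = limit_strategy
        self.rejected_code = rejected_code
        self.rejected_msg = rejected_msg
        self.show_limit_quota_header = show_limit_quota_header
        self.clock = clock
        self.remaining = limit
        self.reset_at = clock() + time_window

    def _roll(self):
        # Start a fresh window once the current one has expired.
        now = self.clock()
        if now >= self.reset_at:
            self.remaining = self.limit
            self.reset_at = now + self.time_window
        return now

    def check(self):
        """Return (status, body); status is None when the request may proceed."""
        self._roll()
        if self.remaining <= 0:
            return self.rejected_code, self.rejected_msg
        return None, None

    def record(self, usage):
        # usage mirrors the token counts reported in a model response.
        self._roll()
        self.remaining -= usage[self.limit_strategy]

    def headers(self):
        if not self.show_limit_quota_header:
            return {}
        now = self._roll()
        return {
            f"X-AI-RateLimit-Limit-{self.name}": self.limit,
            f"X-AI-RateLimit-Remaining-{self.name}": max(self.remaining, 0),
            f"X-AI-RateLimit-Reset-{self.name}": int(self.reset_at - now),
        }
```

With limit_strategy set to completion_tokens, only the completion_tokens value of each recorded usage counts against limit. Once the remaining quota reaches zero, check() returns rejected_code and rejected_msg until time_window seconds have passed since the window started.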