Parameters
See plugin common configurations for configuration options available to all plugins.
fallback_strategy
string or array
valid values:
string: `instance_health_and_rate_limiting`, `http_429`, or `http_5xx`
array: any combination of `rate_limiting`, `http_429`, and `http_5xx`
Fallback strategy. The option `instance_health_and_rate_limiting` is kept for backward compatibility and is functionally the same as `rate_limiting`. With `rate_limiting` or `instance_health_and_rate_limiting`, when the current instance's quota is exhausted, the request is forwarded to the next instance regardless of priority. With `http_429`, if an instance returns status code 429, the request is retried with other instances. With `http_5xx`, if an instance returns a 5xx status code, the request is retried with other instances. If all instances fail, the plugin returns the last error response code. When not set, the plugin does not forward the request to lower-priority instances when the tokens of the higher-priority instance are exhausted.
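The array form can combine strategies. As a minimal sketch based only on the fields documented in this section (instance names and API keys are placeholders):

```json
{
  "fallback_strategy": ["http_429", "http_5xx"],
  "instances": [
    {
      "name": "primary",
      "provider": "openai",
      "priority": 1,
      "weight": 1,
      "auth": { "header": { "Authorization": "Bearer <primary-api-key>" } }
    },
    {
      "name": "backup",
      "provider": "deepseek",
      "priority": 0,
      "weight": 1,
      "auth": { "header": { "Authorization": "Bearer <backup-api-key>" } }
    }
  ]
}
```

With this configuration, a 429 or 5xx response from the higher-priority `primary` instance causes a retry against `backup`.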
balancer
object
Load balancing configurations.
algorithm
string
default: `roundrobin`
valid values: `roundrobin` or `chash`
Load balancing algorithm. When set to `roundrobin`, the weighted round robin algorithm is used. When set to `chash`, the consistent hashing algorithm is used.
hash_on
string
valid values: `vars`, `headers`, `cookie`, `consumer`, or `vars_combinations`
Used when `type` is `chash`. Supports hashing on built-in variables, headers, cookie, consumer, or a combination of built-in variables.
key
string
Used when `type` is `chash`. When `hash_on` is set to `header` or `cookie`, `key` is required. When `hash_on` is set to `consumer`, `key` is not required, as the consumer name is automatically used as the key.
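For example, a sketch of consistent hashing on a request header (the header name `X-User-Id` is an arbitrary illustration):

```json
{
  "balancer": {
    "algorithm": "chash",
    "hash_on": "headers",
    "key": "X-User-Id"
  }
}
```

Requests carrying the same `X-User-Id` value are routed to the same instance; with `hash_on` set to `consumer`, `key` would be omitted.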
instances
array[object]
required
LLM instance configurations.
name
string
required
Name of the LLM service instance.
provider
string
required
valid values: `openai`, `deepseek`, `openai-compatible`, `azure-openai`, or `aimlapi`
LLM service provider.
When set to `openai`, the plugin proxies requests to https://api.openai.com/chat/completions. When set to `deepseek`, the plugin proxies requests to https://api.deepseek.com/chat/completions. When set to `aimlapi`, the plugin uses the OpenAI-compatible driver and proxies requests to https://api.aimlapi.com/v1/chat/completions by default. The `aimlapi` option is currently available in APISIX and will be supported in API7 Enterprise soon. When set to `openai-compatible`, the plugin proxies requests to the custom endpoint configured in `override`. When set to `azure-openai`, the plugin also proxies requests to the custom endpoint configured in `override` and additionally removes the `model` parameter from user requests.
priority
integer
default: `0`
Priority of the LLM instance in load balancing. `priority` takes precedence over `weight`.
weight
integer
required
default: `0`
valid values: greater than or equal to 0
Weight of the LLM instance in load balancing.
auth
object
required
Authentication configurations.
header
object
Authentication headers. At least one of `header` and `query` should be configured. You can configure additional custom headers that will be forwarded to the upstream LLM service.
query
object
Authentication query parameters. At least one of `header` and `query` should be configured.
options
object
Model configurations.
In addition to `model`, you can configure additional parameters that will be forwarded to the upstream LLM service in the request body. For instance, if you are working with OpenAI or DeepSeek, you can configure additional parameters such as `max_tokens`, `temperature`, `top_p`, and `stream`. See your LLM provider's API documentation for more available options.
model
string
Name of the LLM model, such as `gpt-4` or `gpt-3.5`. See your LLM provider's API documentation for more available models.
override
object
Override setting.
endpoint
string
LLM provider endpoint to replace the default endpoint with. If not configured, the plugin uses the default OpenAI endpoint https://api.openai.com/v1/chat/completions.
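Putting the instance fields together, a hypothetical `openai-compatible` instance pointing at a self-hosted endpoint might look like this (the endpoint, model name, and API key are placeholders):

```json
{
  "instances": [
    {
      "name": "local-llm",
      "provider": "openai-compatible",
      "weight": 1,
      "auth": {
        "header": { "Authorization": "Bearer <api-key>" }
      },
      "options": {
        "model": "llama-3-8b-instruct",
        "max_tokens": 512,
        "temperature": 0.2
      },
      "override": {
        "endpoint": "http://localhost:8000/v1/chat/completions"
      }
    }
  ]
}
```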
checks
object
Health check configurations.
Note that at the moment, OpenAI and DeepSeek do not provide an official health check endpoint. Other LLM services that you can configure under the `openai-compatible` provider may have health check endpoints available.
active
object
required
Active health check configurations.
type
string
default: `http`
valid values: `http`, `https`, or `tcp`
Type of health check connection.
timeout
number
default: `1`
Health check timeout in seconds.
concurrency
integer
default: `10`
Number of upstream nodes to be checked at the same time.
host
string
HTTP host.
port
integer
valid values: between 1 and 65535 inclusive
HTTP port.
http_path
string
default: `/`
Path for HTTP probing requests.
https_verify_certificate
boolean
default: `true`
If true, verify the node's TLS certificate.
healthy
object
Configurations for determining a healthy node.
interval
integer
default: `1`
Time interval for checking healthy nodes, in seconds.
http_statuses
array[integer]
default: `[200, 302]`
valid values: status codes between 200 and 599 inclusive
An array of HTTP status codes that define a healthy node.
successes
integer
default: `2`
valid values: between 1 and 254 inclusive
Number of successful probes to define a healthy node.
unhealthy
object
Configurations for determining an unhealthy node.
interval
integer
default: `1`
Time interval for checking unhealthy nodes, in seconds.
http_statuses
array[integer]
default: `[429, 404, 500, 501, 502, 503, 504, 505]`
valid values: status codes between 200 and 599 inclusive
An array of HTTP status codes that define an unhealthy node.
http_failures
integer
default: `5`
valid values: between 1 and 254 inclusive
Number of HTTP failures to define an unhealthy node.
timeout
integer
default: `3`
valid values: between 1 and 254 inclusive
Number of probe timeouts to define an unhealthy node.
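As an illustrative sketch of an active health check using the fields above (the probe path `/healthz` is a placeholder for whatever path your LLM service actually exposes):

```json
{
  "checks": {
    "active": {
      "type": "http",
      "http_path": "/healthz",
      "healthy": {
        "interval": 2,
        "successes": 2
      },
      "unhealthy": {
        "interval": 1,
        "http_failures": 3
      }
    }
  }
}
```

Here a node is considered healthy after 2 successful probes and unhealthy after 3 HTTP failures.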
logging
object
Logging configurations.
summaries
boolean
default: `false`
If true, log the request's LLM model, duration, and request and response tokens.
payloads
boolean
default: `false`
If true, log request and response payloads.
timeout
integer
default: `30000`
valid values: greater than or equal to 1
Request timeout in milliseconds when requesting the LLM service.
keepalive
boolean
default: `true`
If true, keep the connection alive when requesting the LLM service.
keepalive_timeout
integer
default: `60000`
valid values: greater than or equal to 1000
Keepalive timeout in milliseconds when requesting the LLM service.
keepalive_pool
integer
default: `30`
Keepalive pool size for connections with the LLM service.
ssl_verify
boolean
default: `true`
If true, verify the LLM service's certificate.
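The logging and connection parameters above might be combined as follows; the values shown are the documented defaults, except `summaries`, which is enabled here as an illustration:

```json
{
  "logging": {
    "summaries": true,
    "payloads": false
  },
  "timeout": 30000,
  "keepalive": true,
  "keepalive_timeout": 60000,
  "keepalive_pool": 30,
  "ssl_verify": true
}
```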