Parameters
See plugin common configurations for configuration options available to all plugins.
fallback_strategy
string or array
Valid values:
string: `instance_health_and_rate_limiting`, `http_429`, or `http_5xx`
array: any combination of `rate_limiting`, `http_429`, and `http_5xx`
Fallback strategy. The option `instance_health_and_rate_limiting` is kept for backward compatibility and is functionally the same as `rate_limiting`. With `rate_limiting` or `instance_health_and_rate_limiting`, when the current instance's quota is exhausted, the request is forwarded to the next instance regardless of priority. With `http_429`, if an instance returns status code 429, the request is retried with other instances. With `http_5xx`, if an instance returns a 5xx status code, the request is retried with other instances. If all instances fail, the plugin returns the last error response code. When not set, the plugin does not forward the request to low-priority instances when the tokens of the high-priority instance are exhausted.
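The fallback behavior can be sketched as a plugin configuration that retries on both 429 and 5xx responses. The plugin block name (`ai-proxy-multi`), the instance names, and the key placeholders below are illustrative assumptions, not values from this reference:

```json
{
  "ai-proxy-multi": {
    "fallback_strategy": ["http_429", "http_5xx"],
    "instances": [
      {
        "name": "primary-openai",
        "provider": "openai",
        "priority": 10,
        "weight": 1,
        "auth": { "header": { "Authorization": "Bearer <primary-api-key>" } },
        "options": { "model": "gpt-4" }
      },
      {
        "name": "backup-deepseek",
        "provider": "deepseek",
        "priority": 0,
        "weight": 1,
        "auth": { "header": { "Authorization": "Bearer <backup-api-key>" } },
        "options": { "model": "deepseek-chat" }
      }
    ]
  }
}
```

With this sketch, a 429 or 5xx response from the higher-priority instance causes the request to be retried against the remaining instance.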
balancer
object
Load balancing configurations.
algorithm
string
default:
roundrobin
Valid values:
`roundrobin` or `chash`
Load balancing algorithm. When set to `roundrobin`, the weighted round robin algorithm is used. When set to `chash`, the consistent hashing algorithm is used.
hash_on
string
Valid values:
`vars`, `headers`, `cookie`, `consumer`, or `vars_combinations`
Used when `type` is `chash`. Supports hashing on built-in variables, headers, cookie, consumer, or a combination of built-in variables.
key
string
Used when `type` is `chash`. When `hash_on` is set to `header` or `cookie`, `key` is required. When `hash_on` is set to `consumer`, `key` is not required, as the consumer name is automatically used as the key.
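As a sketch, the balancer settings above could pin each client session to the same instance via consistent hashing on a cookie; the cookie name `session_id` is a hypothetical example:

```json
{
  "balancer": {
    "algorithm": "chash",
    "hash_on": "cookie",
    "key": "session_id"
  }
}
```

Since `hash_on` is `cookie` here, `key` is required; with `hash_on` set to `consumer`, `key` could be omitted.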
instances
array[object]
required
LLM instance configurations.
name
string
required
Name of the LLM service instance.
provider
string
required
Valid values:
`openai`, `deepseek`, `openai-compatible`, `azure-openai`, or `aimlapi`
LLM service provider.
When set to `openai`, the plugin will proxy requests to https://api.openai.com/chat/completions.
When set to `deepseek`, the plugin will proxy requests to https://api.deepseek.com/chat/completions.
When set to `aimlapi`, the plugin uses the OpenAI-compatible driver and proxies requests to https://api.aimlapi.com/v1/chat/completions by default. The `aimlapi` option is currently available in APISIX and will be supported in API7 Enterprise soon.
When set to `openai-compatible`, the plugin proxies requests to the custom endpoint configured in `override`.
When set to `azure-openai`, the plugin also proxies requests to the custom endpoint configured in `override` and additionally removes the `model` parameter from user requests.
priority
integer
default:
0
Priority of the LLM instance in load balancing. `priority` takes precedence over `weight`.
weight
integer
required
default:
0
Valid values:
greater than or equal to 0
Weight of the LLM instance in load balancing.
auth
object
required
Authentication configurations.
header
object
Authentication headers. At least one of `header` and `query` should be configured. You can configure additional custom headers that will be forwarded to the upstream LLM service.
query
object
Authentication query parameters. At least one of `header` and `query` should be configured.
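Putting the instance fields together, a minimal single-instance sketch might look as follows; the instance name and the API key placeholder are assumptions for illustration:

```json
{
  "instances": [
    {
      "name": "openai-gpt4",
      "provider": "openai",
      "priority": 0,
      "weight": 1,
      "auth": {
        "header": { "Authorization": "Bearer <your-api-key>" }
      }
    }
  ]
}
```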
options
object
Model configurations.
In addition to `model`, you can configure additional parameters that will be forwarded to the upstream LLM service in the request body. For instance, if you are working with OpenAI or DeepSeek, you can configure additional parameters such as `max_tokens`, `temperature`, `top_p`, and `stream`. See your LLM provider's API documentation for more available options.
model
string
Name of the LLM model, such as `gpt-4` or `gpt-3.5`. See your LLM provider's API documentation for more available models.
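For example, an `options` block that selects a model and forwards a few extra OpenAI-style parameters could look like this sketch (the parameter values are illustrative):

```json
{
  "options": {
    "model": "gpt-4",
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "stream": false
  }
}
```

Everything other than `model` is passed through to the upstream LLM service in the request body.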
override
object
Override setting.
endpoint
string
LLM provider endpoint to replace the default endpoint with. If not configured, the plugin uses the default OpenAI endpoint https://api.openai.com/v1/chat/completions.
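A sketch of an `openai-compatible` instance pointing at a self-hosted service; the endpoint URL, instance name, and key placeholder are hypothetical:

```json
{
  "name": "local-llm",
  "provider": "openai-compatible",
  "weight": 1,
  "auth": {
    "header": { "Authorization": "Bearer <local-api-key>" }
  },
  "override": {
    "endpoint": "https://llm.internal.example.com/v1/chat/completions"
  }
}
```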
checks
object
Health check configurations.
Note that at the moment, OpenAI and DeepSeek do not provide an official health check endpoint. Other LLM services that you can configure under the `openai-compatible` provider may have available health check endpoints.
active
object
required
Active health check configurations.
type
string
default:
http
Valid values:
`http`, `https`, or `tcp`
Type of health check connection.
timeout
number
default:
1
Health check timeout in seconds.
concurrency
integer
default:
10
Number of upstream nodes to be checked at the same time.
host
string
HTTP host.
port
integer
Valid values:
between 1 and 65535 inclusive
HTTP port.
http_path
string
default:
/
Path for HTTP probing requests.
https_verify_certificate
boolean
default:
true
If true, verify the node's TLS certificate.
healthy
object
Healthy check configurations.
interval
integer
default:
1
Time interval of checking healthy nodes, in seconds.
http_statuses
array[integer]
default:
[200,302]
Valid values:
status code between 200 and 599 inclusive
An array of HTTP status codes that defines a healthy node.
successes
integer
default:
2
Valid values:
between 1 and 254 inclusive
Number of successful probes to define a healthy node.
unhealthy
object
Unhealthy check configurations.
interval
integer
default:
1
Time interval of checking unhealthy nodes, in seconds.
http_statuses
array[integer]
default:
[429,404,500,501,502,503,504,505]
Valid values:
status code between 200 and 599 inclusive
An array of HTTP status codes that defines an unhealthy node.
http_failures
integer
default:
5
Valid values:
between 1 and 254 inclusive
Number of HTTP failures to define an unhealthy node.
timeout
integer
default:
3
Valid values:
between 1 and 254 inclusive
Number of probe timeouts to define an unhealthy node.
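The health check fields above can be combined into a sketch like the following; the `/health` probe path is an assumption and should match whatever your OpenAI-compatible service actually exposes:

```json
{
  "checks": {
    "active": {
      "type": "https",
      "timeout": 2,
      "http_path": "/health",
      "healthy": {
        "interval": 1,
        "successes": 2
      },
      "unhealthy": {
        "interval": 1,
        "http_failures": 3,
        "timeout": 3
      }
    }
  }
}
```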
logging
object
Logging configurations.
summaries
boolean
default:
false
If true, log the request's LLM model, duration, and request and response tokens.
payloads
boolean
default:
false
If true, log request and response payloads.
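For instance, to log per-request summaries while keeping payloads out of the logs:

```json
{
  "logging": {
    "summaries": true,
    "payloads": false
  }
}
```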
timeout
integer
default:
30000
Valid values:
greater than or equal to 1
Request timeout in milliseconds when requesting the LLM service.
keepalive
boolean
default:
true
If true, keep the connection alive when requesting the LLM service.
keepalive_timeout
integer
default:
60000
Valid values:
greater than or equal to 1000
Keepalive timeout in milliseconds when requesting the LLM service.
keepalive_pool
integer
default:
30
Keepalive pool size for connections with the LLM service.
ssl_verify
boolean
default:
true
If true, verify the LLM service's certificate.
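The connection-level parameters above might be tuned together as in this sketch (the values are illustrative, not recommendations):

```json
{
  "timeout": 60000,
  "keepalive": true,
  "keepalive_timeout": 60000,
  "keepalive_pool": 30,
  "ssl_verify": true
}
```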