Parameters
See plugin common configurations for configuration options available to all plugins.
fallback_strategy
string
default:
instance_health_and_rate_limiting
valid values:
instance_health_and_rate_limiting
Fallback strategy. When set, the plugin will check whether the specified instance's tokens have been exhausted when a request is forwarded. If so, the request is forwarded to the next instance regardless of instance priority. When not set, the plugin will not forward the request to lower priority instances when the tokens of the higher priority instance are exhausted.
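To make the interplay between fallback_strategy and instance priorities concrete, here is a minimal sketch of a plugin configuration in YAML form. Instance names, models, and API key placeholders are illustrative; the instance fields used here are documented later in this section.

```yaml
# Sketch: with fallback_strategy set, requests spill over to the lower
# priority instance once the higher priority instance's tokens are exhausted.
fallback_strategy: instance_health_and_rate_limiting
instances:
  - name: primary-openai              # illustrative name
    provider: openai
    priority: 1                       # higher priority, served first
    weight: 1
    auth:
      header:
        Authorization: "Bearer <primary-api-key>"   # placeholder credential
    options:
      model: gpt-4
  - name: secondary-deepseek          # illustrative fallback instance
    provider: deepseek
    priority: 0
    weight: 1
    auth:
      header:
        Authorization: "Bearer <secondary-api-key>" # placeholder credential
    options:
      model: deepseek-chat            # example model name
```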
balancer
object
Load balancing configurations.
algorithm
string
default:
roundrobin
valid values:
roundrobin or chash
Load balancing algorithm. When set to roundrobin, the weighted round robin algorithm is used. When set to chash, the consistent hashing algorithm is used.
hash_on
string
valid values:
vars, header, cookie, consumer, or vars_combinations
Used when algorithm is chash. Supports hashing on built-in variables, headers, cookie, consumer, or a combination of built-in variables.
key
string
Used when algorithm is chash. When hash_on is set to header or cookie, key is required. When hash_on is set to consumer, key is not required, as the consumer name will be used as the key automatically.
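As a sketch, a balancer block that hashes requests on a built-in variable might look like the following; remote_addr is just one example of a hash key.

```yaml
# Sketch: consistent hashing so that requests from the same client IP
# are routed to the same LLM instance.
balancer:
  algorithm: chash
  hash_on: vars
  key: remote_addr   # built-in variable used as the hash key
```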
instances
array[object]
required
LLM instance configurations.
name
string
required
Name of the LLM service instance.
provider
string
required
valid values:
openai, deepseek, or openai-compatible
LLM service provider. When set to openai, the plugin will proxy the request to api.openai.com. When set to deepseek, the plugin will proxy the request to api.deepseek.com. When set to openai-compatible, the plugin will proxy the request to the custom endpoint configured in override.
priority
integer
default:
0
Priority of the LLM instance in load balancing. priority takes precedence over weight.
weight
integer
required
default:
0
valid values:
greater than or equal to 0
Weight of the LLM instance in load balancing.
auth
object
required
Authentication configurations.
header
object
Authentication headers. At least one of header and query should be configured.
query
object
Authentication query parameters. At least one of header and query should be configured.
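A sketch of instance authentication follows; the Authorization header and the api_key query parameter are examples of what a provider might expect, not fixed names.

```yaml
# Sketch: credential sent as a header; a query parameter works as well.
instances:
  - name: openai-instance            # illustrative name
    provider: openai
    weight: 1
    auth:
      header:
        Authorization: "Bearer <your-api-key>"   # placeholder credential
      # Alternatively (or additionally), send the credential as a query parameter:
      # query:
      #   api_key: "<your-api-key>"
    options:
      model: gpt-4
```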
options
object
Model configurations.
In addition to model, you can configure additional parameters, and they will be forwarded to the upstream LLM service in the request body. For instance, if you are working with OpenAI or DeepSeek, you can configure additional parameters such as max_tokens, temperature, top_p, and stream. See your LLM provider's API documentation for more available options.
model
string
Name of the LLM model, such as gpt-4 or gpt-3.5. See your LLM provider's API documentation for more available models.
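For instance, an options block for an OpenAI-style provider might look like the sketch below; the extra parameters are forwarded to the upstream as-is, and their names follow the provider's API rather than the plugin schema.

```yaml
# Sketch: model plus additional parameters forwarded verbatim in the request body.
options:
  model: gpt-4
  max_tokens: 512
  temperature: 1.0
  top_p: 1
  stream: false
```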
logging
object
Logging configurations.
summaries
boolean
default:
false
If true, log the request's LLM model, duration, and request and response tokens.
payloads
boolean
default:
false
If true, log the request and response payloads.
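A minimal logging sketch, enabling summaries while keeping payload logging off:

```yaml
logging:
  summaries: true    # log model, duration, and token usage
  payloads: false    # do not log full request/response bodies
```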
override
object
Override setting.
endpoint
string
LLM provider endpoint to replace the default endpoint with. If not configured, the plugin uses the default OpenAI endpoint https://api.openai.com/v1/chat/completions.
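As a sketch, an openai-compatible instance pointing at a self-hosted endpoint might be configured as below, assuming override is set on the instance; the endpoint URL, instance name, and model name are placeholders.

```yaml
# Sketch: route requests for this instance to a custom OpenAI-compatible endpoint.
- name: local-llm                    # illustrative instance
  provider: openai-compatible
  weight: 1
  auth:
    header:
      Authorization: "Bearer <your-api-key>"       # placeholder credential
  options:
    model: llama3                    # example model name
  override:
    endpoint: "http://my-llm.internal:8080/v1/chat/completions"
```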
checks
object
Health check configurations.
Note that at the moment, OpenAI and DeepSeek do not provide an official health check endpoint. Other LLM services that you can configure under the openai-compatible provider may have health check endpoints available.
active
object
required
Active health check configurations.
type
string
default:
http
valid values:
http, https, or tcp
Type of health check connection.
timeout
number
default:
1
Health check timeout in seconds.
concurrency
integer
default:
10
Number of upstream nodes to be checked at the same time.
host
string
HTTP host.
port
integer
valid values:
between 1 and 65535 inclusive
HTTP port.
http_path
string
default:
/
Path for HTTP probing requests.
https_verify_certificate
boolean
default:
true
If true, verify the node's TLS certificate.
healthy
object
Configurations for determining healthy nodes.
interval
integer
default:
1
Time interval of checking healthy nodes, in seconds.
http_statuses
array[integer]
default:
[200,302]
valid values:
status code between 200 and 599 inclusive
An array of HTTP status codes that defines a healthy node.
successes
integer
default:
2
valid values:
between 1 and 254 inclusive
Number of successful probes to define a healthy node.
unhealthy
object
Configurations for determining unhealthy nodes.
interval
integer
default:
1
Time interval of checking unhealthy nodes, in seconds.
http_statuses
array[integer]
default:
[429,404,500,501,502,503,504,505]
valid values:
status code between 200 and 599 inclusive
An array of HTTP status codes that defines an unhealthy node.
http_failures
integer
default:
5
valid values:
between 1 and 254 inclusive
Number of HTTP failures to define an unhealthy node.
timeout
integer
default:
3
valid values:
between 1 and 254 inclusive
Number of probe timeouts to define an unhealthy node.
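Putting the health check parameters together, a sketch of an active check against a self-hosted, openai-compatible service might look like this; host, port, and probe path are placeholders, and as noted above, OpenAI and DeepSeek offer no official health check endpoint.

```yaml
# Sketch: actively probe the service over HTTPS and classify nodes by status code.
checks:
  active:
    type: https
    host: my-llm.internal        # placeholder host
    port: 443
    http_path: /healthz          # placeholder probe path
    https_verify_certificate: true
    timeout: 1
    concurrency: 10
    healthy:
      interval: 2
      http_statuses: [200, 302]
      successes: 2
    unhealthy:
      interval: 1
      http_statuses: [429, 500, 502, 503, 504]
      http_failures: 5
      timeout: 3
```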
timeout
integer
default:
30000
valid values:
greater than or equal to 1
Request timeout in milliseconds when requesting the LLM service.
keepalive
boolean
default:
true
If true, keep the connection alive when requesting the LLM service.
keepalive_timeout
integer
default:
60000
valid values:
greater than or equal to 1000
Keepalive timeout in milliseconds when connecting to the LLM service.
keepalive_pool
integer
default:
30
Keepalive pool size for connections to the LLM service.
ssl_verify
boolean
default:
true
If true, verify the LLM service's certificate.
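Finally, a sketch of the connection-level settings; the values below restate the documented defaults, except for a longer request timeout.

```yaml
timeout: 60000             # per-request timeout in milliseconds
keepalive: true            # reuse connections to the LLM service
keepalive_timeout: 60000   # keepalive timeout in milliseconds
keepalive_pool: 30
ssl_verify: true           # verify the LLM service's TLS certificate
```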