Parameters

See plugin common configurations for configuration options available to all plugins.

  • fallback_strategy

    string or array


    valid values:

    string: instance_health_and_rate_limiting, http_429, or http_5xx
    array: Any combination of rate_limiting, http_429, and http_5xx


    Fallback strategy. The option instance_health_and_rate_limiting is kept for backward compatibility and is functionally the same as rate_limiting.

    With rate_limiting or instance_health_and_rate_limiting, when the current instance's quota is exhausted, the request is forwarded to the next instance regardless of priority. With http_429, if an instance returns status code 429, the request is retried with other instances. With http_5xx, if an instance returns a 5xx status code, the request is retried with other instances. If all instances fail, the plugin returns the last error response code.

    When not set, the plugin does not forward requests to lower-priority instances once the tokens of higher-priority instances are exhausted.
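
    For example, a minimal sketch of the array form, where both 429 and 5xx responses from an instance trigger retries on the remaining instances (the rest of the plugin configuration is omitted for brevity):

    ```json
    {
      "fallback_strategy": ["http_429", "http_5xx"]
    }
    ```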

  • balancer

    object


    Load balancing configurations.

    • algorithm

      string


      default: roundrobin


      valid values:

      roundrobin or chash


      Load balancing algorithm. When set to roundrobin, the weighted round-robin algorithm is used. When set to chash, the consistent-hashing algorithm is used.

    • hash_on

      string


      valid values:

      vars, headers, cookie, consumer, or vars_combinations


      Used when algorithm is chash. Supports hashing on built-in variables, headers, cookie, consumer, or a combination of built-in variables.

    • key

      string


      Used when algorithm is chash. When hash_on is set to headers or cookie, key is required. When hash_on is set to consumer, key is not required, as the consumer name is automatically used as the key.
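
    Putting the balancer fields together, a hedged sketch of a consistent-hashing configuration keyed on a request header (the header name X-Session-Id is a placeholder):

    ```json
    {
      "balancer": {
        "algorithm": "chash",
        "hash_on": "headers",
        "key": "X-Session-Id"
      }
    }
    ```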

  • instances

    array[object]


    required


    LLM instance configurations.

    • name

      string


      required


      Name of the LLM service instance.

    • provider

      string


      required


      valid values:

      openai, deepseek, azure-openai, aimlapi, gemini, vertex-ai, anthropic, openrouter, openai-compatible


      LLM service provider.

      When set to openai, the plugin will proxy requests to https://api.openai.com/v1/chat/completions.

      When set to deepseek, the plugin will proxy requests to https://api.deepseek.com/chat/completions.

      When set to gemini (available from APISIX 3.15.0 and Enterprise 3.9.2), the plugin will proxy requests to https://generativelanguage.googleapis.com/v1beta/openai/chat/completions. If you are proxying requests to an embedding model, you should configure the embedding model endpoint in the override.

      When set to vertex-ai (available from APISIX 3.15.0 and Enterprise 3.9.2), the plugin proxies requests to Google Cloud Vertex AI. For chat completions, the plugin will proxy requests to https://{region}-aiplatform.googleapis.com/v1beta1/projects/{project_id}/locations/{region}/endpoints/openapi/chat/completions. For embeddings, the plugin will proxy requests to https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{region}/publishers/google/models/{model}:predict. These require configuring provider_conf with project_id and region. Alternatively, you can configure override for a custom endpoint.

      When set to anthropic (available from APISIX 3.15.0 and Enterprise 3.9.2), the plugin will proxy requests to https://api.anthropic.com/v1/chat/completions.

      When set to openrouter (available from APISIX 3.15.0 and Enterprise 3.9.2), the plugin will proxy requests to https://openrouter.ai/api/v1/chat/completions.

      When set to aimlapi (available from APISIX 3.14.0 and Enterprise 3.8.17), the plugin uses the OpenAI-compatible driver and proxies the request to https://api.aimlapi.com/v1/chat/completions.

      When set to openai-compatible, the plugin proxies requests to the custom endpoint configured in override.

      When set to azure-openai, the plugin also proxies requests to the custom endpoint configured in override and additionally removes the model parameter from user requests.

    • priority

      integer


      default: 0


      Priority of the LLM instance in load balancing. priority takes precedence over weight.

    • weight

      integer


      required


      default: 0


      valid values:

      greater than or equal to 0


      Weight of the LLM instance in load balancing.
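
      As a hedged sketch of how priority and weight interact, the two instances below make openai-main the preferred target and deepseek-backup the fallback (instance names and API keys are placeholders, the model names are illustrative, and auth is described in the next entry):

      ```json
      {
        "instances": [
          {
            "name": "openai-main",
            "provider": "openai",
            "priority": 1,
            "weight": 1,
            "auth": { "header": { "Authorization": "Bearer <OPENAI_API_KEY>" } },
            "options": { "model": "gpt-4" }
          },
          {
            "name": "deepseek-backup",
            "provider": "deepseek",
            "priority": 0,
            "weight": 1,
            "auth": { "header": { "Authorization": "Bearer <DEEPSEEK_API_KEY>" } },
            "options": { "model": "deepseek-chat" }
          }
        ]
      }
      ```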

    • auth

      object


      required


      Authentication configurations.

      • header

        object


        Authentication headers. At least one of header and query should be configured. You can configure additional custom headers that will be forwarded to the upstream LLM service.

      • query

        object


        Authentication query parameters. At least one of header and query should be configured.

      • gcp

        object


        GCP service account authentication for Vertex AI. Available in API7 Enterprise starting from 3.9.2; not available in APISIX.

        • service_account_json

          string


          GCP service account JSON content used for authentication. This can be configured using this parameter or by setting the GCP_SERVICE_ACCOUNT environment variable.

        • max_ttl

          integer


          Maximum TTL for GCP access token caching, in seconds.

        • expire_early_secs

          integer


          default: 60


          Number of seconds to expire the access token before its actual expiration time. This prevents edge cases where tokens expire during active requests.
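
      As a hedged sketch, a vertex-ai instance might authenticate with an inline service account (the JSON content and TTL values are placeholders; header- and query-based authentication follow the same key-value shape):

      ```json
      {
        "auth": {
          "gcp": {
            "service_account_json": "<contents of service-account.json>",
            "max_ttl": 3600,
            "expire_early_secs": 60
          }
        }
      }
      ```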

    • options

      object


      Model configurations.

      In addition to model, you can configure additional parameters and they will be forwarded to the upstream LLM service in the request body. For instance, if you are working with OpenAI or DeepSeek, you can configure additional parameters such as max_tokens, temperature, top_p, and stream. See your LLM provider's API documentation for more available options.

      • model

        string


        Name of the LLM model, such as gpt-4 or gpt-3.5. See your LLM provider's API documentation for more available models.
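
      For example, a sketch that sets the model and forwards a few OpenAI-style parameters (the values are illustrative):

      ```json
      {
        "options": {
          "model": "gpt-4",
          "max_tokens": 512,
          "temperature": 0.7,
          "stream": false
        }
      }
      ```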

    • provider_conf

      object


      Provider-specific configuration. When provider is vertex-ai, one of provider_conf or override should be configured.

      Available in API7 Enterprise starting from 3.9.2; not available in APISIX.

      • project_id

        string


        required


        Google Cloud Project ID for Vertex AI.

      • region

        string


        required


        Google Cloud Region for Vertex AI.
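
      A hedged sketch for a vertex-ai instance (the project ID and region are placeholders):

      ```json
      {
        "provider_conf": {
          "project_id": "my-gcp-project",
          "region": "us-central1"
        }
      }
      ```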

    • override

      object


      Override settings.

      • endpoint

        string


        LLM provider endpoint to replace the default endpoint with. If not configured, the plugin uses the default OpenAI endpoint https://api.openai.com/v1/chat/completions.
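
      For example, a sketch pointing an openai-compatible instance at a self-hosted endpoint (the URL is a placeholder):

      ```json
      {
        "override": {
          "endpoint": "http://llm.internal.example.com/v1/chat/completions"
        }
      }
      ```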

    • checks

      object


      Health check configurations.

      Note that at the moment, OpenAI and DeepSeek do not provide an official health check endpoint. Other LLM services that you can configure under the openai-compatible provider may offer health check endpoints. A configuration sketch follows the field descriptions below.

      • active

        object


        required


        Active health check configurations.

        • type

          string


          default: http


          valid values:

          http, https, or tcp


          Type of health check connection.

        • timeout

          number


          default: 1


          Health check timeout in seconds.

        • concurrency

          integer


          default: 10


          Number of upstream nodes to be checked at the same time.

        • host

          string


          HTTP host.

        • port

          integer


          valid values:

          between 1 and 65535 inclusive


          HTTP port.

        • http_path

          string


          default: /


          Path for HTTP probing requests.

        • https_verify_certificate

          boolean


          default: true


          If true, verify the node's TLS certificate.

        • healthy

          object


          Healthy check configurations.

          • interval

            integer


            default: 1


            Time interval of checking healthy nodes, in seconds.

          • http_statuses

            array[integer]


            default: [200,302]


            valid values:

            status code between 200 and 599 inclusive


            An array of HTTP status codes that defines a healthy node.

          • successes

            integer


            default: 2


            valid values:

            between 1 and 254 inclusive


            Number of successful probes to define a healthy node.

        • unhealthy

          object


          Unhealthy check configurations.

          • interval

            integer


            default: 1


            Time interval of checking unhealthy nodes, in seconds.

          • http_statuses

            array[integer]


            default: [429,404,500,501,502,503,504,505]


            valid values:

            status code between 200 and 599 inclusive


            An array of HTTP status codes that defines an unhealthy node.

          • http_failures

            integer


            default: 5


            valid values:

            between 1 and 254 inclusive


            Number of HTTP failures to define an unhealthy node.

          • timeout

            integer


            default: 3


            valid values:

            between 1 and 254 inclusive


            Number of probe timeouts to define an unhealthy node.
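
      Putting the health check fields together, a hedged sketch of an active HTTPS probe (the probing path is a placeholder; consult your LLM service for an actual health endpoint):

      ```json
      {
        "checks": {
          "active": {
            "type": "https",
            "timeout": 1,
            "http_path": "/healthz",
            "healthy": { "interval": 1, "successes": 2 },
            "unhealthy": { "interval": 1, "http_failures": 3 }
          }
        }
      }
      ```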

  • logging

    object


    Logging configurations.

    • summaries

      boolean


      default: false


      If true, log the request's LLM model, duration, and request and response tokens.

    • payloads

      boolean


      default: false


      If true, log request and response payload.
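
    For example, a sketch enabling both kinds of logs:

    ```json
    {
      "logging": {
        "summaries": true,
        "payloads": true
      }
    }
    ```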

  • timeout

    integer


    default: 30000


    valid values:

    greater than or equal to 1


    Request timeout in milliseconds when requesting the LLM service.

  • keepalive

    boolean


    default: true


    If true, keep the connection alive when requesting the LLM service.

  • keepalive_timeout

    integer


    default: 60000


    valid values:

    greater than or equal to 1000


    Keepalive timeout in milliseconds when requesting the LLM service.

  • keepalive_pool

    integer


    default: 30


    Keepalive pool size for connections with the LLM service.

  • ssl_verify

    boolean


    default: true


    If true, verify the LLM service's certificate.
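
Taken together, a hedged end-to-end sketch of the plugin configuration (instance name, API key, and model are placeholders; the connection-level fields are shown with their documented defaults):

```json
{
  "fallback_strategy": ["http_429", "http_5xx"],
  "instances": [
    {
      "name": "openai-main",
      "provider": "openai",
      "weight": 1,
      "auth": { "header": { "Authorization": "Bearer <API_KEY>" } },
      "options": { "model": "gpt-4" }
    }
  ],
  "timeout": 30000,
  "keepalive": true,
  "keepalive_timeout": 60000,
  "keepalive_pool": 30,
  "ssl_verify": true
}
```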
