Parameters

See plugin common configurations for configuration options available to all plugins.

  • fallback_strategy

    string or array


    valid values:

    string: instance_health_and_rate_limiting, http_429, or http_5xx
    array: Any combination of rate_limiting, http_429, and http_5xx


    Fallback strategy. The option instance_health_and_rate_limiting is kept for backward compatibility and is functionally the same as rate_limiting.

    With rate_limiting or instance_health_and_rate_limiting, when the current instance's quota is exhausted, the request is forwarded to the next instance regardless of priority. With http_429, if an instance returns status code 429, the request is retried with other instances. With http_5xx, if an instance returns a 5xx status code, the request is retried with other instances. If all instances fail, the plugin returns the last error response code.

    When not set, the plugin does not forward requests to lower-priority instances once the tokens of higher-priority instances are exhausted.
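
    For example, a minimal sketch of the array form, where both 429 and 5xx responses from an instance trigger retries on the remaining instances (the rest of the plugin configuration is omitted for brevity):

    ```json
    {
      "fallback_strategy": ["http_429", "http_5xx"]
    }
    ```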

  • balancer

    object


    Load balancing configurations.

    • algorithm

      string


      default: roundrobin


      valid values:

      roundrobin or chash


      Load balancing algorithm. When set to roundrobin, the weighted round-robin algorithm is used. When set to chash, the consistent-hashing algorithm is used.

    • hash_on

      string


      valid values:

      vars, headers, cookie, consumer, or vars_combinations


      Used when algorithm is chash. Supports hashing on built-in variables, headers, cookie, consumer, or a combination of built-in variables.

    • key

      string


      Used when algorithm is chash. When hash_on is set to headers or cookie, key is required. When hash_on is set to consumer, key is not required, as the consumer name is automatically used as the key.
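
    Putting the balancer fields together, a hedged sketch of a consistent-hashing configuration keyed on a request header (the header name X-Session-Id is a placeholder):

    ```json
    {
      "balancer": {
        "algorithm": "chash",
        "hash_on": "headers",
        "key": "X-Session-Id"
      }
    }
    ```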

  • instances

    array[object]


    required


    LLM instance configurations.

    • name

      string


      required


      Name of the LLM service instance.

    • provider

      string


      required


      valid values:

      openai, deepseek, azure-openai, aimlapi, gemini, vertex-ai, anthropic, openrouter, openai-compatible


      LLM service provider.

      When set to openai, the plugin will proxy requests to https://api.openai.com/v1/chat/completions.

      When set to deepseek, the plugin will proxy requests to https://api.deepseek.com/chat/completions.

      When set to gemini (available from APISIX 3.15.0 and Enterprise 3.9.2), the plugin will proxy requests to https://generativelanguage.googleapis.com/v1beta/openai/chat/completions. If you are proxying requests to an embedding model, you should configure the embedding model endpoint in the override.

      When set to vertex-ai (available from APISIX 3.15.0 and Enterprise 3.9.2), the plugin proxies requests to Google Cloud Vertex AI. For chat completions, the plugin will proxy requests to https://{region}-aiplatform.googleapis.com/v1beta1/projects/{project_id}/locations/{region}/endpoints/openapi/chat/completions. For embeddings, the plugin will proxy requests to https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{region}/publishers/google/models/{model}:predict. These require configuring provider_conf with project_id and region. Alternatively, you can configure override for a custom endpoint.

      When set to anthropic (available from APISIX 3.15.0 and Enterprise 3.9.2), the plugin will proxy requests to https://api.anthropic.com/v1/chat/completions.

      When set to openrouter (available from APISIX 3.15.0 and Enterprise 3.9.2), the plugin will proxy requests to https://openrouter.ai/api/v1/chat/completions.

      When set to aimlapi (available from APISIX 3.14.0 and Enterprise 3.8.17), the plugin uses the OpenAI-compatible driver and proxies the request to https://api.aimlapi.com/v1/chat/completions.

      When set to openai-compatible, the plugin proxies requests to the custom endpoint configured in override.

      When set to azure-openai, the plugin also proxies requests to the custom endpoint configured in override and additionally removes the model parameter from user requests.

    • priority

      integer


      default: 0


      Priority of the LLM instance in load balancing. priority takes precedence over weight.

    • weight

      integer


      required


      default: 0


      valid values:

      greater than or equal to 0


      Weight of the LLM instance in load balancing.
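
      As a hedged sketch of how priority and weight interact, the two instances below make openai-main the preferred target and deepseek-backup the fallback (instance names and API keys are placeholders, the model names are illustrative, and auth is described in the next entry):

      ```json
      {
        "instances": [
          {
            "name": "openai-main",
            "provider": "openai",
            "priority": 1,
            "weight": 1,
            "auth": { "header": { "Authorization": "Bearer <OPENAI_API_KEY>" } },
            "options": { "model": "gpt-4" }
          },
          {
            "name": "deepseek-backup",
            "provider": "deepseek",
            "priority": 0,
            "weight": 1,
            "auth": { "header": { "Authorization": "Bearer <DEEPSEEK_API_KEY>" } },
            "options": { "model": "deepseek-chat" }
          }
        ]
      }
      ```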

    • auth

      object


      required


      Authentication configurations.

      • header

        object


        Authentication headers. At least one of header and query should be configured. You can configure additional custom headers that will be forwarded to the upstream LLM service.

      • query

        object


        Authentication query parameters. At least one of header and query should be configured.

      • gcp

        object


        GCP service account authentication for Vertex AI. Available in API7 Enterprise starting from 3.9.2; not available in APISIX.

        • service_account_json

          string


          GCP service account JSON content used for authentication. This can be configured using this parameter or by setting the GCP_SERVICE_ACCOUNT environment variable.

        • max_ttl

          integer


          Maximum TTL for GCP access token caching, in seconds.

        • expire_early_secs

          integer


          default: 60


          Number of seconds to expire the access token before its actual expiration time. This prevents edge cases where tokens expire during active requests.
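
      As a hedged sketch, a vertex-ai instance might authenticate with an inline service account (the JSON content and TTL values are placeholders; header- and query-based authentication follow the same key-value shape):

      ```json
      {
        "auth": {
          "gcp": {
            "service_account_json": "<contents of service-account.json>",
            "max_ttl": 3600,
            "expire_early_secs": 60
          }
        }
      }
      ```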

    • options

      object


      Model configurations.

      In addition to model, you can configure additional parameters and they will be forwarded to the upstream LLM service in the request body. For instance, if you are working with OpenAI or DeepSeek, you can configure additional parameters such as max_tokens, temperature, top_p, and stream. See your LLM provider's API documentation for more available options.

      • model

        string


        Name of the LLM model, such as gpt-4 or gpt-3.5. See your LLM provider's API documentation for more available models.
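
      For example, a sketch that sets the model and forwards a few OpenAI-style parameters (the values are illustrative):

      ```json
      {
        "options": {
          "model": "gpt-4",
          "max_tokens": 512,
          "temperature": 0.7,
          "stream": false
        }
      }
      ```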

    • provider_conf

      object


      Provider-specific configuration. When provider is vertex-ai, one of provider_conf or override should be configured.

      Available in API7 Enterprise starting from 3.9.2; not available in APISIX.

      • project_id

        string


        required


        Google Cloud Project ID for Vertex AI.

      • region

        string


        required


        Google Cloud Region for Vertex AI.
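
      A hedged sketch for a vertex-ai instance (the project ID and region are placeholders):

      ```json
      {
        "provider_conf": {
          "project_id": "my-gcp-project",
          "region": "us-central1"
        }
      }
      ```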

    • override

      object


      Override settings.

      • endpoint

        string


        LLM provider endpoint to replace the default endpoint with. If not configured, the plugin uses the default OpenAI endpoint https://api.openai.com/v1/chat/completions.
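
      For example, a sketch pointing an openai-compatible instance at a self-hosted endpoint (the URL is a placeholder):

      ```json
      {
        "override": {
          "endpoint": "http://llm.internal.example.com/v1/chat/completions"
        }
      }
      ```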

    • checks

      object


      Health check configurations.

      Note that at the moment, OpenAI and DeepSeek do not provide an official health check endpoint. Other LLM services that you can configure under the openai-compatible provider may offer health check endpoints. A configuration sketch follows the field descriptions below.

      • active

        object


        required


        Active health check configurations.

        • type

          string


          default: http


          valid values:

          http, https, or tcp


          Type of health check connection.

        • timeout

          number


          default: 1


          Health check timeout in seconds.

        • concurrency

          integer


          default: 10


          Number of upstream nodes to be checked at the same time.

        • host

          string


          HTTP host.

        • port

          integer


          valid values:

          between 1 and 65535 inclusive


          HTTP port.

        • http_path

          string


          default: /


          Path for HTTP probing requests.

        • https_verify_certificate

          boolean


          default: true


          If true, verify the node's TLS certificate.

        • healthy

          object


          Healthy check configurations.

          • interval

            integer


            default: 1


            Time interval of checking healthy nodes, in seconds.

          • http_statuses

            array[integer]


            default: [200,302]


            valid values:

            status code between 200 and 599 inclusive


            An array of HTTP status codes that defines a healthy node.

          • successes

            integer


            default: 2


            valid values:

            between 1 and 254 inclusive


            Number of successful probes to define a healthy node.

        • unhealthy

          object


          Unhealthy check configurations.

          • interval

            integer


            default: 1


            Time interval of checking unhealthy nodes, in seconds.

          • http_statuses

            array[integer]


            default: [429,404,500,501,502,503,504,505]


            valid values:

            status code between 200 and 599 inclusive


            An array of HTTP status codes that defines an unhealthy node.

          • http_failures

            integer


            default: 5


            valid values:

            between 1 and 254 inclusive


            Number of HTTP failures to define an unhealthy node.

          • timeout

            integer


            default: 3


            valid values:

            between 1 and 254 inclusive


            Number of probe timeouts to define an unhealthy node.
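
      Putting the health check fields together, a hedged sketch of an active HTTPS probe (the probing path is a placeholder; consult your LLM service for an actual health endpoint):

      ```json
      {
        "checks": {
          "active": {
            "type": "https",
            "timeout": 1,
            "http_path": "/healthz",
            "healthy": { "interval": 1, "successes": 2 },
            "unhealthy": { "interval": 1, "http_failures": 3 }
          }
        }
      }
      ```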

  • logging

    object


    Logging configurations.

    • summaries

      boolean


      default: false


      If true, log the request's LLM model, duration, and request and response tokens.

    • payloads

      boolean


      default: false


      If true, log request and response payload.
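
    For example, a sketch enabling both kinds of logs:

    ```json
    {
      "logging": {
        "summaries": true,
        "payloads": true
      }
    }
    ```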

  • timeout

    integer


    default: 30000


    valid values:

    greater than or equal to 1


    Request timeout in milliseconds when requesting the LLM service.

  • keepalive

    boolean


    default: true


    If true, keep the connection alive when requesting the LLM service.

  • keepalive_timeout

    integer


    default: 60000


    valid values:

    greater than or equal to 1000


    Keepalive timeout in milliseconds when requesting the LLM service.

  • keepalive_pool

    integer


    default: 30


    Keepalive pool size for connections with the LLM service.

  • ssl_verify

    boolean


    default: true


    If true, verify the LLM service's certificate.
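
Taken together, a hedged end-to-end sketch of the plugin configuration (instance name, API key, and model are placeholders; the connection-level fields are shown with their documented defaults):

```json
{
  "fallback_strategy": ["http_429", "http_5xx"],
  "instances": [
    {
      "name": "openai-main",
      "provider": "openai",
      "weight": 1,
      "auth": { "header": { "Authorization": "Bearer <API_KEY>" } },
      "options": { "model": "gpt-4" }
    }
  ],
  "timeout": 30000,
  "keepalive": true,
  "keepalive_timeout": 60000,
  "keepalive_pool": 30,
  "ssl_verify": true
}
```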
