Connect Any OpenAI-Compatible LLM
API7 AI Gateway supports any LLM API that follows the OpenAI chat completion format. Use the openai-compatible provider to connect to self-hosted models (vLLM, Ollama, LM Studio), hosted providers (Together AI, Groq, Fireworks), or internal LLM services.
For providers with dedicated driver support (OpenAI, DeepSeek, Anthropic, Azure OpenAI, Gemini, Vertex AI, OpenRouter), use their specific provider type instead. Dedicated drivers handle provider-specific authentication and endpoint construction automatically.
Prerequisites
- Install Docker.
- Install cURL to send requests to the services for validation.
- Have a running API7 Enterprise Gateway instance.
- Have an LLM endpoint that accepts OpenAI-compatible /v1/chat/completions requests.
- Obtain the Admin API key. Save it to an environment variable:
  export ADMIN_API_KEY=your-admin-api-key   # replace with your API key
- Obtain the ID of the service you want to configure. Save it to an environment variable:
  export SERVICE_ID=your-service-id   # replace with your service ID
Configure the AI Proxy for a Custom Provider
Create a route with the ai-proxy plugin. The override.endpoint field is required for the openai-compatible provider.
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "custom-llm-route",
"service_id": "$SERVICE_ID",
"paths": ["/custom-llm"],
"plugins": {
"ai-proxy": {
"provider": "openai-compatible",
"auth": {
"header": {
"Authorization": "Bearer your-api-key"
}
},
"options": {
"model": "your-model-name"
},
"override": {
"endpoint": "https://your-llm-endpoint.example.com/v1/chat/completions"
}
}
}
}'
❶ Set the provider to openai-compatible.
❷ Attach any required authentication header for your provider.
❸ Set the model name as expected by your provider.
❹ Required. Specify the full endpoint URL of your LLM service.
services:
- name: Custom LLM Service
routes:
- uris:
- /custom-llm
name: custom-llm-route
plugins:
ai-proxy:
provider: openai-compatible
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: your-model-name
override:
endpoint: https://your-llm-endpoint.example.com/v1/chat/completions
❶ Set the provider to openai-compatible.
❷ Attach any required authentication header for your provider.
❸ Set the model name as expected by your provider.
❹ Required. Specify the full endpoint URL of your LLM service.
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
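Conceptually, the plugin takes the client's chat completion body, injects the configured model, attaches the auth header, and forwards the request to the override endpoint. The following Python sketch models that transformation; the function and return shape are illustrative, not the plugin's actual internals:

```python
# Illustrative model of the request transformation the ai-proxy plugin
# performs for an openai-compatible provider. The config keys mirror the
# plugin schema above; build_upstream_request itself is hypothetical.

def build_upstream_request(client_body: dict, config: dict) -> dict:
    """Return the HTTP request the gateway would send upstream."""
    body = dict(client_body)
    # The configured model overrides whatever model the client sent.
    body["model"] = config["options"]["model"]
    return {
        "url": config["override"]["endpoint"],  # full URL, required
        "headers": {
            "Content-Type": "application/json",
            **config["auth"]["header"],         # e.g. Authorization
        },
        "body": body,
    }

config = {
    "provider": "openai-compatible",
    "auth": {"header": {"Authorization": "Bearer your-api-key"}},
    "options": {"model": "your-model-name"},
    "override": {
        "endpoint": "https://your-llm-endpoint.example.com/v1/chat/completions"
    },
}
req = build_upstream_request(
    {"messages": [{"role": "user", "content": "Hello"}]}, config
)
print(req["body"]["model"])  # your-model-name
```

Note that the client never needs to know the upstream endpoint or credentials; it only calls the gateway route.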
Example: Connect to a Self-Hosted vLLM Instance
vLLM provides an OpenAI-compatible API server for self-hosted models:
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "vllm-route",
"service_id": "$SERVICE_ID",
"paths": ["/vllm"],
"plugins": {
"ai-proxy": {
"provider": "openai-compatible",
"auth": {
"header": {
"Authorization": "Bearer placeholder"
}
},
"options": {
"model": "meta-llama/Llama-3.1-8B-Instruct"
},
"override": {
"endpoint": "http://vllm-server:8000/v1/chat/completions"
}
}
}
}'
services:
- name: vLLM Service
routes:
- uris:
- /vllm
name: vllm-route
plugins:
ai-proxy:
provider: openai-compatible
auth:
header:
Authorization: "Bearer placeholder"
options:
model: meta-llama/Llama-3.1-8B-Instruct
override:
endpoint: http://vllm-server:8000/v1/chat/completions
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
The auth field is always required in the configuration. When the vLLM server itself does not require authentication, a placeholder value is sufficient.
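If you do not yet have a vLLM server running, vLLM exposes the OpenAI-compatible API out of the box. For example (the model name and port below are assumptions chosen to match the route above; adjust for your deployment):

```shell
# Start vLLM's OpenAI-compatible server on a GPU host.
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# Or via the official Docker image:
docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct
```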
Example: Connect to Together AI
Together AI provides an OpenAI-compatible API for running open-source models:
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "together-ai-route",
"service_id": "$SERVICE_ID",
"paths": ["/together-ai"],
"plugins": {
"ai-proxy": {
"provider": "openai-compatible",
"auth": {
"header": {
"Authorization": "Bearer '"$TOGETHER_API_KEY"'"
}
},
"options": {
"model": "meta-llama/Llama-3.1-70B-Instruct-Turbo"
},
"override": {
"endpoint": "https://api.together.xyz/v1/chat/completions"
}
}
}
}'
services:
- name: Together AI Service
routes:
- uris:
- /together-ai
name: together-ai-route
plugins:
ai-proxy:
provider: openai-compatible
auth:
header:
Authorization: "Bearer your-together-api-key"
options:
model: meta-llama/Llama-3.1-70B-Instruct-Turbo
override:
endpoint: https://api.together.xyz/v1/chat/completions
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
Multi-Model Routing
Use ai-proxy-multi to route traffic between a self-hosted model and a cloud provider for fallback:
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "hybrid-route",
"service_id": "$SERVICE_ID",
"paths": ["/hybrid"],
"plugins": {
"ai-proxy-multi": {
"fallback_strategy": ["http_429", "http_5xx"],
"instances": [
{
"name": "self-hosted",
"provider": "openai-compatible",
"auth": { "header": { "Authorization": "Bearer placeholder" } },
"options": { "model": "meta-llama/Llama-3.1-8B-Instruct" },
"override": { "endpoint": "http://vllm-server:8000/v1/chat/completions" },
"weight": 1,
"priority": 1
},
{
"name": "cloud-fallback",
"provider": "openai",
"auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
"options": { "model": "gpt-4o-mini" },
"weight": 1,
"priority": 2
}
]
}
}
}'
❶ fallback_strategy enables automatic failover on HTTP 429 (rate limited) or 5xx (server error).
❷ Fallback: OpenAI as backup when the self-hosted instance is unavailable.
services:
- name: Hybrid LLM Service
routes:
- uris:
- /hybrid
name: hybrid-route
plugins:
ai-proxy-multi:
fallback_strategy:
- http_429
- http_5xx
instances:
- name: self-hosted
provider: openai-compatible
auth:
header:
Authorization: "Bearer placeholder"
options:
model: meta-llama/Llama-3.1-8B-Instruct
override:
endpoint: http://vllm-server:8000/v1/chat/completions
weight: 1
priority: 1
- name: cloud-fallback
provider: openai
auth:
header:
Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
options:
model: gpt-4o-mini
weight: 1
priority: 2
❶ fallback_strategy enables automatic failover on HTTP 429 (rate limited) or 5xx (server error).
❷ Fallback: OpenAI as backup when the self-hosted instance is unavailable.
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
For more routing strategies, see Multi-LLM Routing and Fallback.
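The failover behavior above can be sketched in a few lines of Python: try the preferred instance first, and move to the next one only when the response status matches an entry in fallback_strategy. This is a model of the behavior, not gateway code:

```python
# Illustrative model of ai-proxy-multi failover, assuming instances are
# tried in preference order and a fallback fires when the response status
# matches an entry in fallback_strategy.

FALLBACK_STATUSES = {"http_429": {429}, "http_5xx": set(range(500, 600))}

def should_fall_back(status: int, strategy: list) -> bool:
    """True if the response status triggers failover to the next instance."""
    return any(status in FALLBACK_STATUSES[s] for s in strategy)

def route(instances, strategy, send):
    """Try instances in preference order; fall back on matching statuses."""
    for inst in instances:
        status, body = send(inst)
        if not should_fall_back(status, strategy):
            return inst["name"], status, body
    return inst["name"], status, body  # all instances failed: return last

# Simulated upstreams: the self-hosted model is rate limited.
responses = {"self-hosted": (429, "rate limited"), "cloud-fallback": (200, "ok")}
name, status, _ = route(
    [{"name": "self-hosted"}, {"name": "cloud-fallback"}],
    ["http_429", "http_5xx"],
    lambda inst: responses[inst["name"]],
)
print(name, status)  # cloud-fallback 200
```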
Validate the Configuration
Send a chat completion request:
curl "http://127.0.0.1:9080/custom-llm" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "user", "content": "Hello, how are you?" }
]
}'
You should receive a response in the standard OpenAI chat completion format, regardless of the backend provider.
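Because every backend is normalized to the same shape, the reply can always be read the same way. For example, in Python (the sample response below is illustrative):

```python
# Extract the assistant's reply from a standard OpenAI-format chat
# completion response; the shape is identical for every backend.

def extract_reply(completion: dict) -> str:
    return completion["choices"][0]["message"]["content"]

# A minimal example response in the OpenAI chat completion format:
sample = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! I'm doing well."},
            "finish_reason": "stop",
        }
    ],
}
print(extract_reply(sample))  # Hello! I'm doing well.
```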
To enable streaming responses, set "stream": true in the request body. Use the proxy-buffering plugin to disable NGINX proxy_buffering so that server-sent events (SSE) are not buffered.
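With streaming enabled, the client receives OpenAI-style SSE chunks, each a "data: &lt;json&gt;" line carrying a content delta, terminated by "data: [DONE]". A minimal sketch of assembling the streamed text (the sample lines are illustrative):

```python
import json

# Assemble the streamed text from OpenAI-style SSE lines. Each event is
# "data: <json>" carrying a content delta; the stream ends with
# "data: [DONE]".

def assemble_stream(lines):
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))  # first delta may carry only the role
    return "".join(parts)

# Illustrative SSE lines as the client would receive them:
sse = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
print(assemble_stream(sse))  # Hello!
```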
Next Steps
You have learned how to connect any OpenAI-compatible LLM to API7 Gateway.
- Multi-LLM Routing and Fallback — Combine custom providers with native ones for failover.
- AI Observability — Monitor token usage across all providers.