Route Traffic to OpenAI
OpenAI provides access to models such as GPT-4o and o1 for chat completions, embeddings, and more. This guide shows how to route traffic to OpenAI through API7 Gateway using the ai-proxy plugin.
Prerequisites
- Install Docker.
- Install cURL to send requests to the services for validation.
- Have a running API7 Enterprise Gateway instance.
- Obtain the Admin API key. Save it to an environment variable:

  ```shell
  export ADMIN_API_KEY=your-admin-api-key   # replace with your API key
  ```

- Obtain the ID of the service you want to configure. Save it to an environment variable:

  ```shell
  export SERVICE_ID=your-service-id   # replace with your service ID
  ```
Obtain an OpenAI API Key
Create an OpenAI account and generate an API key. Save the key to an environment variable:
```shell
export OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx   # replace with your API key
```
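Optionally, verify the key before configuring the gateway by calling the OpenAI models endpoint directly (this assumes outbound access to api.openai.com from your machine):

```shell
# List available models; an HTTP 200 response confirms the key is valid
curl "https://api.openai.com/v1/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```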
Configure OpenAI as a Provider
Create a route with the ai-proxy plugin:
Admin API

```shell
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "id": "openai-route",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/openai"],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4o"
        }
      }
    }
  }'
```
❶ Set the provider to openai.
❷ Attach the OpenAI API key in the Authorization header.
❸ Set the default model to gpt-4o.
ADC

```yaml
services:
  - name: OpenAI Service
    routes:
      - uris:
          - /openai
        name: openai-route
        plugins:
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"   # replace with your API key
            options:
              model: gpt-4o
```
❶ Set the provider to openai.
❷ Attach the OpenAI API key in the Authorization header.
❸ Set the default model to gpt-4o.
Synchronize the configuration to API7 Gateway:
```shell
adc sync -f adc.yaml
```
Multi-Model Routing with OpenAI
Use the ai-proxy-multi plugin to distribute traffic across multiple OpenAI models. This example routes 80% of traffic to gpt-4o-mini (cost-effective) and 20% to gpt-4o (premium), with automatic failover on HTTP 429 and 5xx errors:
Admin API

```shell
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "id": "openai-multi-route",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/openai"],
    "plugins": {
      "ai-proxy-multi": {
        "fallback_strategy": ["http_429", "http_5xx"],
        "instances": [
          {
            "name": "gpt-4o-mini",
            "provider": "openai",
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "options": {
              "model": "gpt-4o-mini"
            },
            "weight": 8
          },
          {
            "name": "gpt-4o",
            "provider": "openai",
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "options": {
              "model": "gpt-4o"
            },
            "weight": 2
          }
        ]
      }
    }
  }'
```
❶ fallback_strategy enables automatic failover. When an instance returns HTTP 429 (rate limited) or 5xx (server error), the gateway retries on another instance.
❷ Assign a weight of 8 so that roughly 80% of traffic goes to gpt-4o-mini.
❸ Assign a weight of 2 to the premium gpt-4o instance for the remaining 20%.
ADC

```yaml
services:
  - name: OpenAI Multi-Model Service
    routes:
      - uris:
          - /openai
        name: openai-multi-route
        plugins:
          ai-proxy-multi:
            fallback_strategy:
              - http_429
              - http_5xx
            instances:
              - name: gpt-4o-mini
                provider: openai
                auth:
                  header:
                    Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"   # replace with your API key
                options:
                  model: gpt-4o-mini
                weight: 8
              - name: gpt-4o
                provider: openai
                auth:
                  header:
                    Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"   # replace with your API key
                options:
                  model: gpt-4o
                weight: 2
```
❶ fallback_strategy enables automatic failover. When an instance returns HTTP 429 (rate limited) or 5xx (server error), the gateway retries on another instance.
❷ Assign a weight of 8 so that roughly 80% of traffic goes to gpt-4o-mini.
❸ Assign a weight of 2 to the premium gpt-4o instance for the remaining 20%.
Synchronize the configuration to API7 Gateway:
```shell
adc sync -f adc.yaml
```
For more routing strategies including failover and priority-based routing, see Multi-LLM Routing and Fallback.
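Note that the weights are relative, not percentages: each request is routed to an instance with probability proportional to its weight, so weights of 8 and 2 produce the same 80/20 split as 80 and 20 would. A toy shell illustration of that proportion (not the gateway's actual load-balancing algorithm):

```shell
# Simulate 10 requests split by weights 8 (gpt-4o-mini) and 2 (gpt-4o)
for i in $(seq 1 10); do
  if [ $(( i % 10 )) -lt 8 ]; then
    echo "request $i -> gpt-4o-mini"
  else
    echo "request $i -> gpt-4o"
  fi
done
```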
Validate the Configuration
Send a chat completion request:
```shell
curl "http://127.0.0.1:9080/openai" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a computer scientist." },
      { "role": "user", "content": "Explain in one sentence what a Turing machine is." }
    ]
  }'
```
You should receive a response similar to the following:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A Turing machine is an abstract mathematical model of computation that defines an idealized machine capable of simulating any algorithm."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 24,
    "total_tokens": 46
  }
}
```
To enable streaming responses, set `"stream": true` in the request body. Use the proxy-buffering plugin to disable NGINX's proxy_buffering, which would otherwise buffer the server-sent events (SSE) instead of delivering them as they arrive.
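For example, the validation request above with streaming enabled looks like this (a sketch; the response arrives as a stream of SSE chunks rather than a single JSON body):

```shell
curl "http://127.0.0.1:9080/openai" -X POST \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "stream": true,
    "messages": [
      { "role": "user", "content": "Explain in one sentence what a Turing machine is." }
    ]
  }'
```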
Next Steps
You have learned how to route traffic to OpenAI through API7 Gateway. See the OpenAI API reference to learn more about available models and endpoints.
- Multi-LLM Routing and Fallback — Load balance across models and providers.
- Token Rate Limiting — Control costs with token budgets.
- AI Observability — Monitor token usage and latency.