Route Traffic to OpenAI
OpenAI provides access to models such as GPT-4o and o1 for chat completions, embeddings, and more. This guide shows how to route traffic to OpenAI through API7 Gateway using the ai-proxy plugin.
Prerequisites
- Install Docker.
- Install cURL to send requests to the services for validation.
- Have a running API7 Enterprise Gateway instance.
- Obtain the Admin API key. Save it to an environment variable:

  ```shell
  export ADMIN_API_KEY=your-admin-api-key   # replace with your API key
  ```

- Obtain the ID of the service you want to configure. Save it to an environment variable:

  ```shell
  export SERVICE_ID=your-service-id   # replace with your service ID
  ```
Obtain an OpenAI API Key
Create an OpenAI account and generate an API key. Save the key to an environment variable:
```shell
export OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx   # replace with your API key
```
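Optionally, verify the key before configuring the gateway by calling the OpenAI models endpoint directly (this assumes outbound access to api.openai.com from your machine):

```shell
# List available models; an HTTP 200 response confirms the key is valid
curl "https://api.openai.com/v1/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```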
Configure OpenAI as a Provider
Create a route with the ai-proxy plugin:
Admin API

```shell
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "id": "openai-route",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/openai"],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4o"
        }
      }
    }
  }'
```
❶ Set the provider to openai.
❷ Attach the OpenAI API key in the Authorization header.
❸ Set the default model to gpt-4o.
ADC

```yaml
services:
  - name: OpenAI Service
    routes:
      - uris:
          - /openai
        name: openai-route
        plugins:
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"   # replace with your API key
            options:
              model: gpt-4o
```
❶ Set the provider to openai.
❷ Attach the OpenAI API key in the Authorization header.
❸ Set the default model to gpt-4o.
Synchronize the configuration to API7 Gateway:
```shell
adc sync -f adc.yaml
```
Multi-Model Routing with OpenAI
Use the ai-proxy-multi plugin to distribute traffic across multiple OpenAI models. This example routes 80% of traffic to gpt-4o-mini (cost-effective) and 20% to gpt-4o (premium), with automatic failover on HTTP 429 and 5xx errors:
Admin API

```shell
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "id": "openai-multi-route",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/openai"],
    "plugins": {
      "ai-proxy-multi": {
        "fallback_strategy": ["http_429", "http_5xx"],
        "instances": [
          {
            "name": "gpt-4o-mini",
            "provider": "openai",
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "options": {
              "model": "gpt-4o-mini"
            },
            "weight": 8
          },
          {
            "name": "gpt-4o",
            "provider": "openai",
            "auth": {
              "header": {
                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
              }
            },
            "options": {
              "model": "gpt-4o"
            },
            "weight": 2
          }
        ]
      }
    }
  }'
```
❶ fallback_strategy enables automatic failover. When an instance returns HTTP 429 (rate limited) or 5xx (server error), the gateway retries on another instance.
❷ Assign a weight of 8 so that roughly 80% of traffic goes to gpt-4o-mini.
❸ Assign a weight of 2 to the premium gpt-4o instance for the remaining 20%.
ADC

```yaml
services:
  - name: OpenAI Multi-Model Service
    routes:
      - uris:
          - /openai
        name: openai-multi-route
        plugins:
          ai-proxy-multi:
            fallback_strategy:
              - http_429
              - http_5xx
            instances:
              - name: gpt-4o-mini
                provider: openai
                auth:
                  header:
                    Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"   # replace with your API key
                options:
                  model: gpt-4o-mini
                weight: 8
              - name: gpt-4o
                provider: openai
                auth:
                  header:
                    Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"   # replace with your API key
                options:
                  model: gpt-4o
                weight: 2
```
❶ fallback_strategy enables automatic failover. When an instance returns HTTP 429 (rate limited) or 5xx (server error), the gateway retries on another instance.
❷ Assign a weight of 8 so that roughly 80% of traffic goes to gpt-4o-mini.
❸ Assign a weight of 2 to the premium gpt-4o instance for the remaining 20%.
Synchronize the configuration to API7 Gateway:
```shell
adc sync -f adc.yaml
```
For more routing strategies including failover and priority-based routing, see Multi-LLM Routing and Fallback.
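Note that the weights are relative, not percentages: each request is routed to an instance with probability proportional to its weight, so weights of 8 and 2 produce the same 80/20 split as 80 and 20 would. A toy shell illustration of that proportion (not the gateway's actual load-balancing algorithm):

```shell
# Simulate 10 requests split by weights 8 (gpt-4o-mini) and 2 (gpt-4o)
for i in $(seq 1 10); do
  if [ $(( i % 10 )) -lt 8 ]; then
    echo "request $i -> gpt-4o-mini"
  else
    echo "request $i -> gpt-4o"
  fi
done
```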
Validate the Configuration
Send a chat completion request:
```shell
curl "http://127.0.0.1:9080/openai" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a computer scientist." },
      { "role": "user", "content": "Explain in one sentence what a Turing machine is." }
    ]
  }'
```
You should receive a response similar to the following:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A Turing machine is an abstract mathematical model of computation that defines an idealized machine capable of simulating any algorithm."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 24,
    "total_tokens": 46
  }
}
```
To enable streaming responses, set `"stream": true` in the request body. Use the proxy-buffering plugin to disable NGINX's proxy_buffering, which would otherwise buffer the server-sent events (SSE) instead of delivering them as they arrive.
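For example, the validation request above with streaming enabled looks like this (a sketch; the response arrives as a stream of SSE chunks rather than a single JSON body):

```shell
curl "http://127.0.0.1:9080/openai" -X POST \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "stream": true,
    "messages": [
      { "role": "user", "content": "Explain in one sentence what a Turing machine is." }
    ]
  }'
```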
Next Steps
You have learned how to route traffic to OpenAI through API7 Gateway. See the OpenAI API reference to learn more about available models and endpoints.
- Multi-LLM Routing and Fallback — Load balance across models and providers.
- Token Rate Limiting — Control costs with token budgets.
- AI Observability — Monitor token usage and latency.