Connect Any OpenAI-Compatible LLM
API7 AI Gateway supports any LLM API that follows the OpenAI chat completion format. Use the openai-compatible provider to connect to self-hosted models (vLLM, Ollama, LM Studio), hosted providers (Together AI, Groq, Fireworks), or internal LLM services.
For providers with dedicated driver support (OpenAI, DeepSeek, Anthropic, Azure OpenAI, Gemini, Vertex AI, OpenRouter), use their specific provider type instead. Dedicated drivers handle provider-specific authentication and endpoint construction automatically.
Prerequisites
- Install Docker.
- Install cURL to send requests to the services for validation.
- Have a running API7 Enterprise Gateway instance.
- Have an LLM endpoint that accepts OpenAI-compatible /v1/chat/completions requests.
- Obtain the Admin API key. Save it to an environment variable:
  export ADMIN_API_KEY=your-admin-api-key   # replace with your API key
- Obtain the ID of the service you want to configure. Save it to an environment variable:
  export SERVICE_ID=your-service-id   # replace with your service ID
Configure the AI Proxy for a Custom Provider
Create a route with the ai-proxy plugin. The override.endpoint field is required for the openai-compatible provider.
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "custom-llm-route",
"service_id": "$SERVICE_ID",
"paths": ["/custom-llm"],
"plugins": {
"ai-proxy": {
"provider": "openai-compatible",
"auth": {
"header": {
"Authorization": "Bearer your-api-key"
}
},
"options": {
"model": "your-model-name"
},
"override": {
"endpoint": "https://your-llm-endpoint.example.com/v1/chat/completions"
}
}
}
}'
❶ Set the provider to openai-compatible.
❷ Attach any required authentication header for your provider.
❸ Set the model name as expected by your provider.
❹ Required. Specify the full endpoint URL of your LLM service.
services:
- name: Custom LLM Service
routes:
- uris:
- /custom-llm
name: custom-llm-route
plugins:
ai-proxy:
provider: openai-compatible
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: your-model-name
override:
endpoint: https://your-llm-endpoint.example.com/v1/chat/completions
❶ Set the provider to openai-compatible.
❷ Attach any required authentication header for your provider.
❸ Set the model name as expected by your provider.
❹ Required. Specify the full endpoint URL of your LLM service.
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
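Conceptually, the plugin takes the client's chat completion body, injects the configured model, attaches the auth header, and forwards the request to the override endpoint. The following Python sketch models that transformation; the function and return shape are illustrative, not the plugin's actual internals:

```python
# Illustrative model of the request transformation the ai-proxy plugin
# performs for an openai-compatible provider. The config keys mirror the
# plugin schema above; build_upstream_request itself is hypothetical.

def build_upstream_request(client_body: dict, config: dict) -> dict:
    """Return the HTTP request the gateway would send upstream."""
    body = dict(client_body)
    # The configured model overrides whatever model the client sent.
    body["model"] = config["options"]["model"]
    return {
        "url": config["override"]["endpoint"],  # full URL, required
        "headers": {
            "Content-Type": "application/json",
            **config["auth"]["header"],         # e.g. Authorization
        },
        "body": body,
    }

config = {
    "provider": "openai-compatible",
    "auth": {"header": {"Authorization": "Bearer your-api-key"}},
    "options": {"model": "your-model-name"},
    "override": {
        "endpoint": "https://your-llm-endpoint.example.com/v1/chat/completions"
    },
}
req = build_upstream_request(
    {"messages": [{"role": "user", "content": "Hello"}]}, config
)
print(req["body"]["model"])  # your-model-name
```

Note that the client never needs to know the upstream endpoint or credentials; it only calls the gateway route.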
Example: Connect to a Self-Hosted vLLM Instance
vLLM provides an OpenAI-compatible API server for self-hosted models:
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "vllm-route",
"service_id": "$SERVICE_ID",
"paths": ["/vllm"],
"plugins": {
"ai-proxy": {
"provider": "openai-compatible",
"auth": {
"header": {
"Authorization": "Bearer placeholder"
}
},
"options": {
"model": "meta-llama/Llama-3.1-8B-Instruct"
},
"override": {
"endpoint": "http://vllm-server:8000/v1/chat/completions"
}
}
}
}'
services:
- name: vLLM Service
routes:
- uris:
- /vllm
name: vllm-route
plugins:
ai-proxy:
provider: openai-compatible
auth:
header:
Authorization: "Bearer placeholder"
options:
model: meta-llama/Llama-3.1-8B-Instruct
override:
endpoint: http://vllm-server:8000/v1/chat/completions
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
The auth field is always required in the configuration. When the vLLM server itself does not require authentication, a placeholder value is sufficient.
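If you do not yet have a vLLM server running, vLLM exposes the OpenAI-compatible API out of the box. For example (the model name and port below are assumptions chosen to match the route above; adjust for your deployment):

```shell
# Start vLLM's OpenAI-compatible server on a GPU host.
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# Or via the official Docker image:
docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct
```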
Example: Connect to Together AI
Together AI provides an OpenAI-compatible API for running open-source models:
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "together-ai-route",
"service_id": "$SERVICE_ID",
"paths": ["/together-ai"],
"plugins": {
"ai-proxy": {
"provider": "openai-compatible",
"auth": {
"header": {
"Authorization": "Bearer '"$TOGETHER_API_KEY"'"
}
},
"options": {
"model": "meta-llama/Llama-3.1-70B-Instruct-Turbo"
},
"override": {
"endpoint": "https://api.together.xyz/v1/chat/completions"
}
}
}
}'
services:
- name: Together AI Service
routes:
- uris:
- /together-ai
name: together-ai-route
plugins:
ai-proxy:
provider: openai-compatible
auth:
header:
Authorization: "Bearer your-together-api-key"
options:
model: meta-llama/Llama-3.1-70B-Instruct-Turbo
override:
endpoint: https://api.together.xyz/v1/chat/completions
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
Multi-Model Routing
Use ai-proxy-multi to route traffic between a self-hosted model and a cloud provider for fallback:
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "hybrid-route",
"service_id": "$SERVICE_ID",
"paths": ["/hybrid"],
"plugins": {
"ai-proxy-multi": {
"fallback_strategy": ["http_429", "http_5xx"],
"instances": [
{
"name": "self-hosted",
"provider": "openai-compatible",
"auth": { "header": { "Authorization": "Bearer placeholder" } },
"options": { "model": "meta-llama/Llama-3.1-8B-Instruct" },
"override": { "endpoint": "http://vllm-server:8000/v1/chat/completions" },
"weight": 1,
"priority": 1
},
{
"name": "cloud-fallback",
"provider": "openai",
"auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
"options": { "model": "gpt-4o-mini" },
"weight": 1,
"priority": 2
}
]
}
}
}'
❶ fallback_strategy enables automatic failover on HTTP 429 (rate limited) or 5xx (server error).
❷ Fallback: OpenAI as backup when the self-hosted instance is unavailable.
services:
- name: Hybrid LLM Service
routes:
- uris:
- /hybrid
name: hybrid-route
plugins:
ai-proxy-multi:
fallback_strategy:
- http_429
- http_5xx
instances:
- name: self-hosted
provider: openai-compatible
auth:
header:
Authorization: "Bearer placeholder"
options:
model: meta-llama/Llama-3.1-8B-Instruct
override:
endpoint: http://vllm-server:8000/v1/chat/completions
weight: 1
priority: 1
- name: cloud-fallback
provider: openai
auth:
header:
Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
options:
model: gpt-4o-mini
weight: 1
priority: 2
❶ fallback_strategy enables automatic failover on HTTP 429 (rate limited) or 5xx (server error).
❷ Fallback: OpenAI as backup when the self-hosted instance is unavailable.
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
For more routing strategies, see Multi-LLM Routing and Fallback.
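The failover behavior above can be sketched in a few lines of Python: try the preferred instance first, and move to the next one only when the response status matches an entry in fallback_strategy. This is a model of the behavior, not gateway code:

```python
# Illustrative model of ai-proxy-multi failover, assuming instances are
# tried in preference order and a fallback fires when the response status
# matches an entry in fallback_strategy.

FALLBACK_STATUSES = {"http_429": {429}, "http_5xx": set(range(500, 600))}

def should_fall_back(status: int, strategy: list) -> bool:
    """True if the response status triggers failover to the next instance."""
    return any(status in FALLBACK_STATUSES[s] for s in strategy)

def route(instances, strategy, send):
    """Try instances in preference order; fall back on matching statuses."""
    for inst in instances:
        status, body = send(inst)
        if not should_fall_back(status, strategy):
            return inst["name"], status, body
    return inst["name"], status, body  # all instances failed: return last

# Simulated upstreams: the self-hosted model is rate limited.
responses = {"self-hosted": (429, "rate limited"), "cloud-fallback": (200, "ok")}
name, status, _ = route(
    [{"name": "self-hosted"}, {"name": "cloud-fallback"}],
    ["http_429", "http_5xx"],
    lambda inst: responses[inst["name"]],
)
print(name, status)  # cloud-fallback 200
```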
Validate the Configuration
Send a chat completion request:
curl "http://127.0.0.1:9080/custom-llm" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "user", "content": "Hello, how are you?" }
]
}'
You should receive a response in the standard OpenAI chat completion format, regardless of the backend provider.
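Because every backend is normalized to the same shape, the reply can always be read the same way. For example, in Python (the sample response below is illustrative):

```python
# Extract the assistant's reply from a standard OpenAI-format chat
# completion response; the shape is identical for every backend.

def extract_reply(completion: dict) -> str:
    return completion["choices"][0]["message"]["content"]

# A minimal example response in the OpenAI chat completion format:
sample = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! I'm doing well."},
            "finish_reason": "stop",
        }
    ],
}
print(extract_reply(sample))  # Hello! I'm doing well.
```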
To enable streaming responses, set "stream": true in the request body. Use the proxy-buffering plugin to disable NGINX proxy_buffering so that server-sent events (SSE) are not buffered.
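With streaming enabled, the client receives OpenAI-style SSE chunks, each a "data: &lt;json&gt;" line carrying a content delta, terminated by "data: [DONE]". A minimal sketch of assembling the streamed text (the sample lines are illustrative):

```python
import json

# Assemble the streamed text from OpenAI-style SSE lines. Each event is
# "data: <json>" carrying a content delta; the stream ends with
# "data: [DONE]".

def assemble_stream(lines):
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))  # first delta may carry only the role
    return "".join(parts)

# Illustrative SSE lines as the client would receive them:
sse = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]
print(assemble_stream(sse))  # Hello!
```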
Next Steps
You have learned how to connect any OpenAI-compatible LLM to API7 Gateway.
- Multi-LLM Routing and Fallback — Combine custom providers with native ones for failover.
- AI Observability — Monitor token usage across all providers.