Connect Any OpenAI-Compatible LLM

API7 AI Gateway supports any LLM API that follows the OpenAI chat completion format. Use the openai-compatible provider to connect to self-hosted models (vLLM, Ollama, LM Studio), niche providers (Together AI, Groq, Fireworks), or internal LLM services.

note

For providers with dedicated driver support (OpenAI, DeepSeek, Anthropic, Azure OpenAI, Gemini, Vertex AI, OpenRouter), use their specific provider type instead. Dedicated drivers handle provider-specific authentication and endpoint construction automatically.

Prerequisites

  • Install Docker.

  • Install cURL to send requests to the services for validation.

  • Have a running API7 Enterprise Gateway instance.

  • Have an LLM endpoint that accepts OpenAI-compatible /v1/chat/completions requests.

  • Obtain the Admin API key. Save it to an environment variable:

    export ADMIN_API_KEY=your-admin-api-key   # replace with your API key
  • Obtain the ID of the service you want to configure. Save it to an environment variable:

    export SERVICE_ID=your-service-id         # replace with your service ID

Configure the AI Proxy for a Custom Provider

Create a route with the ai-proxy plugin. The override.endpoint field is required for the openai-compatible provider.

curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "custom-llm-route",
"service_id": "$SERVICE_ID",
"paths": ["/custom-llm"],
"plugins": {
"ai-proxy": {
"provider": "openai-compatible",
"auth": {
"header": {
"Authorization": "Bearer your-api-key"
}
},
"options": {
"model": "your-model-name"
},
"override": {
"endpoint": "https://your-llm-endpoint.example.com/v1/chat/completions"
}
}
}
}'

❶ Set the provider to openai-compatible.

❷ Attach any required authentication header for your provider.

❸ Set the model name as expected by your provider.

❹ Required. Specify the full endpoint URL of your LLM service.
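Before routing through the gateway, you can verify that your endpoint actually speaks the OpenAI chat completion protocol by calling it directly. The URL, API key, and model name below are placeholders; substitute your own values:

```shell
# Call the upstream LLM endpoint directly to confirm it accepts
# OpenAI-style chat completion requests (placeholder URL, key, and model)
curl "https://your-llm-endpoint.example.com/v1/chat/completions" -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "your-model-name",
    "messages": [{ "role": "user", "content": "ping" }]
  }'
```

A JSON response containing a `choices` array indicates the endpoint is OpenAI-compatible and ready to be proxied.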

Example: Connect to a Self-Hosted vLLM Instance

vLLM provides an OpenAI-compatible API server for self-hosted models:

curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "vllm-route",
"service_id": "$SERVICE_ID",
"paths": ["/vllm"],
"plugins": {
"ai-proxy": {
"provider": "openai-compatible",
"auth": {
"header": {
"Authorization": "Bearer placeholder"
}
},
"options": {
"model": "meta-llama/Llama-3.1-8B-Instruct"
},
"override": {
"endpoint": "http://vllm-server:8000/v1/chat/completions"
}
}
}
}'

The auth field is always required in the configuration. When the vLLM server itself does not require authentication, a placeholder value is sufficient.
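If you do not already have a vLLM server running, one minimal way to start an OpenAI-compatible server is with the official vllm/vllm-openai Docker image. This is a sketch: it assumes a GPU host, and the image tag and model name may need adjusting for your environment:

```shell
# Start a vLLM OpenAI-compatible API server on port 8000
# (assumes a GPU host; adjust the image tag and model as needed)
docker run --gpus all -p 8000:8000 \
  --name vllm-server \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct
```

Once the server is up, the gateway can reach it at http://vllm-server:8000/v1/chat/completions (or http://127.0.0.1:8000 from the host), matching the endpoint in the route above.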

Example: Connect to Together AI

Together AI provides an OpenAI-compatible API for running open-source models:
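The configuration below reads your Together AI key from an environment variable. Export it first, replacing the placeholder with your actual key:

```shell
# Hypothetical placeholder; replace with your actual Together AI API key
export TOGETHER_API_KEY=your-together-api-key
```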

curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "together-ai-route",
"service_id": "$SERVICE_ID",
"paths": ["/together-ai"],
"plugins": {
"ai-proxy": {
"provider": "openai-compatible",
"auth": {
"header": {
"Authorization": "Bearer '"$TOGETHER_API_KEY"'"
}
},
"options": {
"model": "meta-llama/Llama-3.1-70B-Instruct-Turbo"
},
"override": {
"endpoint": "https://api.together.xyz/v1/chat/completions"
}
}
}
}'

Multi-Model Routing

Use ai-proxy-multi to route traffic between a self-hosted model and a cloud provider for fallback:
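The fallback instance below reads an OpenAI key from an environment variable. Export it first, replacing the placeholder with your actual key:

```shell
# Hypothetical placeholder; replace with your actual OpenAI API key
export OPENAI_API_KEY=your-openai-api-key
```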

curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "hybrid-route",
"service_id": "$SERVICE_ID",
"paths": ["/hybrid"],
"plugins": {
"ai-proxy-multi": {
"fallback_strategy": ["http_429", "http_5xx"],
"instances": [
{
"name": "self-hosted",
"provider": "openai-compatible",
"auth": { "header": { "Authorization": "Bearer placeholder" } },
"options": { "model": "meta-llama/Llama-3.1-8B-Instruct" },
"override": { "endpoint": "http://vllm-server:8000/v1/chat/completions" },
"weight": 1,
"priority": 1
},
{
"name": "cloud-fallback",
"provider": "openai",
"auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
"options": { "model": "gpt-4o-mini" },
"weight": 1,
"priority": 2
}
]
}
}
}'

❶ fallback_strategy enables automatic failover when the primary instance returns HTTP 429 (rate limited) or a 5xx (server error) response.

❷ The cloud-fallback instance routes to OpenAI as a backup when the self-hosted instance is unavailable.
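One rough way to observe the failover, assuming the self-hosted instance from the vLLM example runs as a Docker container named vllm-server, is to stop it and send a request through the hybrid route. The gateway should then serve the response from the cloud fallback:

```shell
# Stop the self-hosted instance so the primary returns a 5xx,
# then send a request through the hybrid route; the response
# should come from the cloud fallback instead
docker stop vllm-server
curl "http://127.0.0.1:9080/hybrid" -X POST \
  -H "Content-Type: application/json" \
  -d '{ "messages": [{ "role": "user", "content": "Hello" }] }'
```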

For more routing strategies, see Multi-LLM Routing and Fallback.

Validate the Configuration

Send a chat completion request:

curl "http://127.0.0.1:9080/custom-llm" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "user", "content": "Hello, how are you?" }
]
}'

You should receive a response in the standard OpenAI chat completion format, regardless of the backend provider.

To enable streaming responses, set "stream": true in the request body. Use the proxy-buffering plugin to disable NGINX proxy_buffering so that server-sent events (SSE) are not buffered.
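For example, a streaming request against the route created earlier might look like this. The -N flag disables curl's own output buffering so SSE chunks print as they arrive:

```shell
# Request a streamed response; each SSE chunk prints as it arrives
curl "http://127.0.0.1:9080/custom-llm" -X POST -N \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "messages": [{ "role": "user", "content": "Tell me a short joke" }]
  }'
```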

Next Steps

You have learned how to connect any OpenAI-compatible LLM to API7 Gateway.
