Bring Your Own Endpoint

In this guide, you will connect AISIX AI Gateway to a private OpenAI-compatible endpoint, such as a vLLM or SGLang inference server, an Ollama host, or a self-hosted proxy in front of your own models.

Use a BYO endpoint when applications should keep calling AISIX with the OpenAI-compatible API while AISIX forwards traffic to a private or air-gapped model service. The endpoint must accept OpenAI-compatible chat-completions requests.

Prerequisites

Before starting, prepare the following:

A running AISIX gateway with the admin API on :3001 and the proxy on :3000.
The admin key from the gateway config.yaml.
A reachable OpenAI-compatible endpoint. The examples below assume vLLM at http://10.0.0.5:8000/v1 serving meta-llama/Llama-3.1-8B-Instruct.
The endpoint root your server expects, such as http://host:8000/v1 for vLLM, http://host:30000/v1 for SGLang, or http://host:11434/v1 for Ollama.
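
Before configuring AISIX, it can help to confirm that the endpoint root is reachable and to read back the model ID it serves. The sketch below assumes the server exposes the OpenAI-style GET /models route under the endpoint root, as vLLM does. models_url and list_upstream_models are hypothetical helper names, not AISIX commands.

```shell
# Hypothetical helper: build the model-listing URL from an endpoint root.
# One trailing slash on the root is trimmed so the path is not doubled.
models_url() {
  printf '%s/models\n' "${1%/}"
}

# Hypothetical helper: query the endpoint directly, bypassing AISIX.
# Assumes the server implements the OpenAI-style GET /models route.
list_upstream_models() {
  curl -sS --max-time 5 "$(models_url "$1")"
}

# Usage (run from the host where AISIX runs, so the network path matches):
#   list_upstream_models "http://10.0.0.5:8000/v1"
```

The model ID reported by the server is the value to use later as the upstream model name.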

Configure the BYO Endpoint

Create a provider key, model alias, and caller API key for the private endpoint.

Create a Provider Key

Many self-hosted inference servers do not require an API key. For an unauthenticated endpoint, use a non-empty placeholder in the provider key; AISIX sends it as the bearer token, and your server can ignore it.

Create a provider key for the private OpenAI-compatible endpoint:

export AISIX_ADMIN_KEY="YOUR_ADMIN_KEY"

curl -sS -X POST "http://127.0.0.1:3001/admin/v1/provider_keys" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "vllm-private",
    "provider": "vllm",
    "adapter": "openai",
    "secret": "not-used-by-vllm",
    "api_base": "http://10.0.0.5:8000/v1"
  }'

Provider key secrets follow the credential-handling behavior described in Provider Credentials.

❶ provider is any short label that makes sense for your environment.

❷ adapter selects the OpenAI-compatible upstream format.

❸ secret is a non-empty placeholder for unauthenticated endpoints.

❹ api_base is the endpoint root. Include /v1 when that is part of the server's route.
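
The field rules above can be sketched as a small shell helper that renders the request body and falls back to a placeholder when no upstream secret is set. provider_key_payload and the UPSTREAM_SECRET variable are hypothetical names used for illustration. The helper does no JSON escaping, so it assumes values without quotes or backslashes.

```shell
# Hypothetical helper: render a provider key payload for an OpenAI-compatible
# endpoint. Arguments: display name, provider label, endpoint root, secret.
# An empty or missing secret falls back to a non-empty placeholder.
# No JSON escaping is done; values must not contain quotes or backslashes.
provider_key_payload() {
  printf '{"display_name": "%s", "provider": "%s", "adapter": "openai", "secret": "%s", "api_base": "%s"}\n' \
    "$1" "$2" "${4:-not-used}" "$3"
}

# Usage: pass the rendered body to the admin API call shown earlier.
#   curl -sS -X POST "http://127.0.0.1:3001/admin/v1/provider_keys" \
#     -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
#     -H "Content-Type: application/json" \
#     -d "$(provider_key_payload vllm-private vllm http://10.0.0.5:8000/v1 "${UPSTREAM_SECRET:-}")"
```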

Save the returned provider key ID for the model resource.

Create a Model

Map a caller-facing alias to the upstream model ID your endpoint serves:

export PROVIDER_KEY_ID="YOUR_PROVIDER_KEY_ID"

curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "llama-3-private",
    "provider": "vllm",
    "model_name": "meta-llama/Llama-3.1-8B-Instruct",
    "provider_key_id": "'"${PROVIDER_KEY_ID}"'",
    "cost": {
      "input_per_1k": 0.0,
      "output_per_1k": 0.0
    }
  }'

❶ display_name is the alias callers send in model.

❷ model_name is the upstream ID your endpoint expects. For vLLM and SGLang, use the served model name. For Ollama, use the local model tag, such as llama3.1:8b.

❸ provider_key_id attaches the model alias to the provider key you created.

❹ cost is optional.

Create a Caller API Key

Choose the caller API key value that the application will send to AISIX, then hash it for the admin resource:

export AISIX_API_KEY="YOUR_CALLER_API_KEY"

CALLER_KEY_HASH=$(printf '%s' "${AISIX_API_KEY}" | shasum -a 256 | awk '{print $1}')
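
shasum is available on macOS, while many Linux images provide sha256sum from GNU coreutils instead. A minimal sketch of a wrapper that works with either tool follows; sha256_hex is a hypothetical helper name. Using printf '%s' rather than echo matters because a trailing newline would change the digest.

```shell
# Hypothetical helper: print the lowercase hex SHA-256 digest of its argument.
# Prefers sha256sum (GNU coreutils) and falls back to shasum -a 256.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | awk '{print $1}'
  else
    printf '%s' "$1" | shasum -a 256 | awk '{print $1}'
  fi
}

# Usage: same result as the command above, with either tool installed.
#   CALLER_KEY_HASH=$(sha256_hex "${AISIX_API_KEY}")
```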

Create an API key resource with access to the private model alias:

curl -sS -X POST "http://127.0.0.1:3001/admin/v1/apikeys" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "key_hash": "'"${CALLER_KEY_HASH}"'",
    "allowed_models": ["llama-3-private"]
  }'

❶ allowed_models must match the model alias you created.

Pricing Metadata

Catalog providers carry pricing from the models.dev catalog. A BYO endpoint is not in that catalog, so set pricing metadata yourself if you need token-cost accounting.

Attach a cost block to the model to enable per-token budget accounting:

{
  "cost": {
    "input_per_1k": 0.10,
    "output_per_1k": 0.30
  }
}

Both values are in USD per 1,000 tokens. input_per_1k applies to prompt tokens and output_per_1k to completion tokens. Both fields are required when the cost block is present.
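
As a worked example of these units, the sketch below computes the cost of one request from its token counts. The token counts are made-up values, and request_cost_usd is a hypothetical helper, not something AISIX provides.

```shell
# Hypothetical helper: cost in USD of one request, given per-1k rates.
# Arguments: prompt tokens, completion tokens, input_per_1k, output_per_1k.
request_cost_usd() {
  awk -v p="$1" -v c="$2" -v in_rate="$3" -v out_rate="$4" \
    'BEGIN { printf "%.4f\n", (p / 1000) * in_rate + (c / 1000) * out_rate }'
}

# 1,200 prompt tokens and 400 completion tokens at the rates above:
# (1200 / 1000) * 0.10 + (400 / 1000) * 0.30 = 0.12 + 0.12 = 0.24
request_cost_usd 1200 400 0.10 0.30
```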

Self-hosted deployments store this metadata but do not enforce budget checks from it at request time. Include it on a BYO model so a managed deployment, or your own usage-event consumer, has the per-token rate available. See Model Aliases and Budgets.

Verify the Upstream

Send a request through the proxy with the caller API key and model alias you created:

curl -sS -X POST "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${AISIX_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3-private",
    "messages": [
      {"role": "user", "content": "Say hello from the private model."}
    ]
  }'

The response should be an OpenAI-compatible chat-completions response that echoes the caller-facing alias. Check the endpoint access log for a POST /v1/chat/completions entry from AISIX.

If AISIX returns an upstream route or connection error, check api_base, the served model name, and endpoint reachability.
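
To separate gateway problems from endpoint problems, one option is to send the same request straight to the upstream, using the upstream model ID rather than the alias. The sketch below assumes AISIX appends /chat/completions to api_base, which matches the access-log entry described above. upstream_chat_url and probe_upstream are hypothetical helper names.

```shell
# Hypothetical helper: the URL the upstream should receive, assuming the
# gateway appends /chat/completions to api_base.
upstream_chat_url() {
  printf '%s/chat/completions\n' "${1%/}"
}

# Hypothetical helper: send a minimal chat request directly to the endpoint.
# Arguments: api_base, upstream model ID (not the caller-facing alias).
probe_upstream() {
  curl -sS --max-time 30 -X POST "$(upstream_chat_url "$1")" \
    -H "Content-Type: application/json" \
    -d '{"model": "'"$2"'", "messages": [{"role": "user", "content": "ping"}]}'
}

# Usage:
#   probe_upstream "http://10.0.0.5:8000/v1" "meta-llama/Llama-3.1-8B-Instruct"
```

If the direct call also fails, the problem is likely in the endpoint or the network path. If it succeeds, recheck the api_base and model name stored in AISIX.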

Next Steps

You have now connected a private OpenAI-compatible endpoint to AISIX. Use the same pattern for other private OpenAI-compatible servers by changing the provider label, endpoint root, model ID, and optional pricing metadata.
