Routing and Failover

In this guide, you will create a multi-target model that keeps one caller-facing model alias in front of multiple target models.

A multi-target model does not call a provider directly. It uses a routing block to point to target model aliases, and AISIX chooses one of those targets for each request.

Prerequisites

Before starting, prepare the following:

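A self-hosted gateway with the admin and proxy listeners available. This guide uses http://127.0.0.1:3001 for the Admin API and http://127.0.0.1:3000 for the proxy; adjust the URLs if your listeners differ.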
The admin key from the gateway config.yaml.
Two target models that can serve traffic. If you have not created them yet, configure Provider Credentials and Model Aliases first.
A caller API key for verification. You can create one as described in Caller API Keys, or create one later in this guide.

Choose a Strategy

Choose a strategy based on what the caller-facing alias should do:

Use failover to keep one primary target with backups.
Use round_robin to rotate requests across similar targets.
Use weighted to send more traffic to some targets than others.
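To make the difference between the strategies concrete, the following shell sketch shows how each one could pick a target for the Nth request. It is an illustration, not AISIX's selection code. The function names are hypothetical. Weights are written as name:weight pairs only for this example, because this guide does not show the weighted configuration fields, and a real weighted router may choose randomly in proportion to the weights instead of cycling deterministically.

```shell
# Illustration only: how each strategy might choose a target for request
# number req (0-based). Not AISIX's internal algorithm; names are hypothetical.

# failover: always start with the first target; later targets are backups.
pick_failover() {
  printf '%s\n' "$1"
}

# round_robin: rotate through the targets by request number.
pick_round_robin() {
  req=$1; shift
  idx=$(( req % $# ))
  i=0
  for t in "$@"; do
    if [ "$i" -eq "$idx" ]; then printf '%s\n' "$t"; return 0; fi
    i=$((i + 1))
  done
}

# weighted: targets given as name:weight (hypothetical notation). Request req
# lands in slot req mod (sum of weights) of the cumulative weight ranges.
pick_weighted() {
  req=$1; shift
  total=0
  for tw in "$@"; do total=$(( total + ${tw##*:} )); done
  slot=$(( req % total ))
  acc=0
  for tw in "$@"; do
    acc=$(( acc + ${tw##*:} ))
    if [ "$slot" -lt "$acc" ]; then printf '%s\n' "${tw%%:*}"; return 0; fi
  done
}

# Usage: show which target each of the first four requests would reach.
for n in 0 1 2 3; do
  echo "request $n:" \
    "failover=$(pick_failover gpt-4o-primary gpt-4o-secondary)" \
    "round_robin=$(pick_round_robin "$n" gpt-4o-primary gpt-4o-secondary)" \
    "weighted=$(pick_weighted "$n" gpt-4o-primary:3 gpt-4o-secondary:1)"
done
```

With weights 3 and 1, three of every four requests in this sketch go to gpt-4o-primary, while round_robin alternates evenly.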

AISIX retries and fails over on retryable upstream failures, such as 5xx responses, request timeouts, and transport errors. Most upstream 4xx responses are treated as caller-side problems and do not trigger failover, except 429 when retry_on_429 is enabled.
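The classification above can be sketched as a small shell predicate. This is an approximation for illustration, not AISIX source: it treats every 4xx other than 429 as non-retryable (the guide says most 4xx responses are), and it uses status 0 as a hypothetical stand-in for a request timeout or transport error.

```shell
# Sketch of the retry/failover classification described above.
# Returns 0 (retryable) or 1 (not retryable).
is_retryable() {
  status=$1          # upstream HTTP status; 0 = timeout/transport error (hypothetical)
  retry_on_429=$2    # "true" or "false", mirroring routing.retry_on_429
  if [ "$status" -eq 0 ]; then return 0; fi                          # timeout or transport error
  if [ "$status" -ge 500 ] && [ "$status" -le 599 ]; then return 0; fi   # upstream 5xx
  if [ "$status" -eq 429 ] && [ "$retry_on_429" = "true" ]; then return 0; fi
  return 1           # success, or a 4xx treated as a caller-side problem
}

# Usage: classify a few statuses with retry_on_429 disabled.
for s in 0 400 429 503; do
  if is_retryable "$s" false; then echo "$s retryable"; else echo "$s not retryable"; fi
done
```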

Create a Multi-Target Model

Set the admin key and the model aliases used in this guide:

export AISIX_ADMIN_KEY="admin-local-only-change-me"
export PRIMARY_MODEL="gpt-4o-primary"
export SECONDARY_MODEL="gpt-4o-secondary"
export ROUTING_MODEL="chat-prod"

Create a failover model that starts with the primary target and falls back to the secondary target:

curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "'"${ROUTING_MODEL}"'",
    "routing": {
      "strategy": "failover",
      "targets": [
        {"model": "'"${PRIMARY_MODEL}"'"},
        {"model": "'"${SECONDARY_MODEL}"'"}
      ],
      "retries": 1,
      "max_fallbacks": 1,
      "retry_on_429": true
    }
  }'

You should see a response similar to the following:

{
  "id": "134c4b01-09e6-41b7-97c7-f4e9a608f4c2",
  "value": {
    "display_name": "chat-prod",
    "routing": {
      "strategy": "failover",
      "targets": [
        {
          "model": "gpt-4o-primary"
        },
        {
          "model": "gpt-4o-secondary"
        }
      ],
      "retries": 1,
      "max_fallbacks": 1,
      "retry_on_429": true
    }
  },
  "revision": 1
}

Copy the id value from the response if you plan to update, inspect, or delete this multi-target model later.

With this configuration, AISIX starts with gpt-4o-primary. If that target has a retryable failure, AISIX can retry it once and then fail over once to gpt-4o-secondary.
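The following sketch lists the worst-case attempt order, that is, the sequence when every attempt fails with a retryable error; a successful response stops the sequence early. It assumes that retries applies to each target attempted and that max_fallbacks caps how many additional targets are tried. The guide states only the primary's retry explicitly, so AISIX's exact accounting may differ, and the attempt_plan helper is hypothetical.

```shell
# Hypothetical sketch: worst-case attempt order for a failover model.
# Assumes `retries` applies per target and `max_fallbacks` limits extra targets.
attempt_plan() {
  retries=$1; max_fallbacks=$2; shift 2
  used=0                                   # targets already attempted
  for target in "$@"; do
    if [ "$used" -gt "$max_fallbacks" ]; then break; fi
    try=0
    while [ "$try" -le "$retries" ]; do    # first attempt plus `retries` retries
      printf '%s\n' "$target"
      try=$((try + 1))
    done
    used=$((used + 1))
  done
}

# Usage: the configuration from this guide (retries: 1, max_fallbacks: 1).
attempt_plan 1 1 gpt-4o-primary gpt-4o-secondary
```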

Allow Caller Access

The caller API key must be allowed to use the multi-target alias. If you already have a caller API key resource, update its allowed_models list to include chat-prod.

For a self-hosted check, create a caller API key that can call only chat-prod:

export AISIX_API_KEY="sk-routing-demo"

AISIX_API_KEY_HASH=$(printf '%s' "${AISIX_API_KEY}" | shasum -a 256 | awk '{print $1}')

curl -sS -X POST "http://127.0.0.1:3001/admin/v1/apikeys" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "key_hash": "'"${AISIX_API_KEY_HASH}"'",
    "allowed_models": ["'"${ROUTING_MODEL}"'"]
  }'

You should see a response similar to the following:

{
  "id": "9b7f01fd-5f26-4657-82ef-605cc2f0ce21",
  "value": {
    "key_hash": "dd08e1fdcc327a5f15dedfba33172b5412b887d9d12ffc1076f77683b1ddbe3e",
    "allowed_models": [
      "chat-prod"
    ]
  },
  "revision": 1
}

Copy the id value from the response if you plan to update, rotate, or delete this caller API key later.

Verify Routing

Send a request to the multi-target alias:

curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${AISIX_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${ROUTING_MODEL}"'",
    "messages": [
      {"role": "user", "content": "Hello from AISIX routing."}
    ]
  }'

A successful request returns a response similar to the following:

HTTP/1.1 200 OK
content-type: application/json
x-aisix-call-id: ***
x-aisix-served-by: gpt-4o-primary
server: AISIX/0.1.0

{
  "id": "chatcmpl-***",
  "object": "chat.completion",
  "created": **********,
  "model": "chat-prod",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today with AISIX routing?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 13,
    "total_tokens": 26
  }
}

The response body keeps the caller-facing model name. The x-aisix-served-by header shows which target model served the request.

Use this header when you need to confirm which target handled a routed request. It is not present on cache hits, single-target model responses, or error responses, and not every endpoint family returns it.
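Because the header can be missing, a script that checks routing should handle its absence explicitly. The following sketch reads a curl -i response on standard input and prints the x-aisix-served-by value, or "(absent)" when it is not there. The served_by helper is hypothetical. It matches the header name case-insensitively, as HTTP requires, and stops at the blank line that ends the headers.

```shell
# Print the x-aisix-served-by header value from a `curl -i` response on stdin,
# or "(absent)" if the header is missing.
served_by() {
  awk -F': *' '
    tolower($1) == "x-aisix-served-by" { sub(/\r$/, "", $2); print $2; found = 1 }
    /^\r?$/ { exit }                     # blank line: end of headers
    END { if (!found) print "(absent)" }
  '
}

# Usage with a saved sample; in practice pipe the verification curl -sSi into served_by.
printf 'HTTP/1.1 200 OK\r\nx-aisix-served-by: gpt-4o-primary\r\n\r\n{}' | served_by
```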

Streaming requests can resolve multi-target aliases, but they do not fail over after a stream has started.

Tune Retry and Runtime Behavior

The example uses one retry on the primary target and one fallback to the secondary target.

Adjust the routing fields only when the traffic plan needs different behavior:

Set max_fallbacks: 0 when you want target selection without cross-target fallback.
Enable retry_on_429 when upstream rate-limit responses should participate in retry and failover.
Set routing.on_all_filtered: "original_order" when you would rather attempt the targets in their configured order than return 503 all_candidates_unavailable after every target has been filtered out.
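As a sketch of these fields together, the following function prints a create-model request body for a round_robin alias that selects a target without cross-target fallback and uses original_order when every target is filtered out. The alias name chat-select and the tuned_model_body helper are hypothetical, on_all_filtered is placed inside routing following the routing.on_all_filtered path above, and combining all of these fields in one model is an assumption; check the Admin API Reference before relying on it.

```shell
# Print a create-model body: alias $1, targets $2 and $3 (names are hypothetical).
tuned_model_body() {
  printf '{"display_name":"%s","routing":{"strategy":"round_robin","targets":[{"model":"%s"},{"model":"%s"}],"retries":1,"max_fallbacks":0,"retry_on_429":true,"on_all_filtered":"original_order"}}\n' \
    "$1" "$2" "$3"
}

# Usage: inspect the body before sending it.
tuned_model_body chat-select gpt-4o-primary gpt-4o-secondary

# To create the model, pipe the body to the admin endpoint used earlier:
#   tuned_model_body chat-select gpt-4o-primary gpt-4o-secondary |
#     curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
#       -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
#       -H "Content-Type: application/json" --data-binary @-
```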

For streaming requests, a timeout before the first upstream chunk can fail over to the next target. After streaming bytes have started, a timeout ends the stream instead of failing over mid-response.
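The streaming rule can be summarized as a small decision function. This is a sketch of the behavior described above, not AISIX code: the stream_timeout_action helper and its fallbacks_left counter are hypothetical, and the "return error" branch (no bytes sent, but no fallback left) is an assumption about what happens once fallbacks are exhausted.

```shell
# Decide what a streaming timeout should do, per the rule above (sketch).
stream_timeout_action() {
  bytes_sent=$1        # bytes already forwarded to the caller
  fallbacks_left=$2    # remaining cross-target fallbacks (hypothetical counter)
  if [ "$bytes_sent" -gt 0 ]; then
    echo "end stream"      # never switch targets mid-response
  elif [ "$fallbacks_left" -gt 0 ]; then
    echo "fail over"       # nothing sent yet, so the next target can start cleanly
  else
    echo "return error"    # assumed outcome when no fallback remains
  fi
}

# Usage: a timeout before the first chunk versus one after streaming started.
stream_timeout_action 0 1
stream_timeout_action 512 1
```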

Use GET /admin/v1/models/status to inspect runtime status for target models.

For complete model request fields and response shapes, see the Admin API Reference.

Next Steps

You have now configured a multi-target model and allowed a caller API key to use it. Continue with Multi-Target Model Failover to test failover end to end.
