Routing and Failover
In this guide, you will create a multi-target model that keeps one caller-facing model alias in front of multiple target models.
A multi-target model does not call a provider directly. It uses a routing block to point to target model aliases, and AISIX chooses one of those targets for each request.
Prerequisites
Before starting, prepare the following:
- A self-hosted gateway with the admin and proxy listeners available.
- The admin key from the gateway config.yaml.
- Two target models that can serve traffic. If you have not created them yet, configure Provider Credentials and Model Aliases first.
- A caller API key for verification. You can create one in Caller API Keys, or create one in this guide.
Choose a Strategy
Choose a strategy based on what the caller-facing alias should do:
- Use failover to keep one primary target with backups.
- Use round_robin to rotate requests across similar targets.
- Use weighted to send more traffic to some targets than others.
AISIX retries and fails over on retryable upstream failures, such as 5xx responses, request timeouts, and transport errors. Most upstream 4xx responses are treated as caller-side problems and do not trigger failover, except 429 when retry_on_429 is enabled.
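The retry rule above can be sketched as a small shell function. This is only an illustration of the rule as described in this guide, not AISIX's implementation; the function name is_retryable and the labels timeout and transport_error are hypothetical.

```shell
#!/bin/sh
# Sketch of the retry rule described above; NOT AISIX's actual implementation.
# Usage: is_retryable <status> <retry_on_429: true|false>
# <status> is an HTTP status code, or "timeout" / "transport_error"
# (hypothetical labels for failures that have no HTTP status).
is_retryable() {
  status="$1"
  retry_on_429="${2:-false}"
  case "$status" in
    timeout|transport_error) return 0 ;;   # request timeouts and transport errors
    5[0-9][0-9]) return 0 ;;               # upstream 5xx responses
    429) [ "$retry_on_429" = "true" ] ;;   # retryable only when retry_on_429 is enabled
    *) return 1 ;;                         # everything else goes back to the caller
  esac
}

for s in 500 429 404 timeout; do
  if is_retryable "$s" true; then
    echo "$s: retry or fail over"
  else
    echo "$s: return to caller"
  fi
done
```

The guide says "most" 4xx responses do not trigger failover, so the catch-all branch is a simplification.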
Create a Multi-Target Model
Set the admin key and the model aliases used in this guide:
export AISIX_ADMIN_KEY="admin-local-only-change-me"
export PRIMARY_MODEL="gpt-4o-primary"
export SECONDARY_MODEL="gpt-4o-secondary"
export ROUTING_MODEL="chat-prod"
Create a failover model that starts with the primary target and falls back to the secondary target:
curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "'"${ROUTING_MODEL}"'",
    "routing": {
      "strategy": "failover",
      "targets": [
        {"model": "'"${PRIMARY_MODEL}"'"},
        {"model": "'"${SECONDARY_MODEL}"'"}
      ],
      "retries": 1,
      "max_fallbacks": 1,
      "retry_on_429": true
    }
  }'
You should see a response similar to the following:
{
  "id": "134c4b01-09e6-41b7-97c7-f4e9a608f4c2",
  "value": {
    "display_name": "chat-prod",
    "routing": {
      "strategy": "failover",
      "targets": [
        {
          "model": "gpt-4o-primary"
        },
        {
          "model": "gpt-4o-secondary"
        }
      ],
      "retries": 1,
      "max_fallbacks": 1,
      "retry_on_429": true
    }
  },
  "revision": 1
}
Copy the id value from the response if you plan to update, inspect, or delete this multi-target model later.
With this configuration, AISIX starts with gpt-4o-primary. If that target has a retryable failure, AISIX can retry it once and then fail over once to gpt-4o-secondary.
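If you script these steps, you may want to capture the id from the creation response rather than copy it by hand. The following is a minimal sketch that assumes the response has the shape shown above; the helper name extract_id and the MODEL_ID variable are hypothetical. A JSON-aware tool such as jq is more robust if it is available.

```shell
#!/bin/sh
# Print the first "id" string value found in a JSON document on stdin.
# A plain-text match, so it assumes at most one "id" key per line.
extract_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Sample response copied from this guide; in practice, pipe the curl output in.
response='{"id": "134c4b01-09e6-41b7-97c7-f4e9a608f4c2", "value": {"display_name": "chat-prod"}, "revision": 1}'
MODEL_ID=$(printf '%s\n' "$response" | extract_id)
echo "$MODEL_ID"
```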
Allow Caller Access
The caller API key must be allowed to use the multi-target alias. If you already have a caller API key resource, update its allowlist to include chat-prod.
For a self-hosted check, create a caller API key that can call only chat-prod:
export AISIX_API_KEY="sk-routing-demo"
AISIX_API_KEY_HASH=$(printf '%s' "${AISIX_API_KEY}" | shasum -a 256 | awk '{print $1}')
curl -sS -X POST "http://127.0.0.1:3001/admin/v1/apikeys" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "key_hash": "'"${AISIX_API_KEY_HASH}"'",
    "allowed_models": ["'"${ROUTING_MODEL}"'"]
  }'
You should see a response similar to the following:
{
  "id": "9b7f01fd-5f26-4657-82ef-605cc2f0ce21",
  "value": {
    "key_hash": "dd08e1fdcc327a5f15dedfba33172b5412b887d9d12ffc1076f77683b1ddbe3e",
    "allowed_models": [
      "chat-prod"
    ]
  },
  "revision": 1
}
Copy the id value from the response if you plan to update, rotate, or delete this caller API key later.
Verify Routing
Send a request to the multi-target alias:
curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${AISIX_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "'"${ROUTING_MODEL}"'",
    "messages": [
      {"role": "user", "content": "Hello from AISIX routing."}
    ]
  }'
A successful request returns a response similar to the following:
HTTP/1.1 200 OK
content-type: application/json
x-aisix-call-id: ***
x-aisix-served-by: gpt-4o-primary
server: AISIX/0.1.0

{
  "id": "chatcmpl-***",
  "object": "chat.completion",
  "created": **********,
  "model": "chat-prod",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today with AISIX routing?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 13,
    "total_tokens": 26
  }
}
The response body keeps the caller-facing model name. The x-aisix-served-by header shows which target model served the request.
Use this header when you need to confirm which target handled a routed request. It is not present on cache hits, single-target model responses, or error responses, and not every endpoint family returns it.
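When checking many requests, it can help to pull the header value out of the curl -i output instead of reading it by eye. The following is a minimal sketch; the helper name served_by is hypothetical. It prints nothing when the header is absent.

```shell
#!/bin/sh
# Print the value of x-aisix-served-by from `curl -i` style output on stdin.
# Prints nothing when the header is absent.
served_by() {
  awk '
    /^\r?$/ { exit }                       # blank line ends the header section
    tolower($0) ~ /^x-aisix-served-by:/ {
      sub(/^[^:]*:[ \t]*/, "")             # drop the header name
      sub(/\r$/, "")                       # drop the trailing CR from HTTP line endings
      print
      exit
    }
  '
}

# Sample input modeled on the response in this guide.
# With a running gateway: curl -sSi ... | served_by
printf 'HTTP/1.1 200 OK\r\ncontent-type: application/json\r\nx-aisix-served-by: gpt-4o-primary\r\n\r\n{"model": "chat-prod"}\n' | served_by
```

The match is case-insensitive because HTTP header names are not case-sensitive, and scanning stops at the first blank line so that text in the response body is never mistaken for a header.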
Streaming requests can resolve multi-target aliases, but they do not fail over after a stream has started.
Tune Retry and Runtime Behavior
The example uses one retry on the primary target and one fallback to the secondary target.
Adjust the routing fields only when the traffic plan needs different behavior:
- Set max_fallbacks: 0 when you want target selection without cross-target fallback.
- Enable retry_on_429 when upstream rate-limit responses should participate in retry and failover.
- Configure routing.on_all_filtered: "original_order" when attempting a target is preferable to returning 503 all_candidates_unavailable.
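As an illustration of where these fields sit, the sketch below builds a request body for a round_robin alias with no cross-target fallback. It reuses the request shape from the creation example earlier in this guide; the alias name chat-rr is hypothetical, and whether AISIX accepts this exact combination of fields is an assumption to check against the Admin API Reference.

```shell
#!/bin/sh
# Build a request body that combines the tuning fields listed above.
# Field names come from this guide; the default alias "chat-rr" is hypothetical.
ROUTING_MODEL="${ROUTING_MODEL:-chat-rr}"
PRIMARY_MODEL="${PRIMARY_MODEL:-gpt-4o-primary}"
SECONDARY_MODEL="${SECONDARY_MODEL:-gpt-4o-secondary}"

payload=$(cat <<EOF
{
  "display_name": "${ROUTING_MODEL}",
  "routing": {
    "strategy": "round_robin",
    "targets": [
      {"model": "${PRIMARY_MODEL}"},
      {"model": "${SECONDARY_MODEL}"}
    ],
    "retries": 1,
    "max_fallbacks": 0,
    "retry_on_429": true,
    "on_all_filtered": "original_order"
  }
}
EOF
)
printf '%s\n' "$payload"
```

The body can then be sent with the same POST to /admin/v1/models used earlier, passing -d "$payload".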
For streaming requests, a timeout before the first upstream chunk can fail over to the next target. After streaming bytes have started, a timeout ends the stream instead of failing over mid-response.
Use GET /admin/v1/models/status to inspect runtime status for target models.
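A status check might look like the sketch below. It assumes the endpoint is served on the same admin address and accepts the same Authorization header as the other admin calls in this guide; AISIX_ADMIN_URL and models_status are hypothetical names. The response shape is not shown here because this guide does not document it.

```shell
#!/bin/sh
# Fetch runtime status for target models from the admin listener.
# AISIX_ADMIN_URL is a hypothetical convenience variable; its default matches
# the admin address used elsewhere in this guide.
AISIX_ADMIN_URL="${AISIX_ADMIN_URL:-http://127.0.0.1:3001}"

models_status() {
  curl -sS -X GET "${AISIX_ADMIN_URL}/admin/v1/models/status" \
    -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"
}

# Usage (requires a running gateway and AISIX_ADMIN_KEY in the environment):
#   models_status
```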
For complete model request fields and response shapes, see the Admin API Reference.
Next Steps
You have now configured a multi-target model and allowed a caller API key to use it. Continue with Multi-Target Model Failover to test failover end to end.