Multi-Target Model Failover

In this tutorial, you create a multi-target model that fails over from a primary model to a secondary model. Applications call a single model name, and AISIX chooses the upstream model for each request.

The multi-target model is named chat-prod. It points to two target models, and each target model uses its own provider key. After the first request succeeds, you point the primary provider key to an unreachable host and confirm that AISIX serves the next request from the secondary target.

Prerequisites

Before starting, prepare the following:

A completed Self-Hosted Quickstart.
jq to capture resource IDs in the runnable setup.
OpenAI API keys for the primary and secondary upstreams. You can use the same OpenAI API key for both because the failover check points the primary provider key to an unreachable host.

Set Variables

Export the values used in the commands:

export AISIX_ADMIN_KEY="YOUR_ADMIN_KEY"
export PRIMARY_OPENAI_API_KEY="YOUR_PRIMARY_PROVIDER_KEY"
export SECONDARY_OPENAI_API_KEY="YOUR_SECONDARY_PROVIDER_KEY"
export CALLER_KEY="sk-failover-demo"

If you use one OpenAI account for both upstreams, set PRIMARY_OPENAI_API_KEY and SECONDARY_OPENAI_API_KEY to the same value.
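Because the later commands silently send empty strings when a variable is unset, it can help to check the exports first. The following is a minimal sketch; `require_vars` is a hypothetical helper written for this tutorial, not part of AISIX.

```shell
# Hypothetical guard: report every named variable that is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, once the exports above are in place:
#   require_vars AISIX_ADMIN_KEY PRIMARY_OPENAI_API_KEY \
#     SECONDARY_OPENAI_API_KEY CALLER_KEY || echo "export the missing values first"
```

The function returns a non-zero status if any listed variable is missing, so it can gate the rest of a setup script.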

Create the SHA-256 hash that AISIX stores for the caller API key:

CALLER_KEY_HASH=$(printf '%s' "${CALLER_KEY}" | shasum -a 256 | awk '{print $1}')
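The `shasum` tool is not installed on every system; many Linux distributions ship `sha256sum` instead. The sketch below picks whichever one is available. `sha256_hex` is a hypothetical helper name, and the fallback value `sk-failover-demo` is only used when `CALLER_KEY` is unset.

```shell
# Hypothetical helper: hash a string with whichever SHA-256 tool is installed.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | awk '{print $1}'
  elif command -v shasum >/dev/null 2>&1; then
    printf '%s' "$1" | shasum -a 256 | awk '{print $1}'
  else
    echo "no SHA-256 tool found" >&2
    return 1
  fi
}

# printf '%s' avoids a trailing newline, which would change the hash.
CALLER_KEY_HASH=$(sha256_hex "${CALLER_KEY:-sk-failover-demo}")
echo "${CALLER_KEY_HASH}"
```

Either tool prints a 64-character hexadecimal digest, so the resulting `CALLER_KEY_HASH` has the same form as the one produced by the command above.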

Configure the Multi-Target Model

To build chat-prod, create the provider keys, target models, multi-target model, and caller API key in order.

Create Provider Keys

Create the primary provider key:

PRIMARY_PK_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/provider_keys" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "openai-primary",
    "provider": "openai",
    "adapter": "openai",
    "secret": "'"${PRIMARY_OPENAI_API_KEY}"'",
    "api_base": "https://api.openai.com/v1"
  }' | jq -r .id)

Create the secondary provider key:

SECONDARY_PK_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/provider_keys" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "openai-secondary",
    "provider": "openai",
    "adapter": "openai",
    "secret": "'"${SECONDARY_OPENAI_API_KEY}"'",
    "api_base": "https://api.openai.com/v1"
  }' | jq -r .id)

Verify that both IDs were captured:

printf 'primary provider key: %s\nsecondary provider key: %s\n' \
  "${PRIMARY_PK_ID}" "${SECONDARY_PK_ID}"

You should see two non-empty IDs:

primary provider key: 7fd2d8ce-f79d-49cc-b742-d32fda7b7d5a
secondary provider key: 04573e6c-6319-477e-b2a4-a67a1911c727

The IDs are already stored in PRIMARY_PK_ID and SECONDARY_PK_ID for later commands.
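The commands above splice shell variables into the JSON body with the `'"${VAR}"'` quoting pattern, which produces invalid JSON if a value contains a double quote or a backslash. One alternative, sketched here, is to let `jq` build and escape the body. `provider_key_payload` is a hypothetical helper name; the field names are the ones used in the requests above.

```shell
# Hypothetical helper: build a provider key request body with jq.
# Arguments: display name, secret, API base URL.
provider_key_payload() {
  jq -cn \
    --arg name "$1" \
    --arg secret "$2" \
    --arg base "$3" \
    '{display_name: $name, provider: "openai", adapter: "openai", secret: $secret, api_base: $base}'
}

# Example, assuming the variables from "Set Variables" are exported:
#   curl -sS -X POST "http://127.0.0.1:3001/admin/v1/provider_keys" \
#     -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
#     -H "Content-Type: application/json" \
#     -d "$(provider_key_payload openai-primary "${PRIMARY_OPENAI_API_KEY}" https://api.openai.com/v1)"
```

With `--arg`, jq treats each value as a string and escapes it, so the body stays valid JSON regardless of the characters in the secret.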

Create Target Models

Create the primary model:

PRIMARY_MODEL_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "gpt-4o-primary",
    "provider": "openai",
    "model_name": "gpt-4o-mini",
    "provider_key_id": "'"${PRIMARY_PK_ID}"'"
  }' | jq -r .id)

Create the secondary model:

SECONDARY_MODEL_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "gpt-4o-secondary",
    "provider": "openai",
    "model_name": "gpt-4o-mini",
    "provider_key_id": "'"${SECONDARY_PK_ID}"'"
  }' | jq -r .id)

Create a Multi-Target Model

Create a multi-target model named chat-prod. The proxy starts with gpt-4o-primary, retries it once on a retryable failure, and then fails over to gpt-4o-secondary.

CHAT_PROD_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "chat-prod",
    "routing": {
      "strategy": "failover",
      "targets": [
        {"model": "gpt-4o-primary"},
        {"model": "gpt-4o-secondary"}
      ],
      "retries": 1,
      "max_fallbacks": 1
    }
  }' | jq -r .id)
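To make the effect of `retries` and `max_fallbacks` concrete, the toy simulation below prints the attempt order for a primary target that is down and a secondary target that is up. It is a local illustration only, not AISIX code. The `name:state` notation is invented for this sketch, and the assumption that each target gets one attempt plus the retry budget is not confirmed by this tutorial.

```shell
# Toy simulation only; this is not how AISIX is implemented.
# Usage: simulate_failover RETRIES MAX_FALLBACKS name:state [name:state ...]
# where state is "up" or "down" (hypothetical notation for this sketch).
simulate_failover() {
  retries=$1
  max_fallbacks=$2
  shift 2
  fallbacks=0
  for target in "$@"; do
    name=${target%%:*}
    state=${target##*:}
    attempt=1
    # Assumption: each target gets one attempt plus the retry budget.
    while [ "$attempt" -le $((retries + 1)) ]; do
      if [ "$state" = "up" ]; then
        echo "${name} attempt ${attempt}: ok"
        return 0
      fi
      echo "${name} attempt ${attempt}: failed"
      attempt=$((attempt + 1))
    done
    # Stop once the fallback budget is used up.
    if [ "$fallbacks" -ge "$max_fallbacks" ]; then
      break
    fi
    fallbacks=$((fallbacks + 1))
  done
  echo "all targets failed"
  return 1
}

simulate_failover 1 1 gpt-4o-primary:down gpt-4o-secondary:up
```

For the scenario in this tutorial, the simulation prints two failed attempts on gpt-4o-primary followed by a successful attempt on gpt-4o-secondary, which matches the order described above.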

Create a Caller API Key

Create a caller API key that can call the multi-target model:

APIKEY_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/apikeys" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "key_hash": "'"${CALLER_KEY_HASH}"'",
    "allowed_models": ["chat-prod"]
  }' | jq -r .id)

Verify that the model and caller API key IDs were captured:

printf 'primary model: %s\nsecondary model: %s\nmulti-target model: %s\ncaller API key: %s\n' \
  "${PRIMARY_MODEL_ID}" "${SECONDARY_MODEL_ID}" "${CHAT_PROD_ID}" "${APIKEY_ID}"

You should see one ID for each created resource:

primary model: 3b909841-c0a7-4ad8-8f1b-a9df8f10b581
secondary model: 4f49a654-b03d-4f19-a40c-9a6dc3210a55
multi-target model: 247d7dc4-e943-42f8-a841-6a3758e6d34d
caller API key: 7d3b710e-9f47-4ab8-90a3-028b5572f686

The IDs are already stored in PRIMARY_MODEL_ID, SECONDARY_MODEL_ID, CHAT_PROD_ID, and APIKEY_ID for later verification and cleanup commands.

If any value is empty or null, rerun the command that produced it without the trailing | jq -r .id and check the response for an error_msg before continuing.
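The empty-or-null check can also be scripted. The following is a minimal sketch; `require_id` is a hypothetical helper, and the string `null` is what `jq -r .id` prints when the response has no `id` field.

```shell
# Hypothetical helper: fail when a captured ID is empty or the literal string "null".
require_id() {
  label=$1
  value=$2
  if [ -z "$value" ] || [ "$value" = "null" ]; then
    echo "missing ID for ${label}" >&2
    return 1
  fi
  echo "${label}: ${value}"
}

# Example, once the setup commands have run:
#   require_id "primary model" "${PRIMARY_MODEL_ID}" &&
#   require_id "secondary model" "${SECONDARY_MODEL_ID}" &&
#   require_id "multi-target model" "${CHAT_PROD_ID}" &&
#   require_id "caller API key" "${APIKEY_ID}"
```

Chaining the calls with `&&` stops at the first resource that was not created, which points to the command to rerun.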

Verify Failover Behavior

Send one request through the normal path, then make the primary upstream unreachable and confirm that AISIX serves the next request from the secondary model.

Verify the Multi-Target Model

Send a request to chat-prod:

curl -sS -X POST "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${CALLER_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chat-prod",
    "messages": [
      {"role": "user", "content": "Say hello."}
    ]
  }'

A successful request returns an OpenAI-compatible chat-completions response similar to the following:

{
  "id": "chatcmpl-***",
  "object": "chat.completion",
  "created": **********,
  "model": "chat-prod",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}

Trigger Failover

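Update the primary provider key to point to an unreachable host. The .invalid top-level domain is reserved and never resolves, so every connection attempt to it fails: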

curl -sS -X PUT "http://127.0.0.1:3001/admin/v1/provider_keys/${PRIMARY_PK_ID}" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "openai-primary",
    "provider": "openai",
    "adapter": "openai",
    "secret": "'"${PRIMARY_OPENAI_API_KEY}"'",
    "api_base": "https://api.openai.invalid/v1"
  }'

Send the request again and include response headers:

curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${CALLER_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chat-prod",
    "messages": [
      {"role": "user", "content": "Say hello."}
    ]
  }'

A successful failover response starts with HTTP/1.1 200 OK and includes x-aisix-served-by: gpt-4o-secondary. The response body is similar to the following:

{
  "id": "chatcmpl-***",
  "object": "chat.completion",
  "created": **********,
  "model": "chat-prod",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 9,
    "total_tokens": 19
  }
}

The request still succeeds because AISIX retries the primary target and then forwards the request to the secondary target.
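To check the serving target in a script rather than by eye, the header can be pulled out of the `curl -i` output. The sketch below assumes the header name `x-aisix-served-by` shown above; `served_by` is a hypothetical helper name. It strips the carriage returns that HTTP header lines end with and matches the header name case-insensitively.

```shell
# Hypothetical helper: read a full HTTP response (headers and body) on stdin
# and print the value of the first x-aisix-served-by header.
served_by() {
  tr -d '\r' |
    awk -F': ' '!found && tolower($1) == "x-aisix-served-by" { print $2; found = 1 }'
}

# Example, assuming the failover setup above is in place:
#   curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
#     -H "Authorization: Bearer ${CALLER_KEY}" \
#     -H "Content-Type: application/json" \
#     -d '{"model": "chat-prod", "messages": [{"role": "user", "content": "Say hello."}]}' \
#     | served_by
```

After the primary provider key points to the unreachable host, the function should print gpt-4o-secondary.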

Verify Runtime Status

Check the runtime status of the target models:

curl -sS "http://127.0.0.1:3001/admin/v1/models/status" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"

A successful status response includes the target models and the multi-target model:

[
  {
    "id": "3b909841-c0a7-4ad8-8f1b-a9df8f10b581",
    "display_name": "gpt-4o-primary",
    "kind": "direct",
    "status": "cooldown",
    "cooldown_until": {
      "secs_since_epoch": **********,
      "nanos_since_epoch": 0
    },
    "status_reason": "transport_error"
  },
  {
    "id": "4f49a654-b03d-4f19-a40c-9a6dc3210a55",
    "display_name": "gpt-4o-secondary",
    "kind": "direct",
    "status": "healthy"
  },
  {
    "id": "247d7dc4-e943-42f8-a841-6a3758e6d34d",
    "display_name": "chat-prod",
    "kind": "routing",
    "status": "not_applicable"
  }
]

The primary target is in cooldown after the retryable failure. The multi-target model reports not_applicable because runtime status is tracked on the target models that call providers.
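When the status list grows, a `jq` filter can reduce it to the targets that are in cooldown. This is a sketch that assumes the response shape shown above (an array of objects with `display_name`, `status`, and an optional `status_reason`); `cooldown_targets` is a hypothetical helper name.

```shell
# Hypothetical helper: read the status JSON on stdin and print one
# tab-separated line (display name, reason) per target in cooldown.
cooldown_targets() {
  jq -r '.[] | select(.status == "cooldown") | "\(.display_name)\t\(.status_reason // "unknown")"'
}

# Example:
#   curl -sS "http://127.0.0.1:3001/admin/v1/models/status" \
#     -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" | cooldown_targets
```

Right after the failover request, the filter should list gpt-4o-primary with the reason transport_error and omit the healthy secondary target.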

Clean Up

Clean up these tutorial resources if you do not plan to keep testing failover.

Restore the primary provider key before deleting resources:

curl -sS -X PUT "http://127.0.0.1:3001/admin/v1/provider_keys/${PRIMARY_PK_ID}" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "openai-primary",
    "provider": "openai",
    "adapter": "openai",
    "secret": "'"${PRIMARY_OPENAI_API_KEY}"'",
    "api_base": "https://api.openai.com/v1"
  }'

Delete the resources:

curl -sS -X DELETE "http://127.0.0.1:3001/admin/v1/models/${CHAT_PROD_ID}" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"
curl -sS -X DELETE "http://127.0.0.1:3001/admin/v1/models/${PRIMARY_MODEL_ID}" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"
curl -sS -X DELETE "http://127.0.0.1:3001/admin/v1/models/${SECONDARY_MODEL_ID}" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"
curl -sS -X DELETE "http://127.0.0.1:3001/admin/v1/apikeys/${APIKEY_ID}" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"
curl -sS -X DELETE "http://127.0.0.1:3001/admin/v1/provider_keys/${PRIMARY_PK_ID}" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"
curl -sS -X DELETE "http://127.0.0.1:3001/admin/v1/provider_keys/${SECONDARY_PK_ID}" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"

Next Steps

You have now created a multi-target model and verified failover from one upstream target to another. For routing strategies, retries, and runtime filtering, see Routing and Failover.
