Models
In this guide, you will create the model aliases callers send to AISIX and learn how single-target, multi-target, and ensemble models fit into the request path.
A model connects the caller-facing model name to the upstream target AISIX should call. A single-target model points to one upstream model through one provider key. A multi-target model points to multiple target model aliases and lets AISIX choose one at request time. An ensemble model fans a chat request out to panel models and asks a judge model to synthesize one answer.
Start with a single-target model. Create a multi-target model when callers need one stable name in front of failover, round-robin, or weighted target selection. Create an ensemble model when the caller should receive one synthesized answer based on several model responses.
Prerequisites
Before starting, prepare the following:
- A self-hosted gateway with the admin listener available.
- The admin key from the gateway config.yaml.
- A provider key ID for the upstream credential. If you do not have one yet, configure Provider Credentials first.
Create a Single-Target Model
A single-target model maps one AISIX model alias to one upstream model.
Create a single-target model with the provider key ID you prepared:
export AISIX_ADMIN_KEY="YOUR_ADMIN_KEY"
export PROVIDER_KEY_ID="YOUR_PROVIDER_KEY_ID"
curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "gpt-4o-prod",
    "provider": "openai",
    "model_name": "gpt-4o",
    "provider_key_id": "'"${PROVIDER_KEY_ID}"'"
  }'
You should see a response similar to the following:
{
  "id": "677c847f-d92d-4f0e-b445-8b449764f06a",
  "value": {
    "display_name": "gpt-4o-prod",
    "provider": "openai",
    "model_name": "gpt-4o",
    "provider_key_id": "YOUR_PROVIDER_KEY_ID"
  },
  "revision": 1
}
Copy the id value from the response if you plan to update, inspect, or delete this model later.
The display_name is the alias callers send in the model field of their requests. The model_name is the upstream model ID or deployment name AISIX sends to the provider. These values can be the same, but they do not have to be.
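The same request can be prepared from a script. The following is a minimal Python sketch, assuming only the endpoint and fields shown in the curl example above. The helper names `single_target_payload` and `build_create_request` are hypothetical and not part of AISIX.

```python
import json
import urllib.request

# Admin endpoint taken from the curl example in this guide.
ADMIN_URL = "http://127.0.0.1:3001/admin/v1/models"


def single_target_payload(display_name, provider, model_name, provider_key_id):
    """Build the request body for a single-target model (hypothetical helper)."""
    return {
        "display_name": display_name,        # alias callers send in "model"
        "provider": provider,
        "model_name": model_name,            # upstream model ID or deployment name
        "provider_key_id": provider_key_id,
    }


def build_create_request(payload, admin_key, url=ADMIN_URL):
    """Prepare, but do not send, the POST request for the admin API."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {admin_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = single_target_payload(
    "gpt-4o-prod", "openai", "gpt-4o", "YOUR_PROVIDER_KEY_ID"
)
request = build_create_request(payload, "YOUR_ADMIN_KEY")
# To send it against a running gateway: urllib.request.urlopen(request)
```

Keeping the payload builder separate from the network call makes it easy to review the alias and upstream name before anything reaches the gateway.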
Create a Multi-Target Model
A multi-target model is useful when one caller-facing alias should select from multiple target models. It uses a routing block instead of storing its own provider key or upstream model name.
Create the target models first. Then create the multi-target model and reference those target aliases by display_name:
curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "chat-prod",
    "routing": {
      "strategy": "failover",
      "targets": [
        {"model": "gpt-4o-primary"},
        {"model": "gpt-4o-secondary"}
      ],
      "retries": 1,
      "max_fallbacks": 1,
      "retry_on_429": true
    }
  }'
You should see a response similar to the following:
{
  "id": "134c4b01-09e6-41b7-97c7-f4e9a608f4c2",
  "value": {
    "display_name": "chat-prod",
    "routing": {
      "strategy": "failover",
      "targets": [
        {
          "model": "gpt-4o-primary"
        },
        {
          "model": "gpt-4o-secondary"
        }
      ],
      "retries": 1,
      "max_fallbacks": 1,
      "retry_on_429": true
    }
  },
  "revision": 1
}
Copy the id value from the response if you plan to update, inspect, or delete this multi-target model later.
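Because the routing block references targets by display_name, it can help to confirm that every target alias already exists before you create the multi-target model. The sketch below is a client-side pre-check only. The function name `missing_targets` is hypothetical, and this guide does not describe how AISIX itself responds to a reference to a missing alias.

```python
def missing_targets(routing_model, existing_aliases):
    """Return routing target aliases that are not in the set of known aliases."""
    targets = routing_model["routing"]["targets"]
    return [t["model"] for t in targets if t["model"] not in existing_aliases]


chat_prod = {
    "display_name": "chat-prod",
    "routing": {
        "strategy": "failover",
        "targets": [
            {"model": "gpt-4o-primary"},
            {"model": "gpt-4o-secondary"},
        ],
    },
}

# Suppose only the primary target has been created so far.
existing = {"gpt-4o-primary"}
print(missing_targets(chat_prod, existing))  # ['gpt-4o-secondary']
```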
Choose the strategy based on how the gateway should select the first target:
| Strategy | Use When |
|---|---|
| failover | One target is primary and later targets are backups. |
| round_robin | Similar targets should receive simple request distribution. |
| weighted | Targets should receive unequal traffic shares. |
Fallback still follows the target order after the first target is selected. For the detailed routing behavior, see Routing and Failover.
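The relationship between first-target selection and fallback order can be sketched in Python. This is an illustration under stated assumptions, not the gateway's implementation. It assumes that fallbacks are the remaining targets in their configured order and that max_fallbacks caps how many of them are tried. The weighted strategy is left out because this guide does not show how weights are configured. The name `attempt_order` is hypothetical.

```python
def attempt_order(targets, strategy, request_index=0, max_fallbacks=None):
    """Return the order in which targets would be tried for one request.

    Assumptions (not confirmed by this guide): fallbacks are the remaining
    targets in configured order, and max_fallbacks limits their count.
    """
    if not targets:
        raise ValueError("at least one target is required")

    if strategy == "failover":
        first = 0                              # the first listed target is primary
    elif strategy == "round_robin":
        first = request_index % len(targets)   # rotate the starting target
    else:
        raise ValueError(f"strategy not covered by this sketch: {strategy}")

    fallbacks = [t for i, t in enumerate(targets) if i != first]
    if max_fallbacks is not None:
        fallbacks = fallbacks[:max_fallbacks]
    return [targets[first]] + fallbacks


targets = ["gpt-4o-primary", "gpt-4o-secondary"]
print(attempt_order(targets, "failover", max_fallbacks=1))
# ['gpt-4o-primary', 'gpt-4o-secondary']
print(attempt_order(targets, "round_robin", request_index=1, max_fallbacks=1))
# ['gpt-4o-secondary', 'gpt-4o-primary']
```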
Create an Ensemble Model
An ensemble model uses other direct model aliases as panel members and a judge. Unlike a multi-target model, it does not choose one target for the request. It calls the panel members, keeps enough successful responses to satisfy min_responses, and calls the judge model to synthesize the final answer.
Create the direct panel and judge models first. Then create the ensemble model and reference those aliases by display_name:
curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "research-ensemble",
    "ensemble": {
      "panel": [
        {"model": "gpt-4o-primary", "temperature": 0.2},
        {"model": "claude-sonnet-primary", "temperature": 0.4}
      ],
      "judge": {
        "model": "gpt-4o-judge"
      },
      "min_responses": 2,
      "timeout_ms": 45000
    }
  }'
The panel entries reference direct model aliases. Optional temperature and seed values override the caller's request for that panel call. The judge.model also references a direct model alias and can include synthesis_prompt when the default synthesis prompt is not appropriate.
When min_responses is omitted, AISIX requires two successful panel responses, or fewer if the panel itself has fewer than two members. timeout_ms applies to each panel call and to the judge call. Ensemble models are supported on chat-completions requests, including streaming requests.
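The default for min_responses amounts to taking the smaller of two and the panel size. A minimal Python sketch of that rule follows. The function names are hypothetical, and the sketch does not model timeouts or the judge call.

```python
def effective_min_responses(panel_size, min_responses=None):
    """Return the number of successful panel responses required."""
    if min_responses is None:
        # Default described in this guide: two, capped by the panel size.
        return min(2, panel_size)
    return min_responses


def has_enough_responses(successes, panel_size, min_responses=None):
    """True when enough panel calls succeeded to proceed to the judge."""
    return successes >= effective_min_responses(panel_size, min_responses)


print(effective_min_responses(3))       # 2
print(effective_min_responses(1))       # 1
print(has_enough_responses(1, 2))       # False
print(has_enough_responses(2, 3))       # True
```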
Configure Optional Model Behavior
Most models only need a caller-facing alias, provider label, upstream model name, and provider key ID. Add optional fields only when the behavior is part of your traffic plan.
Common optional fields include:
- timeout, when a provider request should have a stricter per-request timeout.
- stream_timeout, when a streaming request should have a separate per-chunk read timeout.
- allowed_cidrs, when only callers from specific client IP ranges should use the model alias.
- background_model_check, when AISIX should probe a single-target model outside the request path and mark it unhealthy after failed probes.
- cooldown, when real request failures should temporarily exclude a single-target model from routing.
- rate_limit, when the limit should follow one model alias. For details, see Rate Limits.
Multi-target models use the selected target model's provider settings, timeout, health, and cooldown behavior. Ensemble models use the panel and judge model settings when those direct models are called. Configure provider settings, health, and cooldown behavior on the referenced direct models, not on the multi-target or ensemble alias.
allowed_cidrs applies to the model alias the caller names. AISIX resolves the client IP from the immediate peer unless proxy.real_ip is configured to trust forwarded headers from your load balancer or ingress.
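To check a CIDR list before applying it, you can test sample client addresses locally. The sketch below uses Python's standard ipaddress module to show the general idea of CIDR matching. It is not the gateway's implementation, the function name `ip_allowed` is hypothetical, and how AISIX treats an empty or missing list is not covered here.

```python
import ipaddress


def ip_allowed(client_ip, allowed_cidrs):
    """Return True if client_ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    for cidr in allowed_cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        # Compare only addresses and networks of the same IP version.
        if addr.version == network.version and addr in network:
            return True
    return False


cidrs = ["10.0.0.0/8", "2001:db8::/32"]
print(ip_allowed("10.1.2.3", cidrs))      # True
print(ip_allowed("192.168.1.5", cidrs))   # False
print(ip_allowed("2001:db8::1", cidrs))   # True
```

When the gateway sits behind a load balancer, test with the address AISIX will actually resolve, which depends on whether proxy.real_ip is configured.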
Cost Metadata
Use cost when usage reporting or budget checks need pricing metadata for the model alias. The field records input and output cost in USD per 1,000 tokens; it does not affect provider routing or access control.
Add the cost object when you create or update the model alias that represents the priced upstream model:
{
  "display_name": "gpt-4o-prod",
  "provider": "openai",
  "model_name": "gpt-4o",
  "provider_key_id": "PROVIDER_KEY_ID",
  "cost": {
    "input_per_1k": 0.0025,
    "output_per_1k": 0.01
  }
}
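Because the cost fields are expressed in USD per 1,000 tokens, a request's estimated cost is the token count divided by 1,000 and multiplied by the matching rate. The Python sketch below works through that arithmetic with the values above. The function name is hypothetical, and the way AISIX aggregates cost in its own reports may differ.

```python
def estimate_cost_usd(input_tokens, output_tokens, cost):
    """Estimate request cost from per-1,000-token pricing metadata."""
    input_cost = (input_tokens / 1000) * cost["input_per_1k"]
    output_cost = (output_tokens / 1000) * cost["output_per_1k"]
    return input_cost + output_cost


cost = {"input_per_1k": 0.0025, "output_per_1k": 0.01}

# 12,000 input tokens -> 12 * 0.0025 = 0.03 USD
#  3,000 output tokens ->  3 * 0.01   = 0.03 USD
total = estimate_cost_usd(12_000, 3_000, cost)
print(f"{total:.4f}")  # 0.0600
```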
Verify Caller Visibility
After creating or changing a model, verify both the admin resource and the caller-visible proxy behavior.
List models through the proxy with a caller API key that is allowed to use the alias:
export AISIX_API_KEY="YOUR_CALLER_API_KEY"
curl -sS "http://127.0.0.1:3000/v1/models" \
  -H "Authorization: Bearer ${AISIX_API_KEY}"
GET /v1/models lists single-target, ensemble, and semantic-router model aliases that the caller API key is allowed to access. Only multi-target (routing) aliases are intentionally hidden from this discovery response, even though callers can target them directly if they know the alias.
Next Steps
You have now configured the caller-facing model alias. Continue with Caller API Keys to allow callers to use the alias.
For complete model request fields, response shapes, and status routes, see the Admin API Reference.