
Ensemble Models

An ensemble model lets callers use one model alias while AISIX asks multiple panel models for candidate responses and then asks a judge model to synthesize the final answer. Use ensembles when the application should receive one answer that reflects several model attempts instead of a single routed target.

An ensemble has two parts:

  • Panel models produce candidate responses.
  • A judge model receives the successful panel responses and produces the final response returned to the caller.
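The panel-then-judge flow can be sketched in a few lines of Python. This is a minimal illustration, not AISIX's implementation. It assumes that panel calls run concurrently, that each model is a hypothetical async function taking a prompt string, and that the judge simply receives the prompt followed by the successful candidates. The names run_ensemble and ModelCall and the judge prompt format are invented for illustration; min_responses and timeout_ms mirror the fields of the ensemble configuration.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical model interface: takes a prompt, returns the response text.
ModelCall = Callable[[str], Awaitable[str]]


async def run_ensemble(
    prompt: str,
    panel: list[ModelCall],
    judge: ModelCall,
    min_responses: int,
    timeout_ms: int,
) -> str:
    """Fan a prompt out to the panel, then let the judge synthesize one answer."""
    timeout_s = timeout_ms / 1000

    async def call_with_timeout(model: ModelCall, text: str) -> str:
        # Every call (panel or judge) gets its own deadline.
        return await asyncio.wait_for(model(text), timeout=timeout_s)

    # Query all panel models concurrently; failures and timeouts come back
    # as exception objects instead of aborting the whole gather.
    results = await asyncio.gather(
        *(call_with_timeout(model, prompt) for model in panel),
        return_exceptions=True,
    )
    candidates = [r for r in results if isinstance(r, str)]

    if len(candidates) < min_responses:
        raise RuntimeError(
            f"only {len(candidates)} panel responses succeeded, need {min_responses}"
        )

    # Only the successful candidates are passed to the judge.
    judge_prompt = prompt + "\n\nCandidate answers:\n" + "\n---\n".join(candidates)
    return await call_with_timeout(judge, judge_prompt)
```

In this sketch a failed or timed-out panel member only reduces the candidate count; the request fails only when fewer than min_responses candidates remain or the judge call itself fails.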

Panel members and the judge must reference existing direct model aliases. Configure provider credentials, provider model names, health behavior, cooldown behavior, and model-level rate limits on those direct models.
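The reference rule can be expressed as a simple check. The sketch below assumes a hypothetical in-memory registry that maps each alias name to its kind ("direct" or "ensemble"); invalid_references and the registry shape are illustrative only and are not part of the AISIX API.

```python
# Hypothetical registry: alias name -> alias kind ("direct" or "ensemble").
Registry = dict[str, str]


def invalid_references(panel: list[str], judge: str, registry: Registry) -> list[str]:
    """Return one message per panel/judge reference that is not an existing direct alias."""
    problems = []
    for ref in [*panel, judge]:
        kind = registry.get(ref)
        if kind is None:
            problems.append(f"{ref}: alias does not exist")
        elif kind != "direct":
            # Ensembles must point at direct models, not at other ensembles.
            problems.append(f"{ref}: alias is {kind}, not direct")
    return problems
```

An empty list means every panel member and the judge resolve to direct aliases, which is where provider credentials and model-level limits live.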

Create the panel and judge models first, then create the ensemble alias:

{
  "display_name": "research-ensemble",
  "ensemble": {
    "panel": [
      { "model": "gpt-4o-panel" },
      { "model": "claude-panel" },
      { "model": "gemini-panel" }
    ],
    "judge": {
      "model": "gpt-4o-judge"
    },
    "min_responses": 2,
    "timeout_ms": 30000
  }
}
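Because the ensemble alias only references other aliases, the models it names have to exist before it is created. The sketch below parses the example payload and lists the aliases in creation order. It assumes display_name is the name callers use for the ensemble; creation_order is a hypothetical helper, not an AISIX function.

```python
import json

# The ensemble alias payload from the example above.
PAYLOAD = json.loads("""
{
  "display_name": "research-ensemble",
  "ensemble": {
    "panel": [
      {"model": "gpt-4o-panel"},
      {"model": "claude-panel"},
      {"model": "gemini-panel"}
    ],
    "judge": {"model": "gpt-4o-judge"},
    "min_responses": 2,
    "timeout_ms": 30000
  }
}
""")


def creation_order(payload: dict) -> list[str]:
    """Return the aliases to create: referenced direct models first, ensemble last."""
    spec = payload["ensemble"]
    refs = [member["model"] for member in spec["panel"]] + [spec["judge"]["model"]]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(refs)) + [payload["display_name"]]
```

For this payload the helper yields the three panel models, then gpt-4o-judge, then research-ensemble.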

When min_responses is omitted, AISIX requires two successful panel responses, or every panel response if the panel has fewer than two members. timeout_ms is the timeout, in milliseconds, for each panel call and for the judge call. Ensemble models are supported on chat-completions requests, including streaming requests.
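The default for min_responses reduces to a single min() over the panel size. The sketch below is illustrative; effective_min_responses is a hypothetical name, and it assumes an explicitly configured value is used as given.

```python
from typing import Optional

# Documented default when min_responses is omitted, capped by panel size.
DEFAULT_MIN_RESPONSES = 2


def effective_min_responses(panel_size: int, min_responses: Optional[int] = None) -> int:
    """Resolve how many successful panel responses an ensemble needs."""
    if min_responses is not None:
        return min_responses  # assumption: an explicit value is used unchanged
    return min(DEFAULT_MIN_RESPONSES, panel_size)
```

With the three-member panel in the example, omitting min_responses yields 2, while a single-member panel would need only its one response.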

Usage for an ensemble response includes the panel calls and the judge call when provider usage is available. Request-level limits apply to the caller-facing ensemble alias, and model-level limits apply to each referenced panel and judge model when AISIX calls them.
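Usage accounting for an ensemble can be sketched as a sum over every panel call and the judge call, skipping calls whose provider reported no usage. This assumes OpenAI-style usage fields (prompt_tokens, completion_tokens, total_tokens) and a None placeholder for missing usage; aggregate_usage is a hypothetical helper, not AISIX's reporting code.

```python
from typing import Optional

# Assumed usage shape: OpenAI-style token counters.
Usage = dict[str, int]


def aggregate_usage(panel_usage: list[Optional[Usage]], judge_usage: Optional[Usage]) -> Usage:
    """Sum token usage across the panel calls and the judge call."""
    total = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for usage in [*panel_usage, judge_usage]:
        if usage is None:
            continue  # this provider did not report usage for the call
        for key in total:
            total[key] += usage.get(key, 0)
    return total
```

Summing this way means an ensemble request usually reports noticeably more tokens than a single routed call, since the judge prompt also carries every successful panel answer.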

For the full model resource shape, see Model Aliases.

Copyright © APISEVEN PTE. LTD 2019 – 2026. Apache, Apache APISIX, APISIX, and associated open source project names are trademarks of the Apache Software Foundation