Ensemble Models

An ensemble model lets callers use one model alias while AISIX asks multiple panel models for candidate responses and then asks a judge model to synthesize the final answer. Use ensembles when the application should receive one answer that reflects several model attempts instead of a single routed target.

An ensemble has two parts:

Panel models produce candidate responses.
A judge model receives the successful panel responses and produces the final response returned to the caller.

Panel members and the judge must reference existing direct model aliases. Configure provider credentials, provider model names, health behavior, cooldown behavior, and model-level rate limits on those direct models.

Create the panel and judge models first, then create the ensemble alias:

{
  "display_name": "research-ensemble",
  "ensemble": {
    "panel": [
      { "model": "gpt-4o-panel" },
      { "model": "claude-panel" },
      { "model": "gemini-panel" }
    ],
    "judge": {
      "model": "gpt-4o-judge"
    },
    "min_responses": 2,
    "timeout_ms": 30000
  }
}
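Because panel members and the judge must reference existing direct model aliases, it can help to check a payload before submitting it. The following is a minimal client-side sketch, not part of AISIX: the function name find_missing_aliases and the known set of direct aliases are hypothetical, and the payload shape is taken from the example above.

```python
from typing import Any


def find_missing_aliases(payload: dict[str, Any], direct_aliases: set[str]) -> list[str]:
    """Return panel/judge aliases in the payload that are not known direct models."""
    ensemble = payload["ensemble"]
    referenced = [member["model"] for member in ensemble["panel"]]
    referenced.append(ensemble["judge"]["model"])

    # Keep first-seen order and report each missing alias only once.
    missing: list[str] = []
    for alias in referenced:
        if alias not in direct_aliases and alias not in missing:
            missing.append(alias)
    return missing


payload = {
    "display_name": "research-ensemble",
    "ensemble": {
        "panel": [
            {"model": "gpt-4o-panel"},
            {"model": "claude-panel"},
            {"model": "gemini-panel"},
        ],
        "judge": {"model": "gpt-4o-judge"},
        "min_responses": 2,
        "timeout_ms": 30000,
    },
}

# Hypothetical set of direct aliases that have already been created.
known = {"gpt-4o-panel", "claude-panel", "gemini-panel"}

if __name__ == "__main__":
    print(find_missing_aliases(payload, known))  # ['gpt-4o-judge']
```

In this example the judge alias has not been created yet, so it is the only entry reported.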

When min_responses is omitted, AISIX requires two successful panel responses, or as many as the panel contains if it has fewer than two members. timeout_ms applies to each panel call and to the judge call. Ensemble models are supported on chat-completions requests, including streaming requests.
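The flow described so far can be sketched as follows. This is an illustration under stated assumptions, not the AISIX implementation: call_model is a hypothetical async function standing in for a call to a direct model, the panel is assumed to be called concurrently, the judge prompt format is invented, and raising an error when too few panel calls succeed is an assumption, since the failure behavior is not described here.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical call signature: (model alias, prompt) -> response text.
CallModel = Callable[[str, str], Awaitable[str]]


def effective_min_responses(panel_size: int, min_responses: Optional[int] = None) -> int:
    """Use the configured value, else default to two capped by the panel size."""
    if min_responses is not None:
        return min_responses
    return min(2, panel_size)


async def run_ensemble(
    call_model: CallModel,
    panel: list[str],
    judge: str,
    prompt: str,
    timeout_ms: int,
    min_responses: Optional[int] = None,
) -> str:
    timeout_s = timeout_ms / 1000

    async def attempt(alias: str) -> Optional[str]:
        # The same timeout is applied to every panel call; a failed or
        # timed-out call yields no candidate.
        try:
            return await asyncio.wait_for(call_model(alias, prompt), timeout_s)
        except Exception:
            return None

    results = await asyncio.gather(*(attempt(alias) for alias in panel))
    candidates = [text for text in results if text is not None]

    needed = effective_min_responses(len(panel), min_responses)
    if len(candidates) < needed:
        # Assumption: the sketch fails the request here.
        raise RuntimeError(f"only {len(candidates)} of {needed} required panel responses")

    # The judge sees only the successful candidates, under the same timeout.
    judge_prompt = prompt + "\n\nCandidates:\n" + "\n".join(candidates)
    return await asyncio.wait_for(call_model(judge, judge_prompt), timeout_s)


async def fake_call(alias: str, prompt: str) -> str:
    """Stand-in for real provider calls; one panel member always fails."""
    if alias == "claude-panel":
        raise ConnectionError("simulated provider failure")
    if alias == "gpt-4o-judge":
        return f"judged {prompt.count('candidate from')} candidates"
    return f"candidate from {alias}"


if __name__ == "__main__":
    panel = ["gpt-4o-panel", "claude-panel", "gemini-panel"]
    answer = asyncio.run(
        run_ensemble(fake_call, panel, "gpt-4o-judge", "What is an ensemble?", 30000)
    )
    print(answer)  # judged 2 candidates
```

With three panel members and no explicit min_responses, two successes are enough, so the simulated failure of one member does not stop the judge call.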

Usage reported for an ensemble response covers the panel calls and the judge call when provider usage is available. Request-level limits apply to the caller-facing ensemble alias, and model-level limits apply to each referenced panel and judge model when AISIX calls them.
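One way to picture combined usage is as a sum over every call that reported it. The sketch below is only an illustration: the field names prompt_tokens, completion_tokens, and total_tokens follow the common OpenAI-style usage object and are an assumption here, as is the idea that calls without usage data are simply skipped.

```python
from typing import Optional

Usage = dict[str, int]


def combine_usage(call_usages: list[Optional[Usage]]) -> Usage:
    """Sum token counts across calls, skipping calls with no usage data."""
    total: Usage = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for usage in call_usages:
        if usage is None:
            continue  # provider usage was not available for this call
        for key in total:
            total[key] += usage.get(key, 0)
    return total


# Made-up numbers: two panel calls reported usage, one did not.
panel_usage = [
    {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30},
    None,
    {"prompt_tokens": 12, "completion_tokens": 18, "total_tokens": 30},
]
judge_usage = {"prompt_tokens": 50, "completion_tokens": 25, "total_tokens": 75}

if __name__ == "__main__":
    print(combine_usage(panel_usage + [judge_usage]))
    # {'prompt_tokens': 72, 'completion_tokens': 63, 'total_tokens': 135}
```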

For the full model resource shape, see Model Aliases.