Supported Endpoints
AISIX exposes proxy endpoints for the request formats application teams already use. Start here when you need to know which client API shape to call before choosing a provider, model alias, or traffic policy.
Main API Families
| API family | Routes | Use for |
|---|---|---|
| Chat completions | POST /v1/chat/completions | OpenAI-compatible chat requests, streaming chat, tool calling, routed models, and ensemble models. |
| Anthropic messages | POST /v1/messages, POST /v1/messages/count_tokens | Anthropic-style messages requests and token counting for Anthropic-backed models. |
| Responses | POST /v1/responses | OpenAI Responses API clients and agent-style response flows. |
| Text completions | POST /v1/completions | Legacy OpenAI-compatible text completion clients. |
| Embeddings | POST /v1/embeddings | Vector embeddings through supported providers. |
| Rerank | POST /v1/rerank | Document reranking through supported providers. |
| Image generation | POST /v1/images/generations | Text-to-image generation requests. |
| Speech and audio | POST /v1/audio/transcriptions, POST /v1/audio/translations, POST /v1/audio/speech | Speech-to-text, translation, and text-to-speech requests. |
| Provider passthrough | ANY /passthrough/:provider/*rest | Provider-specific calls that should use AISIX authentication and quota checks without AISIX normalizing the request body. |
Discovery and Health
| Endpoint | Use for |
|---|---|
GET /v1/models | Return the model aliases the caller API key can access. Use it when a client needs to discover gateway-facing model names. |
GET /livez | Check whether the proxy listener is alive. Use it for proxy listener health checks, not for model or provider readiness. |
Gateway Behavior
Modeled proxy routes share the same core gateway behavior: AISIX authenticates the caller API key, checks model access, resolves the requested model alias, applies configured controls, dispatches to the selected upstream provider, and records usage and telemetry when the route can be attributed to a model.
Some behavior is route-specific. For example, response caching applies to chat completions when a matching cache policy is configured, ensemble models are supported on chat completions, and token counting is limited to Anthropic-backed models. For provider and route constraints, see Provider Compatibility.
For exact request and response details, see the Proxy API Reference.