Metrics Reference
AISIX exposes Prometheus metrics on GET /metrics. By default, this endpoint is served on the admin listener. For managed gateways, metrics are served on a dedicated listener configured with observability.metrics.prometheus.addr.
The /metrics endpoint is unauthenticated by design. Keep the listener private to your monitoring network.
Metric families are registered lazily on first observation. Immediately after boot, /metrics can return an empty body. Send one request through the proxy, then scrape again for series to appear.
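
Because families are registered lazily, a readiness check or scraper may want to tell an empty exposition apart from one that already carries samples. The following is a minimal sketch in Python, assuming the standard Prometheus text format; the helper name `has_samples` and the label values in the sample payload are illustrative and not part of AISIX.

```python
def has_samples(exposition: str) -> bool:
    """Return True if a Prometheus text payload contains at least one sample line."""
    for line in exposition.splitlines():
        stripped = line.strip()
        # Blank lines and comment lines ("# HELP", "# TYPE") are not samples.
        if stripped and not stripped.startswith("#"):
            return True
    return False


# Hypothetical payloads: an empty body right after boot, and a body
# scraped after one request has passed through the proxy.
after_boot = ""
after_first_request = (
    "# TYPE aisix_requests_total counter\n"
    'aisix_requests_total{provider="example",model="example-model",'
    'status="200",outcome="success"} 1\n'
)

print(has_samples(after_boot))           # False
print(has_samples(after_first_request))  # True
```
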
Request and Latency
| Metric | Type | Labels | Description |
|---|---|---|---|
| aisix_requests_total | counter | provider, model, status, outcome | Total proxy requests. outcome is success, client_error, upstream_error, or rate_limited. |
| aisix_request_duration_seconds | histogram | provider, model, status | End-to-end proxy request latency. |
| aisix_llm_requests_total | counter | endpoint, inbound_protocol, provider, model, upstream_model, provider_key_id, api_key_id, team_id, user_id, status, outcome | LLM-shaped requests through the proxy. |
| aisix_llm_request_duration_seconds | histogram | endpoint, inbound_protocol, provider, model, upstream_model, provider_key_id, api_key_id, team_id, user_id, status, outcome | End-to-end latency for LLM requests. |
| aisix_llm_api_latency_seconds | histogram | endpoint, inbound_protocol, provider, model, upstream_model, provider_key_id, api_key_id, team_id, user_id | Upstream API latency only, excluding gateway overhead. |
| aisix_llm_time_to_first_token_seconds | histogram | endpoint, inbound_protocol, provider, model, upstream_model, provider_key_id, api_key_id, team_id, user_id | Time from request entry to first generated token chunk on streaming paths. |
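
The histogram metrics above are exposed as cumulative buckets, so percentile latency has to be estimated rather than read directly. The sketch below shows one way to do that in Python with linear interpolation inside the matching bucket, the same general idea as PromQL's `histogram_quantile()`. The function name and the bucket values are hypothetical; real bucket boundaries depend on how AISIX configures its histograms.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or buckets[-1][1] == 0:
        return math.nan  # no observations yet
    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative buckets (seconds, observations at or below that bound).
latency_buckets = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
p50 = estimate_quantile(0.5, latency_buckets)
p95 = estimate_quantile(0.95, latency_buckets)
print(f"p50 ~ {p50:.2f}s, p95 ~ {p95:.2f}s")
```

The estimate is only as fine-grained as the bucket layout, so a quantile that lands in the `+Inf` bucket is reported as the largest finite bound.
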
Usage and Cost
| Metric | Type | Labels | Description |
|---|---|---|---|
| aisix_tokens_consumed_total | counter | provider, model | Sum of usage.total_tokens across completed non-streaming calls. |
| aisix_llm_input_tokens_total | counter | endpoint, inbound_protocol, provider, model, upstream_model, provider_key_id, api_key_id, team_id, user_id | Input tokens reported by the upstream. |
| aisix_llm_output_tokens_total | counter | endpoint, inbound_protocol, provider, model, upstream_model, provider_key_id, api_key_id, team_id, user_id | Output tokens reported by the upstream. |
| aisix_llm_total_tokens_total | counter | endpoint, inbound_protocol, provider, model, upstream_model, provider_key_id, api_key_id, team_id, user_id | Total tokens reported by the upstream. |
| aisix_llm_spend_micro_usd_total | counter | endpoint, inbound_protocol, provider, model, upstream_model, provider_key_id, api_key_id, team_id, user_id | Estimated spend in micro-USD (1 USD = 1,000,000 micro-USD). |
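
Since aisix_llm_spend_micro_usd_total counts in micro-USD, dashboards and reports usually need a conversion step. A minimal sketch in Python, using `Decimal` to avoid float rounding in currency values; the function name and the sample counter value are illustrative.

```python
from decimal import Decimal

# From the table above: 1 USD = 1,000,000 micro-USD.
MICRO_USD_PER_USD = 1_000_000


def micro_usd_to_usd(micro_usd: int) -> Decimal:
    """Convert a micro-USD counter value to USD."""
    return Decimal(micro_usd) / MICRO_USD_PER_USD


# Hypothetical counter reading of 2,350,000 micro-USD.
print(micro_usd_to_usd(2_350_000))  # 2.35
```
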
Proxy Health
| Metric | Type | Labels | Description |
|---|---|---|---|
| aisix_proxy_requests_total | counter | endpoint, inbound_protocol, provider, model, upstream_model, provider_key_id, api_key_id, team_id, user_id, status, outcome | All proxy requests with full label granularity. |
| aisix_proxy_failed_requests_total | counter | endpoint, inbound_protocol, provider, model, upstream_model, provider_key_id, api_key_id, team_id, user_id, status, outcome | Subset of aisix_proxy_requests_total where outcome is not success. |
| aisix_proxy_request_duration_seconds | histogram | endpoint, inbound_protocol, provider, model, upstream_model, provider_key_id, api_key_id, team_id, user_id, status, outcome | End-to-end latency with full label granularity. |
| aisix_proxy_in_flight_requests | gauge | endpoint, inbound_protocol | Currently active proxy requests. |
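
The relationship between the failed-request counter and the full request counter can be illustrated with already-parsed samples. The Python sketch below treats every sample whose outcome label is not success as failed, mirroring the description above; the `Sample` shape, the function name, and the numbers are hypothetical.

```python
# A parsed sample: (labels, value). This shape is an assumption for illustration.
Sample = tuple[dict[str, str], float]


def failure_ratio(samples: list[Sample]) -> float:
    """Share of requests whose outcome label is anything other than "success"."""
    total = sum(value for _, value in samples)
    failed = sum(
        value for labels, value in samples if labels.get("outcome") != "success"
    )
    return failed / total if total else 0.0


# Hypothetical aisix_proxy_requests_total samples, reduced to the outcome label.
proxy_samples: list[Sample] = [
    ({"outcome": "success"}, 90.0),
    ({"outcome": "upstream_error"}, 6.0),
    ({"outcome": "rate_limited"}, 4.0),
]
print(f"{failure_ratio(proxy_samples):.0%} of requests failed")
```
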
Rate Limits and Budgets
| Metric | Type | Labels | Description |
|---|---|---|---|
| aisix_ratelimit_rejections_total | counter | scope | Rate-limit rejections by scope, such as requests or tokens. |
| aisix_ratelimit_remaining_requests | gauge | api_key_id, model | Remaining request quota for the key/model pair. |
| aisix_ratelimit_remaining_tokens | gauge | api_key_id, model | Remaining token quota for the key/model pair. |
| aisix_budget_limit_usd | gauge | api_key_id, team_id, user_id | Budget limit in USD. |
| aisix_budget_spent_usd | gauge | api_key_id, team_id, user_id | Budget spent in USD. |
| aisix_budget_remaining_usd | gauge | api_key_id, team_id, user_id | Budget remaining in USD. |
| aisix_budget_reset_seconds | gauge | api_key_id, team_id, user_id | Seconds until the budget period resets. |
| aisix_budget_details_present | gauge | api_key_id, team_id, user_id | 1 when budget gauges are populated, 0 when cleared. |
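
Because aisix_budget_details_present signals whether the other budget gauges are populated, a consumer should probably check it before interpreting them. The Python sketch below shows one way to do that; the `BudgetGauges` type, the function name, and the choice to return `None` for cleared budgets are assumptions for illustration, not AISIX behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetGauges:
    """One scrape of the budget gauges for a single label set (hypothetical type)."""
    limit_usd: float
    spent_usd: float
    details_present: float  # 1 when populated, 0 when cleared


def budget_utilization(gauges: BudgetGauges) -> Optional[float]:
    """Return spent/limit, or None when the gauges are cleared or no limit is set."""
    if gauges.details_present != 1 or gauges.limit_usd <= 0:
        return None
    return gauges.spent_usd / gauges.limit_usd


# Hypothetical readings.
print(budget_utilization(BudgetGauges(100.0, 25.0, 1.0)))  # 0.25
print(budget_utilization(BudgetGauges(0.0, 0.0, 0.0)))     # None
```
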
Deployment and Routing
| Metric | Type | Labels | Description |
|---|---|---|---|
| aisix_deployment_requests_total | counter | provider, model, upstream_model, provider_key_id | Total requests dispatched to a target model. |
| aisix_deployment_success_responses_total | counter | provider, model, upstream_model, provider_key_id | Successful upstream responses from a target model. |
| aisix_deployment_failure_responses_total | counter | provider, model, upstream_model, provider_key_id | Failed upstream responses from a target model. |
| aisix_deployment_state | gauge | provider, model, upstream_model, provider_key_id | Runtime health state: 0 = healthy, 1 = partial failure, 2 = down. |
| aisix_deployment_cooled_down_total | counter | provider, model, upstream_model, provider_key_id | Times a target model entered cooldown. |
| aisix_routing_successful_fallbacks_total | counter | model | Successful failovers to the next routing candidate. |
| aisix_routing_failed_fallbacks_total | counter | model | Failed failovers where no candidate was available. |
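
The numeric values of aisix_deployment_state are easier to work with once mapped to names. A small Python sketch of that mapping follows, using the three values listed in the table; the enum and function names are illustrative, and the "unknown" fallback for any other value is an assumption.

```python
from enum import IntEnum


class DeploymentState(IntEnum):
    """Values of aisix_deployment_state as listed in the table above."""
    HEALTHY = 0
    PARTIAL_FAILURE = 1
    DOWN = 2


def describe_state(value: float) -> str:
    """Map a scraped gauge value to a readable name, or "unknown" if unrecognized."""
    try:
        return DeploymentState(int(value)).name.lower()
    except ValueError:
        return "unknown"


print(describe_state(0.0))  # healthy
print(describe_state(2.0))  # down
```
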
Guardrails
| Metric | Type | Labels | Description |
|---|---|---|---|
| aisix_guardrail_blocks_total | counter | None | Requests rejected by a guardrail on input or output. |
| aisix_guardrail_bypasses_total | counter | reason | Fail-open events where a remote guardrail was unreachable but fail_open allowed the request through. reason values include bedrock_5xx, bedrock_timeout, bedrock_throttled. |
Usage Events and Exporters
| Metric | Type | Labels | Description |
|---|---|---|---|
| aisix_usage_events_emitted_total | counter | handler, status_code, inbound_protocol | Usage events successfully queued for delivery. status_code is bucketed as 2xx, 3xx, 4xx, 5xx, or other. handler is the endpoint name, such as chat, embeddings, or messages. |
| aisix_usage_event_drops_total | counter | reason | Usage events dropped because the sink was full or closed. |
| aisix_otlp_fanout_drops_total | counter | exporter, reason | OTLP trace spans dropped during fan-out. |
| aisix_otlp_fanout_failures_total | counter | exporter | OTLP trace span delivery failures. |
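
The status_code bucketing on aisix_usage_events_emitted_total can be approximated as follows. This Python sketch assumes that any code outside 200 to 599 (including 1xx) falls into other, which matches the listed bucket names but is not confirmed by the table; the function name is illustrative.

```python
def bucket_status_code(code: int) -> str:
    """Bucket an HTTP status code as 2xx, 3xx, 4xx, 5xx, or other."""
    if 200 <= code <= 599:
        return f"{code // 100}xx"
    # Assumption: informational (1xx) and out-of-range codes map to "other".
    return "other"


print(bucket_status_code(200))  # 2xx
print(bucket_status_code(429))  # 4xx
print(bucket_status_code(101))  # other
```
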
Cache
| Metric | Type | Labels | Description |
|---|---|---|---|
| aisix_redis_failures_total | counter | operation | Redis cache operation failures when the Redis backend is configured. |