Troubleshooting
This guide helps you diagnose a running AISIX gateway that does not behave as expected. Work through the checks in order until the failing layer is clear, then use the matching section to decide the next action.
Fast Triage
Start with the runtime path before changing configuration.
Check listener health first:
curl -i "http://127.0.0.1:3000/livez"
In self-hosted mode, check the admin listener and configuration health with an admin key:
curl -i "http://127.0.0.1:3001/livez"
curl -sS "http://127.0.0.1:3001/admin/v1/health" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"
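The three health checks above can be collected into one report, which is convenient when you repeat triage after each fix. The following is a minimal sketch, not part of AISIX: the function names and the `AISIX_PROXY_URL` and `AISIX_ADMIN_URL` variables are hypothetical, and the defaults simply mirror the addresses used in this guide.

```sh
# Print the HTTP status code for a URL, or 000 if the connection fails.
# Any extra arguments (for example -H "...") are passed through to curl.
probe_status() {
  url="$1"
  shift
  # -o /dev/null discards the body; -w prints only the status code.
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$@" "$url" 2>/dev/null || true
}

# Report the proxy listener, admin listener, and admin health endpoint.
# AISIX_PROXY_URL and AISIX_ADMIN_URL are hypothetical overrides for this sketch.
report_listeners() {
  proxy="${AISIX_PROXY_URL:-http://127.0.0.1:3000}"
  admin="${AISIX_ADMIN_URL:-http://127.0.0.1:3001}"
  echo "proxy /livez: $(probe_status "$proxy/livez")"
  echo "admin /livez: $(probe_status "$admin/livez")"
  echo "admin /admin/v1/health: $(probe_status "$admin/admin/v1/health" \
    -H "Authorization: Bearer ${AISIX_ADMIN_KEY:-}")"
}

# Usage: report_listeners
```

A status of `000` means curl could not connect at all, which points at the process or listener binding rather than at configuration.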
Verify model discovery with the same caller API key the application uses:
curl -sS "http://127.0.0.1:3000/v1/models" \
  -H "Authorization: Bearer ${AISIX_API_KEY}"
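To confirm that one specific alias is visible to this caller API key, you can filter the discovery output. This sketch assumes the response follows the OpenAI-style list shape, with one `"id"` field per model. That shape is an assumption, and `has_model_alias` is a hypothetical helper name.

```sh
# Succeed if the JSON on stdin contains a model entry whose "id" equals $1.
# Assumes an OpenAI-style list: {"data": [{"id": "<alias>", ...}, ...]}.
has_model_alias() {
  # Strip whitespace so the match does not depend on JSON formatting,
  # then look for the exact "id":"<alias>" pair as a fixed string.
  tr -d '[:space:]' | grep -Fq "\"id\":\"$1\""
}

# Usage:
#   curl -sS "http://127.0.0.1:3000/v1/models" \
#     -H "Authorization: Bearer ${AISIX_API_KEY}" \
#     | has_model_alias "gpt-4o-mini" && echo "alias visible"
```

The closing quote is part of the match, so `gpt-4o` does not match an entry named `gpt-4o-mini`.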
Then send one real request to the endpoint that failed. For example, check the OpenAI-compatible chat path with the caller API key:
curl -sS "http://127.0.0.1:3000/v1/chat/completions" \
  -H "Authorization: Bearer ${AISIX_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Say hello."
      }
    ]
  }'
Use the response status, error type, and correlation header to choose the next check.
| Signal | What to Check Next |
|---|---|
| Listener health fails. | Process, listener binding, startup config, and TLS. |
| Admin health is stale or unavailable in self-hosted mode. | etcd reachability, snapshot freshness, and configuration watch health. |
| Model discovery does not show the expected alias. | Caller API key access, model type, and configuration visibility. |
| Real proxy request fails before upstream dispatch. | Caller authentication, model access, guardrails, rate limits, or budgets. |
| Real proxy request reaches the provider and fails. | Provider key, base URL, upstream model ID, quota, outage, or outbound network path. |
For request correlation, chat-completions responses use x-aisix-call-id. Many direct proxy endpoints, including Messages, Responses, rerank, audio, and passthrough, use x-aisix-request-id. For exact header scope, see Headers and Error Codes.
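When you file a ticket or search logs, it helps to capture the correlation ID from the failing response. The helper below is a minimal sketch that reads raw response headers and prints whichever of the two headers named above appears first. The function name is hypothetical.

```sh
# Read raw HTTP response headers on stdin and print the first AISIX
# correlation ID found (x-aisix-call-id or x-aisix-request-id).
extract_correlation_id() {
  # Header names are case-insensitive, so compare in lower case.
  tr -d '\r' | awk '{
    name = tolower($0)
    sub(/:.*/, "", name)
    if (name == "x-aisix-call-id" || name == "x-aisix-request-id") {
      value = $0
      sub(/^[^:]*:[ \t]*/, "", value)
      print value
      exit
    }
  }'
}

# Usage (-D - writes the response headers to stdout, -o discards the body):
#   curl -sS -D - -o /dev/null "http://127.0.0.1:3000/v1/models" \
#     -H "Authorization: Bearer ${AISIX_API_KEY}" | extract_correlation_id
```

The helper prints nothing when neither header is present, so an empty result is itself a signal that the response did not come from the expected proxy path.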
Verify Configuration Visibility
Use this step when resources were created or updated recently, or when the proxy behaves as if it has older configuration.
| Signal | Check | Action |
|---|---|---|
| Process fails during startup. | etcd endpoint, network reachability, TLS certificate paths, file permissions, and startup config syntax. | Fix startup config or etcd connectivity, then restart the gateway. |
| Admin writes succeed but proxy traffic uses old resources. | Admin health snapshot freshness in self-hosted mode. | Verify the final proxy path after the snapshot updates; see Configuration Propagation. |
| Model discovery omits a new alias. | Caller API key allow list, model type, and snapshot freshness. | Correct the model or caller API key, then query model discovery again. |
| Error mentions a missing provider key or unknown resource. | Provider key, model, and caller API key references. | Create or correct dependent resources in order, then send a real proxy request. |
In self-hosted deployments, etcd is part of the gateway control plane. If etcd is unavailable, dynamic resources such as provider keys, model aliases, caller API keys, policies, and exporters cannot load correctly.
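If you suspect etcd itself, you can test reachability from the gateway host before touching gateway configuration. This sketch wraps the standard `etcdctl endpoint health` command. The `ETCD_ENDPOINTS` variable and the `check_etcd` name are hypothetical, and the default is etcd's conventional client port, which may not match the endpoints in your startup config. A TLS-enabled cluster also needs the usual certificate flags, which are omitted here.

```sh
# Report whether the etcd endpoints answer a health probe.
# ETCD_ENDPOINTS is a hypothetical variable for this sketch; set it to the
# same endpoints the gateway uses in its startup config.
check_etcd() {
  endpoints="${ETCD_ENDPOINTS:-http://127.0.0.1:2379}"
  if etcdctl --endpoints="$endpoints" endpoint health >/dev/null 2>&1; then
    echo "etcd reachable: $endpoints"
  else
    echo "etcd unreachable or unhealthy: $endpoints"
    return 1
  fi
}

# Usage: ETCD_ENDPOINTS="http://127.0.0.1:2379" check_etcd
```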
Verify Caller Access and Policy
Use this step when AISIX rejects the request before a provider call, or when model discovery differs between caller API keys.
| Signal | Check | Action |
|---|---|---|
| Authentication error. | The application sends the plaintext caller API key, not the stored hash or upstream provider key. | Update the application secret or authorization header. |
| Permission or model-access error. | The requested model alias is allowed by the caller API key. | Add the alias to the caller API key or request an allowed model. |
| Content-policy error. | Enabled guardrails, the stage where each guardrail runs, and the triggering prompt or response content. | Adjust the prompt, guardrail rules, or fail-open behavior where appropriate. |
| Rate-limit or budget error. | Retry hint, API-key limit, model limit, shared policy, managed budget state, and replica-local counters. | Wait for the retry window, increase the limit, or adjust the matching policy. |
For exact proxy error envelopes, status codes, and retry headers, see Proxy Errors and Retries.
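For rate-limit errors, a client or test script can honor the retry hint instead of retrying immediately. The sketch below assumes the hint arrives as the standard HTTP `Retry-After` header with a value in seconds. Whether AISIX uses that exact header is an assumption here, so confirm it against the error reference. `retry_after_seconds` is a hypothetical helper name.

```sh
# Read raw response headers on stdin and print the number of seconds to wait.
# Falls back to the default in $1 (or 1) when Retry-After is missing or is
# not a plain integer, for example when it is an HTTP date.
retry_after_seconds() {
  default="${1:-1}"
  value="$(tr -d '\r' | awk 'tolower($0) ~ /^retry-after:/ {
    sub(/^[^:]*:[ \t]*/, "")
    sub(/[ \t]+$/, "")
    print
    exit
  }')"
  case "$value" in
    ''|*[!0-9]*) echo "$default" ;;
    *) echo "$value" ;;
  esac
}

# Usage: save the headers of the rejected request, then wait before retrying:
#   sleep "$(retry_after_seconds 5 < headers.txt)"
```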
Verify the Provider Path
Use this step after AISIX authenticates the caller, resolves the model alias, and starts dispatching to the configured provider.
| Signal | Check | Action |
|---|---|---|
| 502 or upstream_error. | Provider key secret, base URL, upstream model ID, provider quota, provider outage, and outbound network path. | Fix the provider path, then send the same proxy request again. |
| 503 with provider unavailable. | Provider adapter availability and whether the resolved adapter supports the requested route. | Use a supported provider, adapter, endpoint, or model. |
| 503 with all candidates unavailable. | Multi-target model health, cooldown state, and routing filters. | Restore a healthy target or adjust routing behavior. |
| Model health is degraded or down. | Recent upstream failure streak, provider outage, quota, outbound network path, and credential validity. | Restore provider reachability or route traffic to a healthy target. |
Compare the failing route with the matching provider upstream guide when the problem is provider-specific.
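The provider-path table can be turned into a small lookup for scripted triage. This is a sketch only: it matches on the status code plus loose, case-insensitive substrings of the response body, and the exact wording of AISIX error messages is an assumption. `next_provider_check` is a hypothetical helper name.

```sh
# Map a response status ($1) and body ($2) to the next provider-path check.
# Body matching is a loose substring test; real message text may differ.
next_provider_check() {
  status="$1"
  body="$(printf '%s' "${2:-}" | tr '[:upper:]' '[:lower:]')"
  case "$status:$body" in
    503:*"all candidates"*)
      echo "503 all candidates: check multi-target model health, cooldown state, and routing filters" ;;
    503:*)
      echo "503 provider: check adapter availability and route support" ;;
    502:*|*upstream_error*)
      echo "upstream error: check provider key, base URL, upstream model ID, quota, outage, and outbound network path" ;;
    *)
      echo "no provider-path match: review caller access and policy checks first" ;;
  esac
}

# Usage: next_provider_check 503 "all candidates unavailable"
```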
Verify Managed Gateway Projection
Use this step when AISIX Cloud state and live gateway behavior do not match, or when a managed gateway cannot receive projected configuration.
| Signal | Check | Action |
|---|---|---|
| Managed heartbeat fails. | Certificate identity, trust roots, runtime state, control-plane URL, and outbound network path. | Restore managed connectivity before investigating resource projection; see Gateway Certificates and Managed Gateway. |
| Cloud shows a resource but live traffic does not use it. | Environment scope and projection status for the gateway handling traffic. | Move the resource to the correct environment or wait for projection to complete. |
| Managed budget checks fail or appear unavailable. | Managed connectivity, budget policy target, and budget-check response details. | Restore budget-check connectivity or correct the managed policy. |
| Cloud playground succeeds but live traffic differs. | Live gateway, environment, model alias, caller key, and provider target. | Send the live request through the intended managed gateway and environment. |
Next Steps
You have now seen how to narrow failures by layer. Return to Production Deployment when you need to recheck the production baseline, or use the relevant feature guide for the failing layer.