Production Deployment

Production deployment is the point where a working gateway becomes part of a real traffic path. Before widening traffic, confirm the deployment mode, dependency placement, listener exposure, readiness checks, and shutdown behavior.

Choose the Deployment Mode

Start by deciding where configuration is managed.

In a self-hosted deployment, AISIX runs with a local admin listener. You manage provider keys, model aliases, caller API keys, and policy resources through the Admin API, and AISIX stores those resources in etcd.

In a managed deployment, AISIX Cloud manages resources and projects them into the managed gateway. The local admin listener and local playground are not exposed. The gateway connects back to AISIX Cloud through the managed mTLS path.

Keep the rest of the deployment plan aligned with that choice. A self-hosted deployment needs a private admin path and a reachable etcd configuration store. A managed deployment needs the Cloud certificate bundle, managed connectivity, and live proxy-path checks.

Prepare the Runtime Dependencies

Run the configuration store outside the gateway process. In self-hosted deployments, etcd is part of the gateway control plane, so keep it private to AISIX and the systems that manage gateway configuration.

Plan etcd persistence, backups, and access control before serving production traffic. Self-hosted etcd stores dynamic resources, including sensitive provider credentials and policy configuration.

Start with memory-backed cache policies unless multiple gateway instances need shared response caching. If cache policies should use Redis, configure cache.redis.url; policies that select Redis do not silently fall back to memory when Redis is unavailable.

Rate-limit counters use local process memory by default. For multi-instance deployments where request, token, or concurrency caps must apply across the whole gateway group, configure ratelimit.backend: redis and point every instance at the same Redis backend.

Use the startup config for process-level settings only. Dynamic resources such as provider keys, model aliases, caller API keys, guardrails, cache policies, and observability exporters should come from the configuration store.

Set the Listener Baseline

Expose the proxy listener only to intended callers or to the ingress tier that fronts caller traffic.

For self-hosted deployments, keep the admin listener on loopback, a private subnet, or an admin-only network. Do not expose admin or OpenAPI endpoints to the public network.
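
As a rough aid for reviewing that rule, the sketch below classifies a bind address as loopback, private (RFC 1918), wildcard, or other. The function name `classify_bind_addr` is hypothetical and not part of AISIX; you pass it whatever address your startup configuration assigns to the admin listener. It only pattern-matches the string, so it says nothing about firewalls or ingress rules in front of the listener.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an AISIX command): classify a listener bind address.
# Prints one of: loopback, wildcard, private, other.
# "other" covers public IPv4 addresses, hostnames, and most IPv6 literals;
# treat it as "possibly public, review manually".
classify_bind_addr() {
  case "$1" in
    127.*|::1|localhost) echo loopback ;;                     # loopback only
    0.0.0.0|::|'*') echo wildcard ;;                          # every interface
    10.*|192.168.*) echo private ;;                           # RFC 1918
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo private ;;    # 172.16.0.0/12
    *) echo other ;;
  esac
}
```

For example, `classify_bind_addr 0.0.0.0` prints `wildcard`, which means the admin listener is bound on every interface and its privacy depends entirely on the surrounding network controls.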

Metrics endpoints are unauthenticated and served on the dedicated metrics listener. Keep metrics endpoints private and scrape them only from trusted monitoring infrastructure.

Confirm which address and port your startup configuration assigns to the dedicated metrics listener, and that only trusted monitoring infrastructure can reach it.

Enable listener TLS as soon as proxy or admin traffic goes beyond local development. Use etcd TLS or mTLS when your configuration store requires encrypted or mutually authenticated transport. In managed deployments, verify the managed certificate bundle and Cloud control-plane URL before investigating higher-level projection issues.
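
Before debugging anything above the transport layer, it can help to confirm that each certificate file is readable and not close to expiry. The sketch below uses `openssl x509 -checkend` for that. The function name `check_cert_file` and the 14-day default are illustrative choices, and the check only applies to PEM-encoded certificates; it does not verify the chain or the matching private key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a PEM certificate file is readable and
# will not expire within MIN_DAYS (default 14, an arbitrary choice).
# Usage: check_cert_file PATH [MIN_DAYS]
check_cert_file() {
  local path="$1" min_days="${2:-14}"
  if [ ! -r "$path" ]; then
    echo "FAIL ${path}: missing or unreadable"
    return 1
  fi
  # -checkend N exits non-zero if the certificate expires within N seconds.
  if ! openssl x509 -in "$path" -noout \
      -checkend "$((min_days * 86400))" >/dev/null 2>&1; then
    echo "FAIL ${path}: not a valid certificate, or expires within ${min_days} days"
    return 1
  fi
  echo "OK ${path}"
}
```

Run it once per certificate path referenced by your listener, etcd, or managed-connectivity settings, and treat any `FAIL` line as a blocker.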

Verify before Routing Traffic

Liveness is necessary, but it is not enough. A process can be alive while caller authentication, model resolution, provider credentials, or upstream network access is still broken.

Set the listener URLs for your deployment:

PROXY_URL="https://gateway.example.com"
ADMIN_URL="https://admin.internal.example.com"

Check proxy liveness on the proxy listener:

curl -i "${PROXY_URL}/livez"

For self-hosted deployments, also check admin liveness on the private admin listener:

curl -i "${ADMIN_URL}/livez"
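
Right after a deploy, a listener may need a few seconds before it answers. The sketch below polls `/livez` with a bounded number of attempts instead of a single `curl`. It assumes a live listener answers with HTTP 200, which this page does not state explicitly; the function names `probe_status` and `wait_for_livez` are hypothetical.

```shell
#!/usr/bin/env bash
# Print the HTTP status code for a URL ("000" if the connection fails).
probe_status() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "$1" || true
}

# Poll BASE_URL/livez until it returns 200 (assumed healthy status).
# Usage: wait_for_livez BASE_URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_livez() {
  local base_url="$1" attempts="${2:-10}" delay="${3:-2}" i=1 status
  while [ "$i" -le "$attempts" ]; do
    status="$(probe_status "${base_url}/livez")"
    if [ "$status" = "200" ]; then
      echo "livez ok after ${i} attempt(s): ${base_url}"
      return 0
    fi
    echo "attempt ${i}/${attempts}: ${base_url}/livez returned ${status}" >&2
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

A typical call would be `wait_for_livez "${PROXY_URL}"`, followed by `wait_for_livez "${ADMIN_URL}"` in self-hosted deployments.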

In self-hosted deployments, query the admin health endpoint to check model health and configuration freshness:

curl -sS "${ADMIN_URL}/admin/v1/health" \
  -H "Authorization: Bearer ${AISIX_ADMIN_KEY}"

Then verify the actual request path:

GET /v1/models with the same caller API key the application will use.
One real proxy request for each endpoint family you plan to expose.
Logs, metrics, headers, usage events, or exporter output that show the smoke-test request.
A rejected request with an invalid or unauthorized caller API key, so access control is proven before traffic widens.
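
The first and last checks in that list can be sketched as a small script. It rests on two assumptions that this page does not confirm: caller API keys are sent as `Authorization: Bearer <key>`, and a rejected key produces HTTP 401 or 403. The function names are hypothetical, and the per-endpoint requests and signal checks still need to be done for your own endpoint families.

```shell
#!/usr/bin/env bash
# Print the HTTP status for GET URL with the given caller API key.
# Assumption: the proxy accepts the key as a Bearer token.
status_with_key() {
  local url="$1" key="$2"
  curl -s -o /dev/null -m 10 -w '%{http_code}' \
    -H "Authorization: Bearer ${key}" "$url" || true
}

# Compare an actual status against one or more allowed statuses.
# Usage: expect_status LABEL ACTUAL ALLOWED...
expect_status() {
  local label="$1" actual="$2" allowed
  shift 2
  for allowed in "$@"; do
    if [ "$actual" = "$allowed" ]; then
      echo "PASS ${label} (${actual})"
      return 0
    fi
  done
  echo "FAIL ${label}: got ${actual}, wanted one of: $*"
  return 1
}

# Usage: smoke_test PROXY_URL VALID_CALLER_KEY
smoke_test() {
  local proxy_url="$1" valid_key="$2" failed=0
  expect_status "valid key lists models" \
    "$(status_with_key "${proxy_url}/v1/models" "$valid_key")" 200 || failed=1
  # Assumption: an unknown key is rejected with 401 or 403.
  expect_status "invalid key is rejected" \
    "$(status_with_key "${proxy_url}/v1/models" "invalid-key-for-smoke-test")" \
    401 403 || failed=1
  return "$failed"
}
```

Calling `smoke_test "${PROXY_URL}" "${CALLER_API_KEY}"` (where `CALLER_API_KEY` is a placeholder variable holding the application's key) prints one PASS or FAIL line per check and returns non-zero if either check fails.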

Shutdown Behavior

AISIX handles graceful shutdown on SIGINT and SIGTERM. During shutdown, the server stops accepting new work, coordinates listener shutdown with background tasks, and marks liveness as failing.

During an intentional drain, AISIX marks /livez as failed so load balancers or orchestration probes can remove the instance from service. GET /livez is a liveness signal, not a full readiness or traffic-drain contract.
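
For hosts where you stop the gateway by hand or from a deploy script, the signal handling described above can be exercised with a bounded wait. The sketch below sends SIGTERM and waits for the process to exit. The function name `graceful_stop` and the 30-second default are illustrative, and how you obtain the gateway PID (a PID file, a service manager, and so on) is left to your environment.

```shell
#!/usr/bin/env bash
# Send SIGTERM to a process and wait up to TIMEOUT_SECONDS for it to exit.
# Returns 0 if the process is gone, 1 if it is still running at the deadline.
# Usage: graceful_stop PID [TIMEOUT_SECONDS]
graceful_stop() {
  local pid="$1" timeout="${2:-30}" waited=0
  # If the signal cannot be delivered, the process has already exited.
  kill -TERM "$pid" 2>/dev/null || return 0
  # kill -0 sends no signal; it only tests whether the process still exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "pid ${pid} still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Choose a timeout that covers your longest expected in-flight request. Because `/livez` is not a traffic-drain contract, make sure the load balancer has stopped routing to the instance before relying on this wait alone.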

Before Widening Traffic

Before exposing production traffic broadly, confirm these conditions:

The startup config matches the intended deployment mode.
The proxy listener is reachable only through the intended network path.
The admin listener is private in self-hosted deployments.
The configuration store is private and reachable from the gateway.
TLS, mTLS, and certificate file paths are valid for the transports that use them.
At least one provider key, model alias, and caller API key have been created.
The model alias appears through the caller-facing proxy path.
A real provider-backed request succeeds through the gateway.
Operational signals show the smoke-test request.

If the process is up but real requests fail, treat it as a request-path problem. Check model visibility, configuration propagation, caller API-key access, provider credentials, and upstream connectivity before widening traffic.

Next Steps

You have now seen the production baseline for moving AISIX toward real traffic. Continue with Network and Security for listener and secret handling, TLS and mTLS for transport security, and Health Checks for readiness checks.