Deploy for High Availability
Deploying API7 Gateway for high availability (HA) eliminates single points of failure in both the control plane (CP) and data plane (DP). This guide covers the prerequisites, architecture, and step-by-step instructions for deploying a production-grade HA setup.
For a conceptual overview, see High Availability.
This page explains how to deploy an HA topology. After the deployment is in place, use the configure-and-manage guides for ongoing operations:
- Scale Data Plane
- Autoscale Data Plane on Kubernetes
- Data Plane High Availability
- Data Plane Resilience
Before you deploy, plan the surrounding infrastructure that keeps the cluster available during failures:
- An external PostgreSQL deployment with its own HA or managed failover strategy.
- A load balancer in front of the control plane and another in front of the data plane.
- TLS certificates for the Dashboard and DP Manager endpoints, plus mTLS certificates for data plane nodes.
- A maintenance process that restarts one node at a time and verifies traffic before moving on.
Architecture Overview
A high-availability deployment consists of:
- Multiple CP nodes (Dashboard + DP Manager) sharing a PostgreSQL database, fronted by a load balancer.
- Multiple DP nodes in one or more Gateway Groups, fronted by a load balancer.
- (Optional) A backup gateway node that periodically exports configuration to external storage (AWS S3 or Azure Blob Storage) for data plane resilience during CP outages.
Prerequisites
Deployment Checklist
Complete the following preparation work before you install or scale any HA nodes:
- Provision an external PostgreSQL instance or cluster and confirm that automatic failover is handled outside API7 Gateway.
- Reserve stable DNS names or virtual IPs for the control plane and data plane load balancers.
- Prepare TLS certificates for the Dashboard and DP Manager endpoints. If data plane nodes connect over mTLS, also prepare the CA and node certificates required by your deployment flow.
- Confirm that every control plane node uses the same PostgreSQL DSN and that every data plane node connects to the same gateway group through the same DP Manager endpoint.
API7 Gateway depends on the availability of PostgreSQL, but it does not configure PostgreSQL replication, promotion, or backups for you. Treat database HA as a prerequisite for control plane HA.
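If the psql client is available on a control plane host, a quick way to confirm database reachability is to test the same DSN you will configure below (the endpoint here is the placeholder used throughout this guide):
# from a CP host: confirm the external HA endpoint accepts connections with the api7ee credentials
psql "postgres://api7ee:$DB_PASSWORD@your-pg-ha-endpoint:5432/api7ee" -c "SELECT 1;"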
Minimum Hosts
| Component | Minimum Nodes | Notes |
|---|---|---|
| Control Plane | 2 | Each runs Dashboard + DP Manager |
| Data Plane | 2 | Each runs API7 Gateway |
| PostgreSQL | External (managed service or HA cluster) | HA configuration is out of scope; see PostgreSQL HA |
Hardware Requirements
| Component | CPU | Memory | Disk |
|---|---|---|---|
| Control Plane | 4 Cores | 8 GB | 40 GB |
| Data Plane | 4 Cores | 8 GB | 20 GB |
For detailed requirements, see System Requirements.
Network Ports
Ensure the following ports are accessible between components:
| Service | Port | Protocol | Description |
|---|---|---|---|
| Dashboard | 7080 / 7443 | HTTP / HTTPS | Dashboard UI and Admin API |
| DP Manager | 7900 / 7943 | HTTP / HTTPS | Data plane management |
| Gateway (HTTP) | 9080 | HTTP | API traffic |
| Gateway (HTTPS) | 9443 | HTTPS | API traffic |
| Gateway Status | 7085 | HTTP | Health check endpoint |
| PostgreSQL | 5432 | TCP | Database |
| Prometheus | 9090 | HTTP | Metrics (optional) |
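Before wiring up the load balancers, it can help to spot-check reachability of these ports from the hosts that will actually connect to them, for example with a tool such as nc; the host names below are placeholders for your own nodes:
# data plane hosts must reach the DP Manager; operators and the CP load balancer must reach the Dashboard
nc -zv <cp-node> 7943
nc -zv <cp-node> 7443
# control plane hosts must reach PostgreSQL
nc -zv <postgres-host> 5432
# the data plane load balancer must reach the gateway status endpoint
nc -zv <dp-node> 7085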
Load Balancers and TLS
For production HA, give each traffic path a stable frontend:
| Endpoint | Typical frontend | Purpose |
|---|---|---|
| Dashboard | Ingress, internal load balancer, or reverse proxy | Operator access to the Dashboard and Admin API |
| DP Manager | Cluster-internal Service or internal load balancer | Stable mTLS address for data plane nodes |
| Data plane | External load balancer or ingress controller | Client API traffic |
On Kubernetes, the control plane chart already creates separate Services for the Dashboard and DP Manager. A common pattern is to keep dashboard_service.type: ClusterIP and expose the Dashboard through an Ingress or internal proxy, while keeping dp_manager_service as a stable internal endpoint. If data plane nodes connect from outside the cluster, expose the DP Manager through an internal load balancer or another stable private frontend.
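As a sketch of that Dashboard pattern, the Ingress below terminates TLS and forwards to a ClusterIP Dashboard Service. The Service name, host, ingress class, and secret name are assumptions; replace them with the values from your own release:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api7-dashboard
  namespace: api7
spec:
  ingressClassName: nginx                  # assumption: an NGINX ingress controller is installed
  tls:
    - hosts:
        - dashboard.example.com
      secretName: dashboard-tls            # assumption: an existing TLS secret for this host
  rules:
    - host: dashboard.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api7ee3-dashboard    # assumption: the Dashboard Service created by your release
                port:
                  number: 7080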
Control Plane HA
The API7 Dashboard and DP Manager are stateless applications that store all configuration in PostgreSQL. Deploy multiple instances behind a load balancer for HA.
All control plane replicas must share:
- The same PostgreSQL database.
- The same license state and Dashboard configuration.
- Stable endpoints for operators and data plane nodes.
- Kubernetes (Helm)
- Docker
Scale the control plane by setting replica counts in your Helm values and keeping the service frontends stable:
dashboard:
  replicaCount: 2                          # ❶
dp_manager:
  replicaCount: 2                          # ❷
postgresql:
  builtin: false                           # ❸
dashboard_service:
  type: ClusterIP                          # ❹
dp_manager_service:
  type: ClusterIP                          # ❺
dashboard_configuration:
  database:
    dsn: "postgres://api7ee:$DB_PASSWORD@your-pg-ha-endpoint:5432/api7ee"
dp_manager_configuration:
  database:
    dsn: "postgres://api7ee:$DB_PASSWORD@your-pg-ha-endpoint:5432/api7ee"
❶ Deploy at least 2 Dashboard replicas for redundancy.
❷ Deploy at least 2 DP Manager replicas.
❸ Disable the built-in PostgreSQL and point the control plane to your external HA database.
❹ Keep the Dashboard behind a stable frontend such as an Ingress, internal proxy, or LoadBalancer service.
❺ Keep the DP Manager reachable at one stable address for all data plane nodes in the same gateway group.
If Developer Portal remains enabled in the same Helm release, also set developer_portal_configuration.database.dsn to the same PostgreSQL endpoint or disable Developer Portal for that release.
Install or upgrade the Helm release:
helm upgrade --install api7ee3 api7/api7ee3 -f values.yaml -n api7 --create-namespace
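After the release settles, confirm that the expected number of Dashboard and DP Manager replicas are running and that their Services are in place:
# list control plane pods and the worker nodes they landed on
kubectl get pods -n api7 -o wide
# confirm the Dashboard and DP Manager Services expose the expected ports
kubectl get svc -n api7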
If you terminate TLS on the Dashboard itself, configure the Dashboard certificate with dashboard.keyCertSecret. If you terminate TLS at an Ingress or load balancer instead, keep the backend ports reachable only on the trusted network.
For stronger placement guarantees on Kubernetes, also consider dashboard.topologySpreadConstraints and dp_manager.topologySpreadConstraints so replicas land on different worker nodes or availability zones.
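A minimal sketch of such a constraint in values.yaml, spreading Dashboard replicas across availability zones; the label selector is an assumption and must match the labels on your Dashboard pods, and the same shape applies under dp_manager:
dashboard:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: api7ee3-dashboard   # assumption: adjust to your pod labels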
On each CP host, run both the Dashboard and DP Manager containers.
Dashboard (dashboard-config.yaml):
server:
  listen:
    host: "0.0.0.0"
    port: 7080
database:
  dsn: "postgres://api7ee:$DB_PASSWORD@your-pg-ha-endpoint:5432/api7ee"
docker run -d --name api7-dashboard \
-p 7080:7080 \
-v $(pwd)/dashboard-config.yaml:/app/conf/config.yaml \
api7/api7-ee-3-integrated:latest
DP Manager (dp-manager-config.yaml):
server:
  listen:
    host: "0.0.0.0"
    port: 7900
database:
  dsn: "postgres://api7ee:$DB_PASSWORD@your-pg-ha-endpoint:5432/api7ee"
docker run -d --name api7-dp-manager \
-p 7900:7900 \
-v $(pwd)/dp-manager-config.yaml:/app/conf/config.yaml \
api7/api7-ee-dp-manager:latest
Deploy a load balancer (for example, NGINX or HAProxy) in front of the CP hosts to distribute traffic across Dashboard and DP Manager instances.
Configure the load balancer with passive or active health checks and ensure that the backend pool includes every CP node before you point users or data plane nodes to it.
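As one illustration, an NGINX load balancer can proxy TCP to both control plane services and eject failing backends with passive health checks. The host names are placeholders, and the ports match the plain-HTTP Docker examples above; if your CP nodes terminate TLS/mTLS on 7443 and 7943, apply the same pattern to those ports instead:
stream {
    upstream api7_dashboard {
        server cp-node-1:7080 max_fails=2 fail_timeout=10s;
        server cp-node-2:7080 max_fails=2 fail_timeout=10s;
    }
    upstream api7_dp_manager {
        server cp-node-1:7900 max_fails=2 fail_timeout=10s;
        server cp-node-2:7900 max_fails=2 fail_timeout=10s;
    }
    server {
        listen 7080;
        proxy_pass api7_dashboard;
    }
    server {
        listen 7900;
        proxy_pass api7_dp_manager;
    }
}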
All Dashboard and DP Manager instances must connect to the same PostgreSQL database. PostgreSQL HA (primary-replica, Patroni, or managed services like Amazon RDS, Azure Database, or Google Cloud SQL) is a separate concern and should be configured according to your database provider's documentation.
Data Plane HA
Data plane nodes are stateless — they receive configuration from the control plane and process traffic independently. Deploy multiple nodes behind a load balancer.
For predictable failover behavior, configure the data plane so that:
- Each node belongs to the same gateway group.
- The load balancer only sends traffic to healthy nodes.
- Nodes are distributed across failure domains where possible, such as different hosts, zones, or Kubernetes worker nodes.
Health Check Configuration
Each gateway node exposes a status endpoint for health monitoring:
| Endpoint | Method | Description |
|---|---|---|
| /status | GET | Returns 200 if the gateway is running |
| /status/ready | GET | Returns 200 if the gateway is ready to accept traffic |
The status endpoint listens on port 7085 by default. Configure your load balancer to perform health checks against this endpoint at regular intervals (for example, every 10–30 seconds).
Use /status when you only need to know that the gateway process is alive; if your operating model allows data plane nodes to keep serving cached configuration during a brief control plane interruption, health checking /status keeps those nodes in rotation. Use /status/ready when you want the load balancer to route traffic only to nodes that still have an available DP Manager connection. See Configure Readiness and Liveness Probes for the detailed tradeoff.
Deploy Multiple DP Nodes
- Kubernetes (Helm)
- Docker
Add these HA-oriented settings to the values.yaml file you use for the api7/gateway chart:
api7ee:
  status_endpoint:                         # ❶
    enabled: true
    ip: 0.0.0.0
    port: 7085
apisix:
  replicaCount: 3                          # ❷
  podDisruptionBudget:                     # ❸
    enabled: true
    minAvailable: 1
  affinity:                                # ❹
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - gateway
            topologyKey: kubernetes.io/hostname
gateway:
  readinessProbe:                          # ❺
    httpGet:
      path: /status/ready
      port: 7085
  livenessProbe:
    httpGet:
      path: /status
      port: 7085
❶ Expose the status endpoints on port 7085 so Kubernetes and external health checkers can verify node state.
❷ Run at least 2 data plane replicas for HA; start with 3 if you want more headroom during maintenance.
❸ Use a PodDisruptionBudget so voluntary disruptions do not evict all gateway pods at once.
❹ Spread pods across worker nodes to reduce the impact of a single-node failure.
❺ Use separate readiness and liveness checks so the scheduler and load balancers stop sending traffic to nodes that are not ready.
Choose the service exposure model that matches your environment. On managed Kubernetes, setting gateway.type: LoadBalancer is a common way to provision the data plane frontend. If you already have an ingress controller or external L4/L7 load balancer, keep the service type that fits that design and point the frontend at the gateway Service.
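A minimal sketch of the managed-Kubernetes pattern in values.yaml; externalTrafficPolicy is an assumption about what your chart version passes through to the Service, and Local is only worth enabling if you want client source IPs preserved at the cost of skipping nodes without local gateway pods:
gateway:
  type: LoadBalancer
  externalTrafficPolicy: Local   # assumption: supported by your chart version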
Run gateway containers on multiple hosts. Each connects to the DP Manager for configuration:
docker run -d --name api7-gateway \
-p 9080:9080 \
-p 9443:9443 \
-v $(pwd)/gateway-config.yaml:/usr/local/apisix/conf/config.yaml \
api7/api7-ee-3-gateway:latest
Deploy a load balancer in front of all gateway hosts. Configure health checks to poll http://<host>:7085/status or http://<host>:7085/status/ready and remove unhealthy nodes from the pool.
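Before you put hosts into the pool (and again after maintenance), a quick loop confirms that every node reports ready; the host names are placeholders:
# replace with your own gateway hosts
for host in dp-node-1 dp-node-2 dp-node-3; do
  if curl -fsS "http://$host:7085/status/ready" > /dev/null; then
    echo "$host: ready"
  else
    echo "$host: NOT ready"
  fi
done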
For details on health checks and resilience behavior, see Data Plane High Availability.
Data Plane Resilience (Fallback CP)
For environments that require the data plane to survive extended control plane outages, configure Fallback CP. This feature periodically exports all gateway configuration to external storage (AWS S3 or Azure Blob Storage), enabling data plane nodes to fetch configuration from storage when the control plane is unreachable.
For detailed setup instructions, see Data Plane Resilience.
Verification
After deploying the HA setup, verify each component:
- Control Plane: Access the Dashboard through the load balancer URL. Confirm that both instances are serving requests by checking access logs on each node.
- Data Plane: Send a test request through the DP load balancer:
  curl -i "http://<dp-load-balancer>:9080/"
- Health Checks: Verify the status endpoint on each DP node:
  curl -i "http://<dp-node>:7085/status"
  # Expected: HTTP/1.1 200 OK
- Failover Test: Stop one CP node and verify the Dashboard remains accessible through the load balancer. Stop one DP node and verify API traffic continues to flow through the remaining nodes.
Operational Runbook
Use the following checks during acceptance testing and routine maintenance.
Control Plane Failover Test
- Log in to the Dashboard through the control plane load balancer.
- Stop or isolate one CP node.
- Refresh the Dashboard and confirm that the session remains usable.
- Apply a small configuration change, such as updating a route description, and confirm it succeeds.
- Restore the failed node and confirm it rejoins the load balancer pool.
Data Plane Failover Test
- Send repeated requests through the data plane load balancer.
- Stop or drain one gateway node.
- Confirm the load balancer health checks mark the node unhealthy.
- Verify client traffic continues through the remaining gateway nodes without a full outage.
- Restore the node and confirm it returns to service only after the health check passes.
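To generate the repeated requests for this test and watch the effect of stopping a node, a simple loop that prints one HTTP status code per second is enough; the load balancer address is the same placeholder used in the verification steps above:
# print one status code per second; interruptions show up as non-200 codes or connection errors
while true; do
  curl -s -o /dev/null -w '%{http_code}\n' "http://<dp-load-balancer>:9080/"
  sleep 1
done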
Rolling Restart Procedure
Perform rolling maintenance one node at a time:
- Confirm at least one other CP node and one other DP node are healthy before you restart anything.
- Remove a single node from the load balancer pool or drain it gracefully.
- Restart that node and wait until it is healthy again.
- Verify Dashboard access or gateway traffic before moving to the next node.
- Repeat for the remaining nodes.
Never restart all control plane nodes or all data plane nodes at the same time unless you have planned downtime.
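On Kubernetes, the equivalent one-node-at-a-time behavior for the data plane can be driven by a rolling restart, which honors the Deployment's rolling update strategy and the readiness probe configured earlier; the deployment name and namespace are assumptions to replace with your own:
# restart gateway pods gradually and wait until every replica is ready again
kubectl rollout restart deployment/<gateway-deployment> -n <namespace>
kubectl rollout status deployment/<gateway-deployment> -n <namespace>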
Next Steps
- Scale Data Plane: Adjust the number of data plane nodes based on traffic.
- Autoscale Data Plane on Kubernetes: Use HPA to scale Kubernetes deployments automatically.
- Data Plane Resilience: Configure fallback storage for extended CP outages.
- Production Best Practices: Harden your deployment for production workloads.
- System Requirements: Review hardware and software requirements.