Deploy for High Availability
Deploying API7 Gateway for high availability (HA) eliminates single points of failure in both the control plane (CP) and data plane (DP). This guide covers the prerequisites, architecture, and step-by-step instructions for deploying a production-grade HA setup.
For a conceptual overview, see High Availability.
This page explains how to deploy an HA topology. After the deployment is in place, use the configure-and-manage guides for ongoing operations:
- Scale Data Plane
- Autoscale Data Plane on Kubernetes
- Data Plane High Availability
- Data Plane Resilience
Before you deploy, plan the surrounding infrastructure that keeps the cluster available during failures:
- An external PostgreSQL deployment with its own HA or managed failover strategy.
- A load balancer in front of the control plane and another in front of the data plane.
- TLS certificates for the Dashboard and DP Manager endpoints, plus mTLS certificates for data plane nodes.
- A maintenance process that restarts one node at a time and verifies traffic before moving on.
Architecture Overview
A high-availability deployment consists of:
- Multiple CP nodes (Dashboard + DP Manager) sharing a PostgreSQL database, fronted by a load balancer.
- Multiple DP nodes in one or more Gateway Groups, fronted by a load balancer.
- (Optional) A backup gateway node that periodically exports configuration to external storage (AWS S3 or Azure Blob Storage) for data plane resilience during CP outages.
Prerequisites
Deployment Checklist
Complete the following preparation work before you install or scale any HA nodes:
- Provision an external PostgreSQL instance or cluster and confirm that automatic failover is handled outside API7 Gateway.
- Reserve stable DNS names or virtual IPs for the control plane and data plane load balancers.
- Prepare TLS certificates for the Dashboard and DP Manager endpoints. If data plane nodes connect over mTLS, also prepare the CA and node certificates required by your deployment flow.
- Confirm that every control plane node uses the same PostgreSQL DSN and that every data plane node connects to the same gateway group through the same DP Manager endpoint.
API7 Gateway depends on the availability of PostgreSQL, but it does not configure PostgreSQL replication, promotion, or backups for you. Treat database HA as a prerequisite for control plane HA.
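If the psql client is available on a control plane host, a quick way to confirm database reachability is to test the same DSN you will configure below (the endpoint here is the placeholder used throughout this guide):
# from a CP host: confirm the external HA endpoint accepts connections with the api7ee credentials
psql "postgres://api7ee:$DB_PASSWORD@your-pg-ha-endpoint:5432/api7ee" -c "SELECT 1;"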
Minimum Hosts
| Component | Minimum Nodes | Notes |
|---|---|---|
| Control Plane | 2 | Each runs Dashboard + DP Manager |
| Data Plane | 2 | Each runs API7 Gateway |
| PostgreSQL | External (managed service or HA cluster) | HA configuration is out of scope; see PostgreSQL HA |
Hardware Requirements
| Component | CPU | Memory | Disk |
|---|---|---|---|
| Control Plane | 4 Cores | 8 GB | 40 GB |
| Data Plane | 4 Cores | 8 GB | 20 GB |
For detailed requirements, see System Requirements.
Network Ports
Ensure the following ports are accessible between components:
| Service | Port | Protocol | Description |
|---|---|---|---|
| Dashboard | 7080 / 7443 | HTTP / HTTPS | Dashboard UI and Admin API |
| DP Manager | 7900 / 7943 | HTTP / HTTPS | Data plane management |
| Gateway (HTTP) | 9080 | HTTP | API traffic |
| Gateway (HTTPS) | 9443 | HTTPS | API traffic |
| Gateway Status | 7085 | HTTP | Health check endpoint |
| PostgreSQL | 5432 | TCP | Database |
| Prometheus | 9090 | HTTP | Metrics (optional) |
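Before wiring up the load balancers, it can help to spot-check reachability of these ports from the hosts that will actually connect to them, for example with a tool such as nc; the host names below are placeholders for your own nodes:
# data plane hosts must reach the DP Manager; operators and the CP load balancer must reach the Dashboard
nc -zv <cp-node> 7943
nc -zv <cp-node> 7443
# control plane hosts must reach PostgreSQL
nc -zv <postgres-host> 5432
# the data plane load balancer must reach the gateway status endpoint
nc -zv <dp-node> 7085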
Load Balancers and TLS
For production HA, give each traffic path a stable frontend:
| Endpoint | Typical frontend | Purpose |
|---|---|---|
| Dashboard | Ingress, internal load balancer, or reverse proxy | Operator access to the Dashboard and Admin API |
| DP Manager | Cluster-internal Service or internal load balancer | Stable mTLS address for data plane nodes |
| Data plane | External load balancer or ingress controller | Client API traffic |
On Kubernetes, the control plane chart already creates separate Services for the Dashboard and DP Manager. A common pattern is to keep dashboard_service.type: ClusterIP and expose the Dashboard through an Ingress or internal proxy, while keeping dp_manager_service as a stable internal endpoint. If data plane nodes connect from outside the cluster, expose the DP Manager through an internal load balancer or another stable private frontend.
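As a sketch of that Dashboard pattern, the Ingress below terminates TLS and forwards to a ClusterIP Dashboard Service. The Service name, host, ingress class, and secret name are assumptions; replace them with the values from your own release:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api7-dashboard
  namespace: api7
spec:
  ingressClassName: nginx                  # assumption: an NGINX ingress controller is installed
  tls:
    - hosts:
        - dashboard.example.com
      secretName: dashboard-tls            # assumption: an existing TLS secret for this host
  rules:
    - host: dashboard.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api7ee3-dashboard    # assumption: the Dashboard Service created by your release
                port:
                  number: 7080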
Control Plane HA
The API7 Dashboard and DP Manager are stateless applications that store all configuration in PostgreSQL. Deploy multiple instances behind a load balancer for HA.
All control plane replicas must share:
- The same PostgreSQL database.
- The same license state and Dashboard configuration.
- Stable endpoints for operators and data plane nodes.
- Kubernetes (Helm)
- Docker
Scale the control plane by setting replica counts in your Helm values and keeping the service frontends stable:
dashboard:
  replicaCount: 2                          # ❶
dp_manager:
  replicaCount: 2                          # ❷
postgresql:
  builtin: false                           # ❸
dashboard_service:
  type: ClusterIP                          # ❹
dp_manager_service:
  type: ClusterIP                          # ❺
dashboard_configuration:
  database:
    dsn: "postgres://api7ee:$DB_PASSWORD@your-pg-ha-endpoint:5432/api7ee"
dp_manager_configuration:
  database:
    dsn: "postgres://api7ee:$DB_PASSWORD@your-pg-ha-endpoint:5432/api7ee"
❶ Deploy at least 2 Dashboard replicas for redundancy.
❷ Deploy at least 2 DP Manager replicas.
❸ Disable the built-in PostgreSQL and point the control plane to your external HA database.
❹ Keep the Dashboard behind a stable frontend such as an Ingress, internal proxy, or LoadBalancer service.
❺ Keep the DP Manager reachable at one stable address for all data plane nodes in the same gateway group.
If Developer Portal remains enabled in the same Helm release, also set developer_portal_configuration.database.dsn to the same PostgreSQL endpoint or disable Developer Portal for that release.
Install or upgrade the Helm release:
helm upgrade --install api7ee3 api7/api7ee3 -f values.yaml -n api7 --create-namespace
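After the release settles, confirm that the expected number of Dashboard and DP Manager replicas are running and that their Services are in place:
# list control plane pods and the worker nodes they landed on
kubectl get pods -n api7 -o wide
# confirm the Dashboard and DP Manager Services expose the expected ports
kubectl get svc -n api7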
If you terminate TLS on the Dashboard itself, configure the Dashboard certificate with dashboard.keyCertSecret. If you terminate TLS at an Ingress or load balancer instead, keep the backend ports reachable only on the trusted network.
For stronger placement guarantees on Kubernetes, also consider dashboard.topologySpreadConstraints and dp_manager.topologySpreadConstraints so replicas land on different worker nodes or availability zones.
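A minimal sketch of such a constraint in values.yaml, spreading Dashboard replicas across availability zones; the label selector is an assumption and must match the labels on your Dashboard pods, and the same shape applies under dp_manager:
dashboard:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: api7ee3-dashboard   # assumption: adjust to your pod labels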
On each CP host, run both the Dashboard and DP Manager containers.
Dashboard (dashboard-config.yaml):
server:
  listen:
    host: "0.0.0.0"
    port: 7080
database:
  dsn: "postgres://api7ee:$DB_PASSWORD@your-pg-ha-endpoint:5432/api7ee"
docker run -d --name api7-dashboard \
-p 7080:7080 \
-v $(pwd)/dashboard-config.yaml:/app/conf/config.yaml \
api7/api7-ee-3-integrated:latest
DP Manager (dp-manager-config.yaml):
server:
  listen:
    host: "0.0.0.0"
    port: 7900
database:
  dsn: "postgres://api7ee:$DB_PASSWORD@your-pg-ha-endpoint:5432/api7ee"
docker run -d --name api7-dp-manager \
-p 7900:7900 \
-v $(pwd)/dp-manager-config.yaml:/app/conf/config.yaml \
api7/api7-ee-dp-manager:latest
Deploy a load balancer (for example, NGINX or HAProxy) in front of the CP hosts to distribute traffic across Dashboard and DP Manager instances.
Configure the load balancer with passive or active health checks and ensure that the backend pool includes every CP node before you point users or data plane nodes to it.
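As one illustration, an NGINX load balancer can proxy TCP to both control plane services and eject failing backends with passive health checks. The host names are placeholders, and the ports match the plain-HTTP Docker examples above; if your CP nodes terminate TLS/mTLS on 7443 and 7943, apply the same pattern to those ports instead:
stream {
    upstream api7_dashboard {
        server cp-node-1:7080 max_fails=2 fail_timeout=10s;
        server cp-node-2:7080 max_fails=2 fail_timeout=10s;
    }
    upstream api7_dp_manager {
        server cp-node-1:7900 max_fails=2 fail_timeout=10s;
        server cp-node-2:7900 max_fails=2 fail_timeout=10s;
    }
    server {
        listen 7080;
        proxy_pass api7_dashboard;
    }
    server {
        listen 7900;
        proxy_pass api7_dp_manager;
    }
}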
All Dashboard and DP Manager instances must connect to the same PostgreSQL database. PostgreSQL HA (primary-replica, Patroni, or managed services like Amazon RDS, Azure Database, or Google Cloud SQL) is a separate concern and should be configured according to your database provider's documentation.
Data Plane HA
Data plane nodes are stateless — they receive configuration from the control plane and process traffic independently. Deploy multiple nodes behind a load balancer.
For predictable failover behavior, configure the data plane so that:
- Each node belongs to the same gateway group.
- The load balancer only sends traffic to healthy nodes.
- Nodes are distributed across failure domains where possible, such as different hosts, zones, or Kubernetes worker nodes.
Health Check Configuration
Each gateway node exposes a status endpoint for health monitoring:
| Endpoint | Method | Description |
|---|---|---|
| /status | GET | Returns 200 if the gateway is running |
| /status/ready | GET | Returns 200 if the gateway is ready to accept traffic |
The status endpoint listens on port 7085 by default. Configure your load balancer to perform health checks against this endpoint at regular intervals (for example, every 10–30 seconds).
Use /status when you only need to know that the gateway process is alive; if your operating model allows data plane nodes to keep serving cached configuration during a brief control plane interruption, health checking /status keeps those nodes in rotation. Use /status/ready when you want the load balancer to route traffic only to nodes that still have an available DP Manager connection. See Configure Readiness and Liveness Probes for the detailed tradeoff.
Deploy Multiple DP Nodes
- Kubernetes (Helm)
- Docker
Add these HA-oriented settings to the values.yaml file you use for the api7/gateway chart:
api7ee:
  status_endpoint:                         # ❶
    enabled: true
    ip: 0.0.0.0
    port: 7085
apisix:
  replicaCount: 3                          # ❷
  podDisruptionBudget:                     # ❸
    enabled: true
    minAvailable: 1
  affinity:                                # ❹
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - gateway
            topologyKey: kubernetes.io/hostname
gateway:
  readinessProbe:                          # ❺
    httpGet:
      path: /status/ready
      port: 7085
  livenessProbe:
    httpGet:
      path: /status
      port: 7085
❶ Expose the status endpoints on port 7085 so Kubernetes and external health checkers can verify node state.
❷ Run at least 2 data plane replicas for HA; start with 3 if you want more headroom during maintenance.
❸ Use a PodDisruptionBudget so voluntary disruptions do not evict all gateway pods at once.
❹ Spread pods across worker nodes to reduce the impact of a single-node failure.
❺ Use separate readiness and liveness checks so the scheduler and load balancers stop sending traffic to nodes that are not ready.
Choose the service exposure model that matches your environment. On managed Kubernetes, setting gateway.type: LoadBalancer is a common way to provision the data plane frontend. If you already have an ingress controller or external L4/L7 load balancer, keep the service type that fits that design and point the frontend at the gateway Service.
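A minimal sketch of the managed-Kubernetes pattern in values.yaml; externalTrafficPolicy is an assumption about what your chart version passes through to the Service, and Local is only worth enabling if you want client source IPs preserved at the cost of skipping nodes without local gateway pods:
gateway:
  type: LoadBalancer
  externalTrafficPolicy: Local   # assumption: supported by your chart version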
Run gateway containers on multiple hosts. Each connects to the DP Manager for configuration:
docker run -d --name api7-gateway \
-p 9080:9080 \
-p 9443:9443 \
-v $(pwd)/gateway-config.yaml:/usr/local/apisix/conf/config.yaml \
api7/api7-ee-3-gateway:latest
Deploy a load balancer in front of all gateway hosts. Configure health checks to poll http://<host>:7085/status or http://<host>:7085/status/ready and remove unhealthy nodes from the pool.
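Before you put hosts into the pool (and again after maintenance), a quick loop confirms that every node reports ready; the host names are placeholders:
# replace with your own gateway hosts
for host in dp-node-1 dp-node-2 dp-node-3; do
  if curl -fsS "http://$host:7085/status/ready" > /dev/null; then
    echo "$host: ready"
  else
    echo "$host: NOT ready"
  fi
done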
For details on health checks and resilience behavior, see Data Plane High Availability.
Data Plane Resilience (Fallback CP)
For environments that require the data plane to survive extended control plane outages, configure Fallback CP. This feature periodically exports all gateway configuration to external storage (AWS S3 or Azure Blob Storage), enabling data plane nodes to fetch configuration from storage when the control plane is unreachable.
For detailed setup instructions, see Data Plane Resilience.
Verification
After deploying the HA setup, verify each component:
- Control Plane: Access the Dashboard through the load balancer URL. Confirm that both instances are serving requests by checking access logs on each node.
- Data Plane: Send a test request through the DP load balancer:
  curl -i "http://<dp-load-balancer>:9080/"
- Health Checks: Verify the status endpoint on each DP node:
  curl -i "http://<dp-node>:7085/status"
  # Expected: HTTP/1.1 200 OK
- Failover Test: Stop one CP node and verify the Dashboard remains accessible through the load balancer. Stop one DP node and verify API traffic continues to flow through the remaining nodes.
Operational Runbook
Use the following checks during acceptance testing and routine maintenance.
Control Plane Failover Test
- Log in to the Dashboard through the control plane load balancer.
- Stop or isolate one CP node.
- Refresh the Dashboard and confirm that the session remains usable.
- Apply a small configuration change, such as updating a route description, and confirm it succeeds.
- Restore the failed node and confirm it rejoins the load balancer pool.
Data Plane Failover Test
- Send repeated requests through the data plane load balancer.
- Stop or drain one gateway node.
- Confirm the load balancer health checks mark the node unhealthy.
- Verify client traffic continues through the remaining gateway nodes without a full outage.
- Restore the node and confirm it returns to service only after the health check passes.
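To generate the repeated requests for this test and watch the effect of stopping a node, a simple loop that prints one HTTP status code per second is enough; the load balancer address is the same placeholder used in the verification steps above:
# print one status code per second; interruptions show up as non-200 codes or connection errors
while true; do
  curl -s -o /dev/null -w '%{http_code}\n' "http://<dp-load-balancer>:9080/"
  sleep 1
done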
Rolling Restart Procedure
Perform rolling maintenance one node at a time:
- Confirm at least one other CP node and one other DP node are healthy before you restart anything.
- Remove a single node from the load balancer pool or drain it gracefully.
- Restart that node and wait until it is healthy again.
- Verify Dashboard access or gateway traffic before moving to the next node.
- Repeat for the remaining nodes.
Never restart all control plane nodes or all data plane nodes at the same time unless you have planned downtime.
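On Kubernetes, the equivalent one-node-at-a-time behavior for the data plane can be driven by a rolling restart, which honors the Deployment's rolling update strategy and the readiness probe configured earlier; the deployment name and namespace are assumptions to replace with your own:
# restart gateway pods gradually and wait until every replica is ready again
kubectl rollout restart deployment/<gateway-deployment> -n <namespace>
kubectl rollout status deployment/<gateway-deployment> -n <namespace>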
Next Steps
- Scale Data Plane: Adjust the number of data plane nodes based on traffic.
- Autoscale Data Plane on Kubernetes: Use HPA to scale Kubernetes deployments automatically.
- Data Plane Resilience: Configure fallback storage for extended CP outages.
- Production Best Practices: Harden your deployment for production workloads.
- System Requirements: Review hardware and software requirements.