Configure Upstream Health Checks
Health checks are crucial for maintaining the availability of your services. By monitoring the health of upstream nodes, the Gateway can automatically remove unhealthy nodes from the load-balancing pool and redirect traffic only to healthy ones.
API7 Enterprise supports two types of health checks:
- Active Health Check: The Gateway periodically sends probe requests to upstream nodes to verify their status.
- Passive Health Check: The Gateway monitors the responses of real client requests to detect failures.
Prerequisites
- An API7 Enterprise instance is running.
- A Gateway Group is created and a Gateway instance is running.
- An API token generated from the Dashboard. The examples below read it from the API_KEY environment variable.
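The curl examples in this guide send the token as ${API_KEY}, and an unset variable would silently produce an empty header. The following is a minimal sketch of a guard that fails early in that case; require_var is a hypothetical helper name, not part of API7 Enterprise or ADC.

```shell
#!/bin/sh
# require_var NAME
# Hypothetical helper: returns non-zero and prints a message on stderr
# when the variable called NAME is unset or empty.
require_var() {
  # Indirect lookup of the variable whose name is in $1 (POSIX sh has no ${!name}).
  eval "_value=\${$1:-}"
  if [ -z "$_value" ]; then
    echo "missing required variable: $1" >&2
    return 1
  fi
}
```

For example, running `require_var API_KEY || exit 1` at the top of a script stops it before any Admin API request is sent without a token.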
Start Sample Upstream Services
For local validation, start two sample upstream services on the host:
docker run -d --name tm-health-1 -p 18083:80 kennethreitz/httpbin
docker run -d --name tm-health-2 -p 18084:80 kennethreitz/httpbin
The examples below assume the Gateway can reach the host at 192.168.215.1. Replace this address with the one that applies to your environment.
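Before configuring health checks, it can help to confirm that both sample services answer on /status/200, the path used by the probes in this guide. The sketch below imitates a single probe with curl and a one-second timeout; probe_upstream is a hypothetical helper name, and the result only reflects reachability from the machine where it runs.

```shell
#!/bin/sh
# probe_upstream HOST PORT
# Prints "<host>:<port> reachable" when GET /status/200 succeeds with a
# non-error status within 1 second, otherwise "<host>:<port> unreachable".
probe_upstream() {
  # -s: silent, -f: treat HTTP >= 400 as failure, --max-time: overall timeout
  if curl -sf -o /dev/null --max-time 1 "http://$1:$2/status/200"; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 unreachable"
  fi
}
```

Calling `probe_upstream 192.168.215.1 18083` and `probe_upstream 192.168.215.1 18084` should report both services as reachable if the containers started correctly and the address is routable from where the command runs.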
Workflow Overview
The Gateway sends active probes and monitors real traffic to detect unhealthy nodes. Traffic is only routed to nodes that are marked as healthy.
Configure Active Health Check
Active health checks allow the Gateway to detect failures even when there is no client traffic.
The configuration is shown first as Admin API requests and then as the equivalent ADC configuration file.
# 1. Create a Service with two upstream nodes and active checks
curl -k "https://localhost:7443/apisix/admin/services/upstream-with-health-check?gateway_group_id={group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "upstream-with-health-check",
    "upstream": {
      "type": "roundrobin",
      "nodes": [
        {
          "host": "192.168.215.1",
          "port": 18083,
          "weight": 100
        },
        {
          "host": "192.168.215.1",
          "port": 18084,
          "weight": 100
        }
      ],
      "checks": {
        "active": {
          "type": "http",
          "http_path": "/status/200",
          "timeout": 1,
          "concurrency": 10,
          "healthy": {
            "interval": 2,
            "successes": 1,
            "http_statuses": [200]
          },
          "unhealthy": {
            "interval": 1,
            "tcp_failures": 2,
            "timeouts": 3,
            "http_failures": 3,
            "http_statuses": [429, 404, 500, 502, 503, 504]
          }
        }
      }
    }
  }'
# 2. Create a Route for validation traffic
curl -k "https://localhost:7443/apisix/admin/routes/health-check-route?gateway_group_id={group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "health-check-route",
    "methods": ["GET"],
    "paths": ["/status/200"],
    "service_id": "upstream-with-health-check"
  }'
services:
  - name: upstream-with-health-check
    upstream:
      type: roundrobin
      nodes:
        - host: 192.168.215.1
          port: 18083
          weight: 1
        - host: 192.168.215.1
          port: 18084
          weight: 1
      checks:
        active:
          type: http
          http_path: /status/200
          timeout: 1
          concurrency: 10
          healthy:
            interval: 2
            successes: 1
            http_statuses:
              - 200
          unhealthy:
            interval: 1
            tcp_failures: 2
            timeouts: 3
            http_failures: 3
            http_statuses:
              - 429
              - 404
              - 500
              - 502
              - 503
              - 504
    routes:
      - name: health-check-route
        uris:
          - /status/200
        methods:
          - GET
adc sync -f adc.yaml
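The unhealthy thresholds determine how quickly a failed node leaves the pool. As a rough estimate, and assuming one probe per interval with no extra delay from probe timeouts or scheduling, a node is marked unhealthy after about interval × failures seconds. The sketch below works this out for the values used above; detection_seconds is a hypothetical helper name.

```shell
#!/bin/sh
# detection_seconds INTERVAL FAILURES
# Approximate upper bound, in seconds, before a node is marked unhealthy:
# FAILURES consecutive failed probes, one probe every INTERVAL seconds.
# Assumption: probe timeouts and scheduling jitter are ignored.
detection_seconds() {
  echo $(( $1 * $2 ))
}

detection_seconds 1 3   # unhealthy.interval 1, http_failures 3 -> prints 3
detection_seconds 1 2   # unhealthy.interval 1, tcp_failures 2  -> prints 2
```

A stopped container will most likely refuse connections, so in that case the tcp_failures threshold is probably the one that applies. Larger intervals lengthen detection proportionally: a 30-second interval with three required failures would give roughly 90 seconds.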
Configure Passive Health Check
Passive health checks complement active ones by monitoring real-world traffic patterns to detect failures that active probes might miss.
The configuration is shown first as an Admin API request and then as the equivalent ADC configuration file.
# Update the Service to add passive checks alongside active checks
curl -k "https://localhost:7443/apisix/admin/services/upstream-with-health-check?gateway_group_id={group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "upstream-with-health-check",
    "upstream": {
      "type": "roundrobin",
      "nodes": [
        {
          "host": "192.168.215.1",
          "port": 18083,
          "weight": 100
        },
        {
          "host": "192.168.215.1",
          "port": 18084,
          "weight": 100
        }
      ],
      "checks": {
        "active": {
          "type": "http",
          "http_path": "/status/200",
          "healthy": {
            "interval": 30,
            "successes": 1,
            "http_statuses": [200]
          },
          "unhealthy": {
            "interval": 30,
            "http_failures": 3,
            "tcp_failures": 2,
            "timeouts": 3,
            "http_statuses": [429, 404, 500, 502, 503, 504]
          }
        },
        "passive": {
          "healthy": {
            "http_statuses": [200, 201],
            "successes": 1
          },
          "unhealthy": {
            "http_statuses": [404, 500, 502, 503],
            "http_failures": 3,
            "tcp_failures": 2,
            "timeouts": 3
          }
        }
      }
    }
  }'
services:
  - name: upstream-with-health-check
    upstream:
      type: roundrobin
      nodes:
        - host: 192.168.215.1
          port: 18083
          weight: 1
        - host: 192.168.215.1
          port: 18084
          weight: 1
      checks:
        active:
          type: http
          http_path: /status/200
          healthy:
            interval: 30
            successes: 1
            http_statuses:
              - 200
          unhealthy:
            interval: 30
            http_failures: 3
            tcp_failures: 2
            timeouts: 3
            http_statuses:
              - 429
              - 404
              - 500
              - 502
              - 503
              - 504
        passive:
          healthy:
            http_statuses:
              - 200
              - 201
            successes: 1
          unhealthy:
            http_statuses:
              - 404
              - 500
              - 502
              - 503
            http_failures: 3
            tcp_failures: 2
            timeouts: 3
    routes:
      - name: health-check-route
        uris:
          - /status/200
        methods:
          - GET
adc sync -f adc.yaml
Passive health checks are reactive: they can only detect a failure after client requests have reached the failing node.
Combine Active and Passive Health Checks
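For maximum reliability, use active and passive health checks together. Active checks detect failures even when there is no client traffic, while passive checks react to failures seen in real requests between probes.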
Validate the Configuration
- Simulate a failure: Stop one of your backend upstream nodes, for example:
  docker stop tm-health-1
- Observe the Gateway: Wait for the health check to detect the failure based on the configured thresholds.
- Check availability: Send requests to the route. The Gateway should continue to return 200 OK while at least one node is healthy:
  while true; do curl -s -o /dev/null -w '%{http_code}\n' "http://127.0.0.1:9080/status/200"; sleep 1; done
- Inspect health state: Query the Control API from the gateway container:
  docker exec gateway sh -lc 'curl -s http://127.0.0.1:9090/v1/healthcheck || wget -qO- http://127.0.0.1:9090/v1/healthcheck'
  In the local validation environment, the health state was reported under the upstream path, such as /apisix/upstreams/upstream-with-health-check.
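The request loop above runs until interrupted and leaves the counting to the reader. As a sketch, the same check can be bounded and summarized so that any non-200 responses stand out; it also supplies the client traffic that passive checks depend on. The names send_requests and tally_codes are hypothetical helpers, and the URL is the one used in the loop above.

```shell
#!/bin/sh
# send_requests COUNT URL
# Sends COUNT sequential GET requests and prints one HTTP status code per line.
# curl prints 000 when no HTTP response was received at all.
send_requests() {
  i=0
  while [ "$i" -lt "$1" ]; do
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 2 "$2"
    i=$((i + 1))
  done
}

# tally_codes
# Reads status codes on stdin and prints "<code> <count>" lines sorted by code.
tally_codes() {
  awk 'NF { count[$1]++ } END { for (code in count) print code, count[code] }' | sort
}
```

For example, `send_requests 20 "http://127.0.0.1:9080/status/200" | tally_codes` prints a single line for code 200 when every request succeeded, and additional lines if any request failed while a node was being removed from the pool.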
Troubleshooting
- Health Checks Not Working: Check that the http_path exists on your backend and that the host and port are correctly configured.
- Firewall Issues: Ensure the Gateway nodes have permission to reach the upstream nodes' health check endpoints.
- Wait Time: If a node is not being removed immediately, check your interval and http_failures/successes settings.
- DNS Issues: If you are using hostnames in your upstream nodes, ensure the Gateway nodes can resolve them.
Next Steps
- Upstreams and Load Balancing — understand load balancing algorithms and upstream configuration.
- Scale Data Plane — scale gateway nodes for high availability.