# Troubleshoot API7 Gateway

This guide helps you diagnose and resolve common issues encountered during the installation, configuration, and operation of API7 Gateway. For each issue, it lists the symptoms, possible causes, and resolution steps.
## Diagnostic Tools

Before troubleshooting specific issues, familiarize yourself with the available diagnostic tools.

### Check Component Status

**Kubernetes**

```shell
# Check all API7 pods
kubectl get pods -n api7 -o wide

# Check pod events for errors
kubectl describe pod {pod_name} -n api7

# Check pod logs
kubectl logs {pod_name} -n api7 --tail=100
```

**Docker**

```shell
# Check container status
docker ps -a --filter "name=api7"

# Check container logs
docker logs {container_name} --tail=100
```
### Check Data Plane Error Logs

```shell
# Kubernetes
kubectl exec -n api7 {gateway_pod} -- tail -f /usr/local/apisix/logs/error.log

# Docker
docker exec {gateway_container} tail -f /usr/local/apisix/logs/error.log
```
### Check Control Plane Logs

```shell
# Kubernetes
kubectl logs -n api7 {cp_pod} --tail=100

# Docker
docker logs {cp_container} --tail=100
```
### Test Connectivity

```shell
# Test gateway HTTP port
curl -v "http://{GATEWAY_HOST}:9080/"

# Test gateway HTTPS port
curl -v "https://{GATEWAY_HOST}:9443/" --insecure

# Test Dashboard access
curl -v "https://{DASHBOARD_HOST}:7443/" --insecure
```
## Control Plane Issues

### Dashboard Is Not Accessible

**Symptoms**: Cannot reach the Dashboard web interface at `https://{host}:7443`.

**Possible Causes and Resolutions**:

- **Pod/container is not running**:

  ```shell
  # Kubernetes
  kubectl get pods -n api7 -l app.kubernetes.io/component=dashboard

  # Docker
  docker ps -a --filter "name=dashboard"
  ```

  If the pod is in `CrashLoopBackOff` or the container has exited, check the logs for startup errors.

- **Port not exposed or blocked by firewall**:

  - Verify the service is listening:

    ```shell
    kubectl get svc -n api7 | grep dashboard
    ```

  - Check that port 7443 is not blocked by firewall or security group rules.
  - For local access, use port forwarding:

    ```shell
    kubectl port-forward svc/api7ee3-dashboard 7443:7443 -n api7
    ```

- **TLS certificate issues**: If using a custom TLS certificate, verify that it is valid and correctly mounted.
### PostgreSQL Connection Failure

**Symptoms**: Control Plane pods fail to start with database connection errors.

**Possible Causes and Resolutions**:

- **PostgreSQL is not running**:

  ```shell
  kubectl get pods -n api7 -l app.kubernetes.io/name=postgresql
  ```

- **Incorrect database credentials**: Verify that the database password in the Helm values or environment variables matches the PostgreSQL configuration.
- **Persistent volume issues**: If PostgreSQL storage is full or the PVC is not bound:

  ```shell
  kubectl get pvc -n api7
  ```
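To test the database credentials directly, you can run a trivial query from inside the PostgreSQL pod. This is a sketch: the pod, user, and database names are deployment-specific placeholders.

```shell
# Connect from inside the PostgreSQL pod using the configured credentials;
# a successful "SELECT 1" confirms the user and password are valid.
# psql may prompt for the password interactively.
kubectl exec -it -n api7 {postgres_pod} -- \
  psql -U {db_user} -d {db_name} -c "SELECT 1;"
```

If this query succeeds but the Control Plane still cannot connect, the problem is more likely network reachability or a mismatch between the Helm values and the running database.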
### Configuration Changes Not Taking Effect

**Symptoms**: Changes made in the Dashboard do not appear on the Data Plane.

**Possible Causes and Resolutions**:

- **DP Manager connectivity**: Verify the DP Manager service is running and accessible from the Data Plane:

  ```shell
  kubectl get svc -n api7 | grep dp-manager
  ```

- **mTLS certificate expiry**: Check whether the mTLS certificates used between the CP and DP have expired. Renew and redeploy if necessary.
- **Network policy blocking**: Ensure no Kubernetes NetworkPolicy or firewall rules block traffic between the CP and DP on ports 7900/7943.
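One way to check the DP-side certificate expiry is with `openssl` inside the gateway pod. The pod name and certificate path below are illustrative; adjust them to where your deployment mounts the connection certificates.

```shell
# Print the certificate's expiry date
kubectl exec -n api7 {gateway_pod} -- \
  openssl x509 -in /path/to/dp/tls.crt -noout -enddate

# -checkend exits non-zero if the certificate expires within 24 hours (86400 s)
kubectl exec -n api7 {gateway_pod} -- \
  openssl x509 -in /path/to/dp/tls.crt -noout -checkend 86400
```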
## Data Plane Issues

### HTTP 502 Bad Gateway

**Symptoms**: Clients receive `502 Bad Gateway` responses from the gateway.

**Possible Causes and Resolutions**:

- **Upstream service is unreachable**: Verify the upstream service is running and accessible from the gateway pod:

  ```shell
  kubectl exec -n api7 {gateway_pod} -- curl -v "http://{upstream_host}:{upstream_port}/"
  ```

- **DNS resolution failure**: If using service names, verify DNS resolution works inside the gateway pod:

  ```shell
  kubectl exec -n api7 {gateway_pod} -- nslookup {upstream_service_name}
  ```

- **Upstream timeout**: Check the error log for timeout messages. If the backend is slow, increase the upstream timeout. In API7 Gateway, upstream configuration lives on a service, so update the service with the new timeout. The `timeout` object accepts `connect`, `send`, and `read` values in seconds:

  ```shell
  curl -k "https://localhost:7443/apisix/admin/services/{service_name}?gateway_group_id={group_id}" -X PUT \
    -H "X-API-KEY: ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d '{
      "name": "{service_name}",
      "upstream": {
        "type": "roundrobin",
        "nodes": [
          {"host": "{upstream_host}", "port": {upstream_port}, "weight": 1}
        ],
        "timeout": {
          "connect": 30,
          "send": 30,
          "read": 30
        }
      }
    }'
  ```

  The full service body (name + upstream) must be sent, because PUT replaces the entire resource. Fetch the current service first with `GET /apisix/admin/services/{service_name}?gateway_group_id={group_id}` and merge the new `timeout` block before sending.
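The fetch-and-merge step can be sketched with `jq`. This assumes `jq` is installed and that the GET response body is the bare service object; if your version wraps the service in a `value` field, adjust the filter path accordingly.

```shell
# Fetch the current service, merge the new timeout block, and PUT it back.
curl -sk "https://localhost:7443/apisix/admin/services/{service_name}?gateway_group_id={group_id}" \
  -H "X-API-KEY: ${API_KEY}" \
  | jq '.upstream.timeout = {connect: 30, send: 30, read: 30}' \
  > service-updated.json

curl -k "https://localhost:7443/apisix/admin/services/{service_name}?gateway_group_id={group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d @service-updated.json
```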
### HTTP 503 Service Temporarily Unavailable

**Symptoms**: The gateway returns `503` for specific routes.

**Possible Causes and Resolutions**:

- **All upstream nodes are unhealthy**: If active health checks are enabled and all nodes fail, the gateway returns 503. Check the upstream health status in the Dashboard.
- **Circuit breaker triggered**: If the `api-breaker` plugin is enabled, it may have opened the circuit after consecutive failures.
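For reference, the circuit-breaker behavior is governed by the plugin's failure and recovery thresholds. The fragment below is a sketch based on the Apache APISIX `api-breaker` schema (verify the field names against your API7 version); with these values the circuit opens after 3 consecutive unhealthy responses and stays open for up to 300 seconds:

```json
{
  "plugins": {
    "api-breaker": {
      "break_response_code": 502,
      "max_breaker_sec": 300,
      "unhealthy": {
        "http_statuses": [500, 502, 503],
        "failures": 3
      },
      "healthy": {
        "http_statuses": [200],
        "successes": 3
      }
    }
  }
}
```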
### HTTP 404 Route Not Found

**Symptoms**: Requests return `{"error_msg":"404 Route Not Found"}`.

**Possible Causes and Resolutions**:

- **Route not published**: Ensure the service template version containing the route is published to the correct Gateway Group.
- **Host mismatch**: If the route has a `host` condition, ensure the `Host` header in the request matches exactly.
- **Path mismatch**: Verify the request path matches the route's URI pattern. Check for trailing slashes.
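A quick way to distinguish a host mismatch from a path mismatch is to send the same request with and without an explicit `Host` header (placeholders below are illustrative):

```shell
# Without a Host header: a 404 here but a 200 below points to a host condition
curl -i "http://{GATEWAY_HOST}:9080/{route_path}"

# With a Host header matching the route's host condition
curl -i "http://{GATEWAY_HOST}:9080/{route_path}" -H "Host: {configured_host}"
```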
### High Latency

**Symptoms**: API response times are significantly higher than expected.

**Possible Causes and Resolutions**:

- **Insufficient worker processes**: Check the `worker_processes` setting. It should match the number of available CPU cores:

  ```yaml title="config.yaml"
  nginx_config:
    worker_processes: auto
  ```

- **Plugin overhead**: Disable non-essential plugins to isolate the cause. Re-enable plugins one by one to identify the bottleneck.
- **Access log I/O**: High-traffic deployments may experience an I/O bottleneck from access logging. Consider disabling or reducing access log verbosity:

  ```yaml title="config.yaml"
  nginx_config:
    http:
      enable_access_log: false
  ```

- **Resource contention**: Check CPU and memory usage of the gateway pod. Increase resource limits if they are being throttled.
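To confirm the effective worker count inside a running gateway pod, count the nginx worker processes (a sketch; the pod name is a placeholder):

```shell
# The number of "nginx: worker" processes should match the pod's CPU cores
kubectl exec -n api7 {gateway_pod} -- \
  sh -c "ps -ef | grep 'nginx: worker' | grep -v grep | wc -l"
```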
## Certificate and TLS Issues

### mTLS Handshake Failure Between CP and DP

**Symptoms**: The Data Plane fails to connect to the Control Plane. Logs show TLS handshake errors.

**Possible Causes and Resolutions**:

- **Certificate mismatch**: Ensure the DP certificate was generated by the same CA that the CP trusts. Re-download the connection script from the Dashboard.
- **Certificate expired**: Check certificate validity:

  ```shell
  openssl x509 -in /path/to/tls.crt -noout -dates
  ```

- **Wrong CP address**: Verify that the etcd host configuration in the DP points to the correct DP Manager service address and port (7943 for mTLS).
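To check for a CA mismatch directly, verify the DP certificate against the CA bundle (file paths below are illustrative):

```shell
# "OK" means the DP certificate chains to this CA
openssl verify -CAfile /path/to/ca.crt /path/to/tls.crt

# Compare the DP certificate's issuer with the CA's subject; they must match
openssl x509 -in /path/to/tls.crt -noout -issuer
openssl x509 -in /path/to/ca.crt -noout -subject
```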
### SSL Certificate Not Working for Client Traffic

**Symptoms**: HTTPS requests to the gateway fail with certificate errors.

**Possible Causes and Resolutions**:

- **Certificate not uploaded**: Upload the SSL certificate through the Dashboard or Admin API.
- **SNI mismatch**: The certificate's Common Name or Subject Alternative Names must match the domain in the client request.
- **Certificate chain incomplete**: Ensure the full certificate chain (including intermediate certificates) is provided.
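To inspect the certificate the gateway actually serves for a given SNI, including the chain, you can use `openssl s_client` (placeholders are illustrative):

```shell
# The SAN must cover {your_domain}, and any intermediate certificates
# should appear in the -showcerts output
openssl s_client -connect {GATEWAY_HOST}:9443 \
  -servername {your_domain} -showcerts </dev/null
```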
## Installation Issues

### Helm Installation Fails

**Symptoms**: `helm install` or `helm upgrade` commands fail.

**Possible Causes and Resolutions**:

- **Helm repo not updated**:

  ```shell
  helm repo update
  ```

- **Insufficient cluster resources**: Check whether the cluster has enough CPU, memory, and storage to schedule all pods.
- **StorageClass not configured**: If PostgreSQL or Prometheus requires persistent storage, ensure a default StorageClass exists:

  ```shell
  kubectl get storageclass
  ```
### Image Pull Errors

**Symptoms**: Pods are stuck in `ImagePullBackOff` or `ErrImagePull`.

**Possible Causes and Resolutions**:

- **Private registry authentication**: If using a private registry, create an image pull secret:

  ```shell
  kubectl create secret docker-registry api7-registry \
    --docker-server={REGISTRY_URL} \
    --docker-username={USERNAME} \
    --docker-password={PASSWORD} \
    -n api7
  ```

- **Air-gapped environment**: Mirror the required images to your private registry. See Installation Packages.
- **Incorrect image tag**: Verify that the image tag exists in the registry.
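The pull secret must also be referenced by the pods that pull the images. A hypothetical Helm values fragment is shown below; the exact key and its location depend on the chart version, so check the chart's `values.yaml` before using it:

```yaml
# Hypothetical values fragment referencing the secret created above
global:
  imagePullSecrets:
    - api7-registry
```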
## Performance Issues

### Gateway Not Achieving Expected QPS

**Symptoms**: Benchmark tests show lower QPS than documented baselines.

**Possible Causes and Resolutions**:

- Review the Performance Benchmark guide for optimization recommendations.
- **Check system limits**:

  ```shell
  # Check the open file limit; it should be at least 1024000 for benchmarking
  ulimit -n
  ```

- **Verify worker processes match CPU cores**: Set `worker_processes: auto` or explicitly match the CPU core count.
- **Disable access logging during benchmarks**: Access log I/O can reduce throughput.
- **Avoid burstable cloud instances**: Use dedicated or compute-optimized instance types for consistent performance.
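As a quick sanity check before a benchmark run, read the limit in the shell that will launch the load generator and the gateway:

```shell
# Read the open file limit for the current shell; raising it (e.g.
# `ulimit -n 1024000`) may require root or a sufficiently high hard limit.
current_limit=$(ulimit -n)
echo "open file limit: ${current_limit}"
```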
## Getting Additional Help

If you cannot resolve an issue using this guide:

- Check the error logs for specific error messages and search the API7 documentation for the error.
- Contact API7 Support with the following information:
  - API7 Gateway version
  - Deployment method (Kubernetes/Docker)
  - Relevant error logs
  - Steps to reproduce the issue
- Visit the API7 Community for community support and discussions.