# Troubleshoot API7 Gateway

This guide helps you diagnose and resolve common issues encountered during the installation, configuration, and operation of API7 Gateway. For each issue, it lists the symptoms, possible causes, and resolution steps.
## Diagnostic Tools

Before troubleshooting specific issues, familiarize yourself with the available diagnostic tools.

### Check Component Status

**Kubernetes**

```shell
# Check all API7 pods
kubectl get pods -n api7 -o wide

# Check pod events for errors
kubectl describe pod {pod_name} -n api7

# Check pod logs
kubectl logs {pod_name} -n api7 --tail=100
```

**Docker**

```shell
# Check container status
docker ps -a --filter "name=api7"

# Check container logs
docker logs {container_name} --tail=100
```
### Check Data Plane Error Logs

```shell
# Kubernetes
kubectl exec -n api7 {gateway_pod} -- tail -f /usr/local/apisix/logs/error.log

# Docker
docker exec {gateway_container} tail -f /usr/local/apisix/logs/error.log
```
### Check Control Plane Logs

```shell
# Kubernetes
kubectl logs -n api7 {cp_pod} --tail=100

# Docker
docker logs {cp_container} --tail=100
```
### Test Connectivity

```shell
# Test gateway HTTP port
curl -v "http://{GATEWAY_HOST}:9080/"

# Test gateway HTTPS port
curl -v "https://{GATEWAY_HOST}:9443/" --insecure

# Test Dashboard access
curl -v "https://{DASHBOARD_HOST}:7443/" --insecure
```
## Control Plane Issues

### Dashboard Is Not Accessible

**Symptoms**: Cannot reach the Dashboard web interface at `https://{host}:7443`.

**Possible Causes and Resolutions**:

- **Pod/container is not running**:

  ```shell
  # Kubernetes
  kubectl get pods -n api7 -l app.kubernetes.io/component=dashboard

  # Docker
  docker ps -a --filter "name=dashboard"
  ```

  If the pod is in `CrashLoopBackOff` or the container has exited, check the logs for startup errors.

- **Port not exposed or blocked by firewall**:

  - Verify the service is listening:

    ```shell
    kubectl get svc -n api7 | grep dashboard
    ```

  - Check that port 7443 is not blocked by firewall or security group rules.
  - For local access, use port forwarding:

    ```shell
    kubectl port-forward svc/api7ee3-dashboard 7443:7443 -n api7
    ```

- **TLS certificate issues**: If using a custom TLS certificate, verify that it is valid and correctly mounted.
### PostgreSQL Connection Failure

**Symptoms**: Control Plane pods fail to start with database connection errors.

**Possible Causes and Resolutions**:

- **PostgreSQL is not running**:

  ```shell
  kubectl get pods -n api7 -l app.kubernetes.io/name=postgresql
  ```

- **Incorrect database credentials**: Verify that the database password in the Helm values or environment variables matches the PostgreSQL configuration.
- **Persistent volume issues**: If PostgreSQL storage is full or the PVC is not bound:

  ```shell
  kubectl get pvc -n api7
  ```
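To test the database credentials directly, you can run a trivial query from inside the PostgreSQL pod. This is a sketch: the pod, user, and database names are deployment-specific placeholders.

```shell
# Connect from inside the PostgreSQL pod using the configured credentials;
# a successful "SELECT 1" confirms the user and password are valid.
# psql may prompt for the password interactively.
kubectl exec -it -n api7 {postgres_pod} -- \
  psql -U {db_user} -d {db_name} -c "SELECT 1;"
```

If this query succeeds but the Control Plane still cannot connect, the problem is more likely network reachability or a mismatch between the Helm values and the running database.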
### Configuration Changes Not Taking Effect

**Symptoms**: Changes made in the Dashboard do not appear on the Data Plane.

**Possible Causes and Resolutions**:

- **DP Manager connectivity**: Verify the DP Manager service is running and accessible from the Data Plane:

  ```shell
  kubectl get svc -n api7 | grep dp-manager
  ```

- **mTLS certificate expiry**: Check whether the mTLS certificates used between the CP and DP have expired. Renew and redeploy if necessary.
- **Network policy blocking**: Ensure no Kubernetes NetworkPolicy or firewall rules block traffic between the CP and DP on ports 7900/7943.
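One way to check the DP-side certificate expiry is with `openssl` inside the gateway pod. The pod name and certificate path below are illustrative; adjust them to where your deployment mounts the connection certificates.

```shell
# Print the certificate's expiry date
kubectl exec -n api7 {gateway_pod} -- \
  openssl x509 -in /path/to/dp/tls.crt -noout -enddate

# -checkend exits non-zero if the certificate expires within 24 hours (86400 s)
kubectl exec -n api7 {gateway_pod} -- \
  openssl x509 -in /path/to/dp/tls.crt -noout -checkend 86400
```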
## Data Plane Issues

### HTTP 502 Bad Gateway

**Symptoms**: Clients receive `502 Bad Gateway` responses from the gateway.

**Possible Causes and Resolutions**:

- **Upstream service is unreachable**: Verify the upstream service is running and accessible from the gateway pod:

  ```shell
  kubectl exec -n api7 {gateway_pod} -- curl -v "http://{upstream_host}:{upstream_port}/"
  ```

- **DNS resolution failure**: If using service names, verify DNS resolution works inside the gateway pod:

  ```shell
  kubectl exec -n api7 {gateway_pod} -- nslookup {upstream_service_name}
  ```

- **Upstream timeout**: Check the error log for timeout messages. If the backend is slow, increase the upstream timeout. In API7 Gateway, upstream configuration lives on a service, so update the service with the new timeout. The `timeout` object accepts `connect`, `send`, and `read` values in seconds:

  ```shell
  curl -k "https://localhost:7443/apisix/admin/services/{service_name}?gateway_group_id={group_id}" -X PUT \
    -H "X-API-KEY: ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d '{
      "name": "{service_name}",
      "upstream": {
        "type": "roundrobin",
        "nodes": [
          {"host": "{upstream_host}", "port": {upstream_port}, "weight": 1}
        ],
        "timeout": {
          "connect": 30,
          "send": 30,
          "read": 30
        }
      }
    }'
  ```

  The full service body (name + upstream) must be sent, because PUT replaces the entire resource. Fetch the current service first with `GET /apisix/admin/services/{service_name}?gateway_group_id={group_id}` and merge the new `timeout` block before sending.
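The fetch-and-merge step can be sketched with `jq`. This assumes `jq` is installed and that the GET response body is the bare service object; if your version wraps the service in a `value` field, adjust the filter path accordingly.

```shell
# Fetch the current service, merge the new timeout block, and PUT it back.
curl -sk "https://localhost:7443/apisix/admin/services/{service_name}?gateway_group_id={group_id}" \
  -H "X-API-KEY: ${API_KEY}" \
  | jq '.upstream.timeout = {connect: 30, send: 30, read: 30}' \
  > service-updated.json

curl -k "https://localhost:7443/apisix/admin/services/{service_name}?gateway_group_id={group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d @service-updated.json
```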
### HTTP 503 Service Temporarily Unavailable

**Symptoms**: The gateway returns `503` for specific routes.

**Possible Causes and Resolutions**:

- **All upstream nodes are unhealthy**: If active health checks are enabled and all nodes fail, the gateway returns 503. Check the upstream health status in the Dashboard.
- **Circuit breaker triggered**: If the `api-breaker` plugin is enabled, it may have opened the circuit after consecutive failures.
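For reference, the circuit-breaker behavior is governed by the plugin's failure and recovery thresholds. The fragment below is a sketch based on the Apache APISIX `api-breaker` schema (verify the field names against your API7 version); with these values the circuit opens after 3 consecutive unhealthy responses and stays open for up to 300 seconds:

```json
{
  "plugins": {
    "api-breaker": {
      "break_response_code": 502,
      "max_breaker_sec": 300,
      "unhealthy": {
        "http_statuses": [500, 502, 503],
        "failures": 3
      },
      "healthy": {
        "http_statuses": [200],
        "successes": 3
      }
    }
  }
}
```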
### HTTP 404 Route Not Found

**Symptoms**: Requests return `{"error_msg":"404 Route Not Found"}`.

**Possible Causes and Resolutions**:

- **Route not published**: Ensure the service template version containing the route is published to the correct Gateway Group.
- **Host mismatch**: If the route has a `host` condition, ensure the `Host` header in the request matches exactly.
- **Path mismatch**: Verify the request path matches the route's URI pattern. Check for trailing slashes.
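A quick way to distinguish a host mismatch from a path mismatch is to send the same request with and without an explicit `Host` header (placeholders below are illustrative):

```shell
# Without a Host header: a 404 here but a 200 below points to a host condition
curl -i "http://{GATEWAY_HOST}:9080/{route_path}"

# With a Host header matching the route's host condition
curl -i "http://{GATEWAY_HOST}:9080/{route_path}" -H "Host: {configured_host}"
```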
### High Latency

**Symptoms**: API response times are significantly higher than expected.

**Possible Causes and Resolutions**:

- **Insufficient worker processes**: Check the `worker_processes` setting. It should match the number of available CPU cores:

  ```yaml title="config.yaml"
  nginx_config:
    worker_processes: auto
  ```

- **Plugin overhead**: Disable non-essential plugins to isolate the cause. Re-enable plugins one by one to identify the bottleneck.
- **Access log I/O**: High-traffic deployments may experience an I/O bottleneck from access logging. Consider disabling or reducing access log verbosity:

  ```yaml title="config.yaml"
  nginx_config:
    http:
      enable_access_log: false
  ```

- **Resource contention**: Check CPU and memory usage of the gateway pod. Increase resource limits if they are being throttled.
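To confirm the effective worker count inside a running gateway pod, count the nginx worker processes (a sketch; the pod name is a placeholder):

```shell
# The number of "nginx: worker" processes should match the pod's CPU cores
kubectl exec -n api7 {gateway_pod} -- \
  sh -c "ps -ef | grep 'nginx: worker' | grep -v grep | wc -l"
```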
## Certificate and TLS Issues

### mTLS Handshake Failure Between CP and DP

**Symptoms**: The Data Plane fails to connect to the Control Plane. Logs show TLS handshake errors.

**Possible Causes and Resolutions**:

- **Certificate mismatch**: Ensure the DP certificate was generated by the same CA that the CP trusts. Re-download the connection script from the Dashboard.
- **Certificate expired**: Check certificate validity:

  ```shell
  openssl x509 -in /path/to/tls.crt -noout -dates
  ```

- **Wrong CP address**: Verify that the etcd host configuration in the DP points to the correct DP Manager service address and port (7943 for mTLS).
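To check for a CA mismatch directly, verify the DP certificate against the CA bundle (file paths below are illustrative):

```shell
# "OK" means the DP certificate chains to this CA
openssl verify -CAfile /path/to/ca.crt /path/to/tls.crt

# Compare the DP certificate's issuer with the CA's subject; they must match
openssl x509 -in /path/to/tls.crt -noout -issuer
openssl x509 -in /path/to/ca.crt -noout -subject
```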
### SSL Certificate Not Working for Client Traffic

**Symptoms**: HTTPS requests to the gateway fail with certificate errors.

**Possible Causes and Resolutions**:

- **Certificate not uploaded**: Upload the SSL certificate through the Dashboard or Admin API.
- **SNI mismatch**: The certificate's Common Name or Subject Alternative Names must match the domain in the client request.
- **Certificate chain incomplete**: Ensure the full certificate chain (including intermediate certificates) is provided.
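To inspect the certificate the gateway actually serves for a given SNI, including the chain, you can use `openssl s_client` (placeholders are illustrative):

```shell
# The SAN must cover {your_domain}, and any intermediate certificates
# should appear in the -showcerts output
openssl s_client -connect {GATEWAY_HOST}:9443 \
  -servername {your_domain} -showcerts </dev/null
```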
## Installation Issues

### Helm Installation Fails

**Symptoms**: `helm install` or `helm upgrade` commands fail.

**Possible Causes and Resolutions**:

- **Helm repo not updated**:

  ```shell
  helm repo update
  ```

- **Insufficient cluster resources**: Check whether the cluster has enough CPU, memory, and storage to schedule all pods.
- **StorageClass not configured**: If PostgreSQL or Prometheus requires persistent storage, ensure a default StorageClass exists:

  ```shell
  kubectl get storageclass
  ```
### Image Pull Errors

**Symptoms**: Pods are stuck in `ImagePullBackOff` or `ErrImagePull`.

**Possible Causes and Resolutions**:

- **Private registry authentication**: If using a private registry, create an image pull secret:

  ```shell
  kubectl create secret docker-registry api7-registry \
    --docker-server={REGISTRY_URL} \
    --docker-username={USERNAME} \
    --docker-password={PASSWORD} \
    -n api7
  ```

- **Air-gapped environment**: Mirror the required images to your private registry. See Installation Packages.
- **Incorrect image tag**: Verify that the image tag exists in the registry.
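The pull secret must also be referenced by the pods that pull the images. A hypothetical Helm values fragment is shown below; the exact key and its location depend on the chart version, so check the chart's `values.yaml` before using it:

```yaml
# Hypothetical values fragment referencing the secret created above
global:
  imagePullSecrets:
    - api7-registry
```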
## Performance Issues

### Gateway Not Achieving Expected QPS

**Symptoms**: Benchmark tests show lower QPS than documented baselines.

**Possible Causes and Resolutions**:

- Review the Performance Benchmark guide for optimization recommendations.
- **Check system limits**:

  ```shell
  # Check the open file limit; it should be at least 1024000 for benchmarking
  ulimit -n
  ```

- **Verify worker processes match CPU cores**: Set `worker_processes: auto` or explicitly match the CPU core count.
- **Disable access logging during benchmarks**: Access log I/O can reduce throughput.
- **Avoid burstable cloud instances**: Use dedicated or compute-optimized instance types for consistent performance.
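As a quick sanity check before a benchmark run, read the limit in the shell that will launch the load generator and the gateway:

```shell
# Read the open file limit for the current shell; raising it (e.g.
# `ulimit -n 1024000`) may require root or a sufficiently high hard limit.
current_limit=$(ulimit -n)
echo "open file limit: ${current_limit}"
```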
## Getting Additional Help

If you cannot resolve an issue using this guide:

- Check the error logs for specific error messages and search the API7 documentation for the error.
- Contact API7 Support with the following information:
  - API7 Gateway version
  - Deployment method (Kubernetes/Docker)
  - Relevant error logs
  - Steps to reproduce the issue
- Visit the API7 Community for community support and discussions.