Version: 3.9.x

Autoscale API7 Gateway (K8s)

Autoscaling is a mechanism that automatically adjusts the resources available to the gateway, ensuring consistent performance under varying traffic loads.

This document guides you through setting CPU and memory requests, creating a Horizontal Pod Autoscaler (HPA) for API7 Gateway, and verifying scaling behavior with a simple load test, allowing your deployment to respond dynamically to workload changes.

The instructions below assume that API7 Gateway is already installed in a Kubernetes cluster.

Deploy a Metrics Server

The Horizontal Pod Autoscaler (HPA) requires CPU and memory metrics from the cluster. The Kubernetes Metrics Server collects these metrics from kubelets and exposes them through the Kubernetes API.

To deploy the Metrics Server, run:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Patch the Metrics Server deployment to allow insecure TLS connections to kubelets, which may be required in some cluster configurations:

kubectl patch deployment metrics-server \
  -n kube-system \
  --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value":"--kubelet-insecure-tls"}]'

To verify that the Metrics Server is functioning correctly, you can inspect the CPU and memory usage of pods within the current namespace:

kubectl top pods

Assuming that the gateway pod is running in the current namespace, you should see its CPU and memory metrics:

NAME                                 CPU(cores)   MEMORY(bytes)   
api7-ee-3-gateway-7998bf6dc6-2kh7q   16m          291Mi 

This confirms that metrics are being collected and available for the HPA.

Configure Resource Requests and Limits

To ensure proper resource allocation and enable HPA to scale API7 Gateway effectively, you should set CPU and memory requests and limits for the gateway pods.

To do so, export the current Helm values of your API7 Gateway release:

# update with your gateway name
helm get values api7-ee-3-gateway --all > values.yaml

Update the resources section to include CPU and memory requests and limits. This ensures the Kubernetes scheduler can allocate sufficient resources for each pod and that HPA can monitor pod usage accurately.

The values below are example settings. In production, you should adjust CPU and memory requests and limits based on your expected traffic, workload characteristics, and cluster resources.

values.yaml
apisix:
  resources:
    requests:
      cpu: "200m"       # Minimum CPU guaranteed (0.2 CPU cores)
      memory: "256Mi"   # Minimum memory guaranteed (256 MiB)
    limits:
      cpu: "1"          # Maximum CPU allowed (1 CPU core)
      memory: "512Mi"   # Maximum memory allowed (512 MiB)

Apply the updated configuration by upgrading the release:

# update with your gateway name
helm upgrade api7-ee-3-gateway api7/gateway -f values.yaml

Create an HPA

Create a HorizontalPodAutoscaler for the gateway deployment that automatically scales the number of pods between 2 and 10. Scaling is based on average CPU usage, with a target of 50%, to accommodate varying traffic loads. Adjust the number of replicas and the CPU target as needed to match your workload and cluster resources.

gateway-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api7-ee-3-gateway   # update with your gateway deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

Apply the configuration to create the HPA:

kubectl apply -f gateway-hpa.yaml

View the current status of HPA in the cluster:

kubectl get hpa

You should see an output similar to the following:

NAME          REFERENCE                      TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
gateway-hpa   Deployment/api7-ee-3-gateway   cpu: 7%/50%          2         10        2          30s

Verify Autoscaling with Load Test

To verify autoscaling, you will start 20 load pods to generate traffic against the gateway. Alternatively, you can also use load-testing tools such as k6 or wrk.

# update with your gateway service name
for i in $(seq 1 20); do
  kubectl run load-$i --image=busybox:1.28 --restart=Never \
    --labels=app=loadgen \
    -- /bin/sh -c "while true; do wget -q -O- http://api7-ee-3-gateway-gateway/ >/dev/null 2>&1; done" &
done

In a separate terminal, watch the status of the HPA, including current metrics, replica count, and scaling activity in real time:

kubectl get hpa gateway-hpa -w

You should see the CPU utilization rising. When the load is sustained and high enough, HPA starts to scale up the gateway deployment:

NAME          REFERENCE                      TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
gateway-hpa   Deployment/api7-ee-3-gateway   cpu: 59%/50%   2         10        2          4m30s
gateway-hpa   Deployment/api7-ee-3-gateway   cpu: 37%/50%   2         10        3          7m
gateway-hpa   Deployment/api7-ee-3-gateway   cpu: 60%/50%   2         10        3          9m16s
gateway-hpa   Deployment/api7-ee-3-gateway   cpu: 62%/50%   2         10        4          9m46s
gateway-hpa   Deployment/api7-ee-3-gateway   cpu: 52%/50%   2         10        5          11m

To observe the deployment scale down, delete all load-generating pods:

kubectl delete pod -l app=loadgen

You should see the CPU utilization decreasing and gateway deployment scales down the number of replicas:

NAME          REFERENCE                      TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
gateway-hpa   Deployment/api7-ee-3-gateway   cpu: 6%/50%   2         10        6          17m
gateway-hpa   Deployment/api7-ee-3-gateway   cpu: 8%/50%   2         10        6          17m
gateway-hpa   Deployment/api7-ee-3-gateway   cpu: 9%/50%   2         10        5          17m
gateway-hpa   Deployment/api7-ee-3-gateway   cpu: 7%/50%   2         10        4          18m
gateway-hpa   Deployment/api7-ee-3-gateway   cpu: 7%/50%   2         10        2          18m

Next Steps

Kubernetes provides several autoscaling strategies beyond the CPU-based approach shown in this guide. Depending on your workload characteristics, API7 Gateway can be scaled based on memory utilization, custom metrics, or a combination of multiple metrics to achieve more accurate scaling decisions. Additionally, Kubernetes allows the configuration of scaling policies to control the rate of scale-up and scale-down events, which is important for maintaining stability in production environments.

For a full overview of supported metrics, scaling behaviors, and advanced configuration options, please consult the official Kubernetes documentation:

Deploy a Metrics Server​

Configure Resource Requests and Limits​

Create an HPA​

Verify Autoscaling with Load Test​

Next Steps​

Deploy a Metrics Server

Configure Resource Requests and Limits

Create an HPA

Verify Autoscaling with Load Test

Next Steps