Performance Benchmark
This page documents the published benchmark results for API7 Gateway and the methodology behind them, so you can both understand the numbers and reproduce them in your own environment. All test assets — ADC configurations, wrk2 scripts, and deployment manifests — are available in the api7-gateway-performance-benchmark repository.
Methodology
The benchmark uses a fixed set of variables so that results are comparable across test runs and environments:
- Single gateway node: one API7 Gateway data plane instance, with `worker_processes` configured to match the vCPU count of the host. To build a clean single-core baseline, start with `worker_processes: 1` and scale up once you have confirmed the single-core numbers.
- Exclude upstream interference for the ceiling number: enable only the `mocking` plugin to measure API7 Gateway's raw request-processing ceiling. The mocking plugin returns canned responses without forwarding to an upstream. This number is not comparable to real-world throughput.
- Realistic scenarios forward to an upstream: every other test row uses a real NGINX upstream running `nginx/1.25.4`, configured with `access_log off` and a static 200-byte response so that the upstream itself is never the bottleneck.
- Sample size: each test case runs 5 times, 2 minutes per run, and the reported result is the average of the 5 runs.
- Test scenarios: 5 plugin combinations × 2 route/consumer scales = 10 test cases:
  - Only `mocking` (baseline ceiling)
  - No plugins
  - Only `limit-count`
  - Only `key-auth`
  - Both `key-auth` and `limit-count`

Each plugin scenario is run with (a) 1 route / 1 consumer and (b) 100 routes / 100 consumers to measure the impact of resource count on matching and plugin evaluation.
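A single sample of one test case can be driven with wrk2 along these lines. This is a sketch: the thread count, connection count, target rate, and gateway address are illustrative assumptions, not the exact values used for the published runs:

```
# One 2-minute sample at a fixed request rate; repeat 5 times and average.
# wrk2 installs as `wrk` and requires -R to set the constant throughput it holds.
wrk -t8 -c200 -d120s -R100000 --latency \
  http://gateway.example.internal:9080/benchmark
```

For the `key-auth` scenarios, also pass the consumer's credential, for example `-H "apikey: <key>"`.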
Published results — AWS EKS
These results were obtained in an AWS EKS environment using c5.4xlarge EC2 instances (16 vCPUs, 32 GB RAM) running Amazon Linux 2 (AL2_x86_64) on Kubernetes 1.29. API7 Gateway, the NGINX upstream, and wrk2 each ran on separate nodes in the same VPC to avoid resource contention.
The Forward to Upstream column distinguishes the mocking-only ceiling (where API7 Gateway returns canned responses and no request leaves the gateway) from realistic proxy scenarios. The mocking-only row measures the theoretical ceiling of the gateway's request-processing pipeline; it is not comparable to real-world throughput, where the backend and network become additional factors. Use the rows with Forward to Upstream: True when sizing for production.
AWS EKS (c5.4xlarge)
| Test scenario | Routes / Consumers | Forward to upstream | QPS | P99 (ms) | P95 (ms) |
|---|---|---|---|---|---|
| Only mocking | 1 route, 0 consumers | False | 310,392.07 | 1.16 | 1.08 |
| No plugins | 1 route, 0 consumers | True | 167,019.37 | 2.30 | 2.16 |
| No plugins | 100 routes, 0 consumers | True | 162,753.17 | 2.31 | 2.16 |
| Only limit-count | 1 route, 0 consumers | True | 145,370.10 | 2.43 | 2.24 |
| Only limit-count | 100 routes, 0 consumers | True | 143,108.40 | 2.45 | 2.25 |
| Only key-auth | 1 route, 1 consumer | True | 147,869.49 | 2.41 | 2.22 |
| Only key-auth | 100 routes, 100 consumers | True | 145,070.93 | 2.43 | 2.25 |
| key-auth + limit-count | 1 route, 1 consumer | True | 136,725.47 | 2.43 | 2.26 |
| key-auth + limit-count | 100 routes, 100 consumers | True | 133,782.95 | 2.48 | 2.30 |
Single-host baseline (1 worker)

This baseline is taken with API7 Gateway (`worker_processes: 1`), the NGINX upstream, and wrk2 all running on the same machine using the host network. Its purpose is to isolate single-core throughput with no network latency.
| Test scenario | Routes / Consumers | QPS | P99 (ms) | P95 (ms) |
|---|---|---|---|---|
| No plugins | 1 route, 0 consumers | 24,129.22 | 0.093 | 0.082 |
| No plugins | 100 routes, 0 consumers | 23,652.91 | 0.096 | 0.084 |
| Only limit-count | 1 route, 0 consumers | 20,495.10 | 0.104 | 0.092 |
| Only limit-count | 100 routes, 0 consumers | 20,462.31 | 0.104 | 0.094 |
| Only key-auth | 1 route, 1 consumer | 21,019.04 | 0.100 | 0.089 |
| Only key-auth | 100 routes, 100 consumers | 20,444.81 | 0.109 | 0.095 |
| key-auth + limit-count | 1 route, 1 consumer | 18,940.39 | 0.110 | 0.097 |
| key-auth + limit-count | 100 routes, 100 consumers | 18,193.88 | 0.110 | 0.098 |
Single-host and multi-node numbers are not directly comparable — the multi-node AWS EKS setup scales across 16 vCPUs while the single-host baseline constrains API7 Gateway to a single worker process. Use the single-host table to verify that your own single-core results are roughly in line with the reference before you scale up.
Run your own benchmark
To reproduce these numbers or measure your own workload, follow the same methodology used for the published results and apply the optimization guidance below.
Before you start
- Select an appropriate number of gateway nodes. Use one API7 Gateway node per test. Configure `worker_processes` to match the vCPU count of the host. Do not run multiple API7 Gateway instances with small `worker_processes` values in parallel — the results are harder to interpret.
- Build the single-core baseline first. Set `worker_processes: 1` and run the "No plugins" scenario against a local NGINX upstream on the same host. Your result should be in the same ballpark as the single-host baseline table above before you scale out.
- Exclude upstream interference for the ceiling. Enable only the `mocking` plugin for the ceiling number. This isolates API7 Gateway's own processing cost.
- Watch the upstream. During every realistic test, monitor the CPU, memory, and event-loop utilization of the NGINX upstream. If the upstream is saturated, the numbers reflect the upstream's limit, not API7 Gateway's.
- Run multiple samples and apply statistics. Every test case should run at least 5 times. Report the mean and standard deviation so that noise can be separated from signal.
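The last point takes only a few lines of Python; `statistics.stdev` computes the sample standard deviation. The QPS values below are made-up placeholders, not measured results:

```python
from statistics import mean, stdev

# QPS from 5 runs of one test case (placeholder values)
samples = [145370.1, 144980.3, 145912.8, 144650.2, 145201.6]

avg = mean(samples)
sd = stdev(samples)  # sample standard deviation (n - 1 denominator)

# Report mean and spread; a stddev above a percent or two of the mean
# usually means the runs were noisy and should be investigated.
print(f"QPS: {avg:,.2f} +/- {sd:,.2f} ({sd / avg:.2%} of mean)")
```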
For a full AWS EKS walkthrough — cluster creation, node labeling, Helm install, NGINX upstream deployment, and wrk2 deployment — see Run Benchmarks on AWS EKS.
Optimization recommendations
Raise the maximum number of open files
Check the system's current maximum number of open file descriptors:
```
cat /proc/sys/fs/file-nr
```
The last number is the system-wide maximum. If it is too small, raise it in /etc/sysctl.conf:
```
fs.file-max = 1020000
net.ipv4.ip_conntrack_max = 1020000
net.ipv4.netfilter.ip_conntrack_max = 1020000
```

On modern kernels the `ip_conntrack` keys have been replaced by `net.netfilter.nf_conntrack_max`; set whichever key your kernel exposes.
Reload:
```
sudo sysctl -p /etc/sysctl.conf
```
Raise the per-process ulimit
Each incoming connection consumes a file descriptor. Raise `ulimit -n` to a seven-figure value so that the gateway can accept the connection volume a benchmark generates.

Temporary (current shell only):

```
ulimit -n 1024000
```

Permanent (edit /etc/security/limits.conf):

```
* hard nofile 1024000
* soft nofile 1024000
```

Changes to /etc/security/limits.conf take effect for new login sessions, not the current shell.
Disable access logs
Access logs write to disk for every request, which can cap QPS at I/O speeds during high-volume tests. Disable them for benchmarks:
```yaml
nginx_config:
  error_log_level: error
  worker_processes: auto
  http:
    enable_access_log: false
```
Setting the error log level to error also reduces log I/O during the run.
Avoid resource contention
Put wrk2, API7 Gateway, and the upstream service on separate machines in the same local network. When running in Kubernetes, use nodeSelector (or taints and tolerations) so that the three pods land on three different nodes, otherwise CPU and network contention will distort the results.
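For example, label the three nodes with a role label and pin each workload with a `nodeSelector`. The label key, values, and image below are assumptions to illustrate the pattern; any consistent labeling scheme works:

```yaml
# kubectl label node <node-1> bench-role=gateway
# kubectl label node <node-2> bench-role=upstream
# kubectl label node <node-3> bench-role=wrk2
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wrk2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wrk2
  template:
    metadata:
      labels:
        app: wrk2
    spec:
      nodeSelector:
        bench-role: wrk2   # schedule only onto the load-generator node
      containers:
        - name: wrk2
          image: example/wrk2:latest  # placeholder image
```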
Avoid burstable cloud instances
Do not benchmark on burstable cloud instance families (for example, AWS t3/t4g). Burstable instances have a credit-based CPU model that produces non-repeatable results. Use a fixed-performance family like c5.4xlarge instead.
Also note that cloud providers' vCPUs are not always 1:1 with physical cores — many use hyper-threading, which means the actual physical core count can be half of the vCPU count. For the most accurate sizing, consult your cloud provider's instance documentation (for AWS, see AWS Instance CPU options).
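A quick way to compare the two on a Linux node is to check `nproc` against the unique (core, socket) pairs reported by `lscpu`:

```shell
# vCPUs the OS sees (what worker_processes: auto will pick up)
nproc

# Physical cores: count unique (core, socket) pairs reported by lscpu
lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l
```

On a hyper-threaded instance the first number is typically twice the second.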
Watch for internal errors in the gateway
Before each benchmark run, tail the API7 Gateway error log and verify that it is clean. An internal error loop (for example, an upstream DNS failure that triggers retries) will silently reduce measured throughput. Set the gateway log level to error and fix anything the log reports before measuring.
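As a sketch, a pre-flight check can grep the error log for entries at level error or above. The log path here is a placeholder for a standalone install; on Kubernetes, read the gateway pod's logs with `kubectl logs` instead:

```
# Exits non-zero (and prints the offending lines) if the log is not clean
! grep -E '\[(error|crit|alert|emerg)\]' /path/to/api7-gateway/logs/error.log
```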
Verify connection capacity with c1000k
If your target is above a few hundred thousand concurrent connections, verify that the host kernel and file descriptor limits can actually sustain them. The c1000k tool is a small probe that can simulate one million concurrent connections:
```
# On the server node
./server 7000

# On the client node
./client <server-ip> 7000
```
Next steps
- Run Benchmarks on AWS EKS — full walkthrough for reproducing the published AWS EKS results.
- Configure the Data Plane — tune data plane settings for your workload.
- Scale the Data Plane — scale horizontally and vertically in Kubernetes.
- High Availability for the Data Plane — multi-replica resilience patterns.