Performance Benchmark
This page documents the published benchmark results for API7 Gateway and the methodology behind them, so you can both understand the numbers and reproduce them in your own environment. All test assets — ADC configurations, wrk2 scripts, and deployment manifests — are available in the api7-gateway-performance-benchmark repository.
Methodology
The benchmark uses a fixed set of variables so that results are comparable across test runs and environments:
- Single gateway node: one API7 Gateway data plane instance, with `worker_processes` configured to match the vCPU count of the host. To build a clean single-core baseline, start with `worker_processes: 1` and scale up once you have confirmed the single-core numbers.
- Exclude upstream interference for the ceiling number: enable only the `mocking` plugin to measure API7 Gateway's raw request-processing ceiling. The mocking plugin returns canned responses without forwarding to an upstream. This number is not comparable to real-world throughput.
- Realistic scenarios forward to an upstream: every other test row uses a real NGINX upstream running `nginx/1.25.4`, configured with `access_log off` and a static 200-byte response so that the upstream itself is never the bottleneck.
- Sample size: each test case runs 5 times, 2 minutes per run, and the reported result is the average of the 5 runs.
- Test scenarios: 5 plugin combinations × 2 route/consumer scales = 10 test cases:
  - Only `mocking` (baseline ceiling)
  - No plugins
  - Only `limit-count`
  - Only `key-auth`
  - Both `key-auth` and `limit-count`

Each plugin scenario is run with (a) 1 route / 1 consumer and (b) 100 routes / 100 consumers to measure the impact of resource count on matching and plugin evaluation.
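A single sample of one test case can be driven with wrk2 along these lines. This is a sketch: the thread count, connection count, target rate, and gateway address are illustrative assumptions, not the exact values used for the published runs:

```
# One 2-minute sample at a fixed request rate; repeat 5 times and average.
# wrk2 installs as `wrk` and requires -R to set the constant throughput it holds.
wrk -t8 -c200 -d120s -R100000 --latency \
  http://gateway.example.internal:9080/benchmark
```

For the `key-auth` scenarios, also pass the consumer's credential, for example `-H "apikey: <key>"`.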
Published results — AWS EKS
These results were obtained in an AWS EKS environment using c5.4xlarge EC2 instances (16 vCPUs, 32 GB RAM) running Amazon Linux 2 (AL2_x86_64) on Kubernetes 1.29. API7 Gateway, the NGINX upstream, and wrk2 each ran on separate nodes in the same VPC to avoid resource contention.
The Forward to Upstream column distinguishes the mocking-only ceiling (where API7 Gateway returns canned responses and no request leaves the gateway) from realistic proxy scenarios. The mocking-only row measures the theoretical ceiling of the gateway's request-processing pipeline; it is not comparable to real-world throughput, where the backend and network become additional factors. Use the rows with Forward to Upstream: True when sizing for production.
AWS EKS (c5.4xlarge)
| Test scenario | Routes / Consumers | Forward to upstream | QPS | P99 (ms) | P95 (ms) |
|---|---|---|---|---|---|
| Only mocking | 1 route, 0 consumers | False | 310,392.07 | 1.16 | 1.08 |
| No plugins | 1 route, 0 consumers | True | 167,019.37 | 2.30 | 2.16 |
| No plugins | 100 routes, 0 consumers | True | 162,753.17 | 2.31 | 2.16 |
| Only limit-count | 1 route, 0 consumers | True | 145,370.10 | 2.43 | 2.24 |
| Only limit-count | 100 routes, 0 consumers | True | 143,108.40 | 2.45 | 2.25 |
| Only key-auth | 1 route, 1 consumer | True | 147,869.49 | 2.41 | 2.22 |
| Only key-auth | 100 routes, 100 consumers | True | 145,070.93 | 2.43 | 2.25 |
| key-auth + limit-count | 1 route, 1 consumer | True | 136,725.47 | 2.43 | 2.26 |
| key-auth + limit-count | 100 routes, 100 consumers | True | 133,782.95 | 2.48 | 2.30 |
Single-host baseline (1 worker)

This baseline is taken with API7 Gateway (`worker_processes: 1`), the NGINX upstream, and wrk2 all running on the same machine using the host network. Its purpose is to isolate single-core throughput with no network latency.
| Test scenario | Routes / Consumers | QPS | P99 (ms) | P95 (ms) |
|---|---|---|---|---|
| No plugins | 1 route, 0 consumers | 24,129.22 | 0.093 | 0.082 |
| No plugins | 100 routes, 0 consumers | 23,652.91 | 0.096 | 0.084 |
| Only limit-count | 1 route, 0 consumers | 20,495.10 | 0.104 | 0.092 |
| Only limit-count | 100 routes, 0 consumers | 20,462.31 | 0.104 | 0.094 |
| Only key-auth | 1 route, 1 consumer | 21,019.04 | 0.100 | 0.089 |
| Only key-auth | 100 routes, 100 consumers | 20,444.81 | 0.109 | 0.095 |
| key-auth + limit-count | 1 route, 1 consumer | 18,940.39 | 0.110 | 0.097 |
| key-auth + limit-count | 100 routes, 100 consumers | 18,193.88 | 0.110 | 0.098 |
Single-host and multi-node numbers are not directly comparable — the multi-node AWS EKS setup scales across 16 vCPUs while the single-host baseline constrains API7 Gateway to a single worker process. Use the single-host table to verify that your own single-core results are roughly in line with the reference before you scale up.
Run your own benchmark
To reproduce these numbers or measure your own workload, follow the same methodology used for the published results and apply the optimization guidance below.
Before you start
- Select an appropriate number of gateway nodes. Use one API7 Gateway node per test. Configure `worker_processes` to match the vCPU count of the host. Do not run multiple API7 Gateway instances with small `worker_processes` values in parallel — the results are harder to interpret.
- Build the single-core baseline first. Set `worker_processes: 1` and run the "No plugins" scenario against a local NGINX upstream on the same host. Your result should be in the same ballpark as the single-host baseline table above before you scale out.
- Exclude upstream interference for the ceiling. Enable only the `mocking` plugin for the ceiling number. This isolates API7 Gateway's own processing cost.
- Watch the upstream. During every realistic test, monitor the CPU, memory, and event-loop utilization of the NGINX upstream. If the upstream is saturated, the numbers reflect the upstream's limit, not API7 Gateway's.
- Run multiple samples and apply statistics. Every test case should run at least 5 times. Report the mean and standard deviation so that noise can be separated from signal.
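The last point takes only a few lines of Python; `statistics.stdev` computes the sample standard deviation. The QPS values below are made-up placeholders, not measured results:

```python
from statistics import mean, stdev

# QPS from 5 runs of one test case (placeholder values)
samples = [145370.1, 144980.3, 145912.8, 144650.2, 145201.6]

avg = mean(samples)
sd = stdev(samples)  # sample standard deviation (n - 1 denominator)

# Report mean and spread; a stddev above a percent or two of the mean
# usually means the runs were noisy and should be investigated.
print(f"QPS: {avg:,.2f} +/- {sd:,.2f} ({sd / avg:.2%} of mean)")
```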
For a full AWS EKS walkthrough — cluster creation, node labeling, Helm install, NGINX upstream deployment, and wrk2 deployment — see Run Benchmarks on AWS EKS.
Optimization recommendations
Raise the maximum number of open files
Check the system's current maximum number of open file descriptors:
```
cat /proc/sys/fs/file-nr
```
The last number is the system-wide maximum. If it is too small, raise it in /etc/sysctl.conf:
```
fs.file-max = 1020000
net.ipv4.ip_conntrack_max = 1020000
net.ipv4.netfilter.ip_conntrack_max = 1020000
```

On modern kernels the `ip_conntrack` keys have been replaced by `net.netfilter.nf_conntrack_max`; set whichever key your kernel exposes.
Reload:
```
sudo sysctl -p /etc/sysctl.conf
```
Raise the per-process ulimit
Each incoming connection consumes a file descriptor. Raise `ulimit -n` to a seven-figure value so that the gateway can accept the connection volume a benchmark generates.

Temporary (current shell only):

```
ulimit -n 1024000
```

Permanent (edit /etc/security/limits.conf):

```
* hard nofile 1024000
* soft nofile 1024000
```

Changes to /etc/security/limits.conf take effect for new login sessions, not the current shell.
Disable access logs
Access logs write to disk for every request, which can cap QPS at I/O speeds during high-volume tests. Disable them for benchmarks:
```yaml
nginx_config:
  error_log_level: error
  worker_processes: auto
  http:
    enable_access_log: false
```
Setting the error log level to error also reduces log I/O during the run.
Avoid resource contention
Put wrk2, API7 Gateway, and the upstream service on separate machines in the same local network. When running in Kubernetes, use nodeSelector (or taints and tolerations) so that the three pods land on three different nodes, otherwise CPU and network contention will distort the results.
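For example, label the three nodes with a role label and pin each workload with a `nodeSelector`. The label key, values, and image below are assumptions to illustrate the pattern; any consistent labeling scheme works:

```yaml
# kubectl label node <node-1> bench-role=gateway
# kubectl label node <node-2> bench-role=upstream
# kubectl label node <node-3> bench-role=wrk2
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wrk2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wrk2
  template:
    metadata:
      labels:
        app: wrk2
    spec:
      nodeSelector:
        bench-role: wrk2   # schedule only onto the load-generator node
      containers:
        - name: wrk2
          image: example/wrk2:latest  # placeholder image
```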
Avoid burstable cloud instances
Do not benchmark on burstable cloud instance families (for example, AWS t3/t4g). Burstable instances have a credit-based CPU model that produces non-repeatable results. Use a fixed-performance family like c5.4xlarge instead.
Also note that cloud providers' vCPUs are not always 1:1 with physical cores — many use hyper-threading, which means the actual physical core count can be half of the vCPU count. For the most accurate sizing, consult your cloud provider's instance documentation (for AWS, see AWS Instance CPU options).
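A quick way to compare the two on a Linux node is to check `nproc` against the unique (core, socket) pairs reported by `lscpu`:

```shell
# vCPUs the OS sees (what worker_processes: auto will pick up)
nproc

# Physical cores: count unique (core, socket) pairs reported by lscpu
lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l
```

On a hyper-threaded instance the first number is typically twice the second.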
Watch for internal errors in the gateway
Before each benchmark run, tail the API7 Gateway error log and verify that it is clean. An internal error loop (for example, an upstream DNS failure that triggers retries) will silently reduce measured throughput. Set the gateway log level to error and fix anything the log reports before measuring.
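As a sketch, a pre-flight check can grep the error log for entries at level error or above. The log path here is a placeholder for a standalone install; on Kubernetes, read the gateway pod's logs with `kubectl logs` instead:

```
# Exits non-zero (and prints the offending lines) if the log is not clean
! grep -E '\[(error|crit|alert|emerg)\]' /path/to/api7-gateway/logs/error.log
```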
Verify connection capacity with c1000k
If your target is above a few hundred thousand concurrent connections, verify that the host kernel and file descriptor limits can actually sustain them. The c1000k tool is a small probe that can simulate one million concurrent connections:
```
# On the server node
./server 7000

# On the client node
./client <server-ip> 7000
```
Next steps
- Run Benchmarks on AWS EKS — full walkthrough for reproducing the published AWS EKS results.
- Configure the Data Plane — tune data plane settings for your workload.
- Scale the Data Plane — scale horizontally and vertically in Kubernetes.
- High Availability for the Data Plane — multi-replica resilience patterns.