Configure Alerts
API7 Gateway monitors gateway instances, certificates, license usage, and request status codes, and can notify you when these conditions cross a threshold. Alerting is built around three concepts you configure in the Dashboard:
- Contact Points — reusable destinations for notifications (Email or Webhook).
- Alert Policies — the conditions to evaluate (event, rule, gateway groups, check interval) and which contact points to notify when triggered.
- Alert History — a 30-day log of past alerts and the notifications that were delivered.
For background on how alerts are evaluated and delivered, see Alerts and Contact Points.
Configure a contact point
Before you create a policy, configure at least one contact point so the gateway has somewhere to send notifications.
- From the top navigation bar, select Organization, then Contact Points.
- Click Add Contact Point and choose either:
  - Email — enter one or more recipient addresses. Email delivery requires an SMTP server, which you configure under Organization > Settings > SMTP Server.
  - Webhook — enter the destination URL. Use this to forward notifications to Slack (via Slack Incoming Webhooks), PagerDuty, Microsoft Teams, or any HTTP endpoint that accepts an alert payload.
You can attach multiple contact points to a single alert policy, so a policy can notify both an on-call email list and a Slack channel.
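If you point a Webhook contact point at your own service, the receiving side can be sketched roughly as follows. This is a minimal illustration, not the gateway's contract: the `AlertPayload` fields, the `/alerts` path, and the `parseAlert` and `handleAlert` helpers are all hypothetical, and the JSON shape your endpoint actually receives depends on the webhook template you configure.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// AlertPayload is a hypothetical payload shape. The real shape is whatever
// your webhook template renders, so adjust the fields to match it.
type AlertPayload struct {
	Policy   string `json:"policy"`
	Severity string `json:"severity"`
	Groups   string `json:"gateway_groups"`
	Detail   string `json:"detail"`
}

// parseAlert decodes the request body and rejects payloads without a policy name.
func parseAlert(body []byte) (AlertPayload, error) {
	var p AlertPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return p, err
	}
	if p.Policy == "" {
		return p, errors.New("missing policy field")
	}
	return p, nil
}

// handleAlert accepts POSTed alerts, logs them, and replies 204 No Content.
func handleAlert(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	// Cap the body at 1 MiB so a misbehaving sender cannot exhaust memory.
	body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	p, err := parseAlert(body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	log.Printf("alert %q severity=%s groups=%s: %s", p.Policy, p.Severity, p.Groups, p.Detail)
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	// Exercise the handler in-process. A real receiver would instead register
	// handleAlert on a mux and call http.ListenAndServe.
	body := strings.NewReader(`{"policy":"instance-offline","severity":"High","gateway_groups":"default","detail":"dp-1 offline"}`)
	req := httptest.NewRequest(http.MethodPost, "/alerts", body)
	rec := httptest.NewRecorder()
	handleAlert(rec, req)
	fmt.Println(rec.Code) // 204
}
```

Keeping the decoding in `parseAlert` separate from the HTTP handler makes it easy to unit-test the payload shape whenever you change the webhook template.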
Recommended alert policies
The Dashboard exposes more than 20 built-in event types. The following policies are recommended for most production deployments. Each is created from Alert > Policies > Add Alert Policy.
| Event | Why it matters | Suggested check interval |
|---|---|---|
| Gateway instance offline | A data plane node has stopped reporting heartbeats and is no longer serving traffic. | 60 minutes |
| mTLS certificate between control plane and data plane will expire | These certificates have a 13-month validity period; expiry breaks CP–DP communication. | 1440 minutes (daily) |
| SSL certificate will expire | Listener TLS certificates need to be rotated before clients see TLS errors. | 1440 minutes (daily) |
| Allowed License CPU Quota Exceeded | The combined CPU core count of all data plane instances has exceeded the quota allowed by your API7 Enterprise license. | 60 minutes |
| Number of healthy gateway instances | Detect when a gateway group falls below the minimum instance count required for your SLO. | 30 minutes |
| Number of status code 500 / Ratio of status code 500 | Surface upstream or gateway failures that affect a measurable share of traffic. | 30 minutes |
For the concepts behind these events and how the control plane evaluates them, see Alerts and Contact Points. For the variables you can use in notification templates, see Alert Variables and Templates.
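When choosing between the count-based and ratio-based status code events, it can help to see how the two kinds of threshold behave on the same traffic. The sketch below is only arithmetic under stated assumptions: the `window` type, the threshold values, and the use of `>=` are hypothetical and do not describe how the control plane evaluates rules internally.

```go
package main

import "fmt"

// window holds request counts observed during one check interval.
// This type is hypothetical and exists only for the illustration.
type window struct {
	total  int // all requests in the interval
	err500 int // requests that returned status code 500
}

// countExceeds reports whether the absolute number of 500s reaches the threshold.
func countExceeds(w window, threshold int) bool {
	return w.err500 >= threshold
}

// ratioExceeds reports whether the share of 500s reaches the threshold.
// An interval without traffic never triggers.
func ratioExceeds(w window, threshold float64) bool {
	if w.total == 0 {
		return false
	}
	return float64(w.err500)/float64(w.total) >= threshold
}

func main() {
	quiet := window{total: 200, err500: 20}   // 10% of a small volume
	busy := window{total: 100000, err500: 50} // 0.05% of a large volume

	fmt.Println(countExceeds(quiet, 50), ratioExceeds(quiet, 0.05)) // false true
	fmt.Println(countExceeds(busy, 50), ratioExceeds(busy, 0.05))   // true false
}
```

A count threshold reacts to absolute failure volume regardless of traffic, while a ratio threshold scales with traffic, so low-volume gateway groups are usually better served by the ratio event.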
Customize notification content
Email subjects, email bodies, and webhook payloads can include the following template variables:
| Variable | Description |
|---|---|
| `{{ .AlertPolicyName }}` | Name of the alert policy. |
| `{{ .Description }}` | Description of the alert policy. |
| `{{ .Severity }}` | Severity (High, Medium, or Low). |
| `{{ .TriggerGatewayGroup }}` | Comma-separated names of the gateway groups that triggered the alert. |
| `{{ .AlertTime }}` | Time of the alert. Format with `{{ .AlertTime.Format "2006 Jan 02 15:04:05" }}`. |
| `{{ .AlertDetail }}` | Human-readable description of the events that triggered the alert. Use the escape function (`{{ .AlertDetail \| escape }}`) when embedding in a JSON webhook body. |
For the complete reference, see Alert Variables and Templates.
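The variable syntax and the time layout in the table match Go's `text/template` and `time` packages, so you can preview a webhook body locally before saving it. The following is a sketch under assumptions: the `Alert` struct, the `sampleAlert` and `render` helpers, and the `"text"` key are hypothetical, and the local `escape` function is only a stand-in for the gateway's built-in one (here it JSON-encodes the string and strips the surrounding quotes).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
	"time"
)

// Alert mirrors the variables in the table above. The struct is hypothetical
// and exists only so the template can be rendered locally.
type Alert struct {
	AlertPolicyName     string
	Description         string
	Severity            string
	TriggerGatewayGroup string
	AlertTime           time.Time
	AlertDetail         string
}

// escape is a local stand-in (assumption) for the gateway's escape function:
// it JSON-encodes s and drops the surrounding double quotes.
func escape(s string) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b[1 : len(b)-1]), nil
}

// webhookBody is an example template; the "text" key is an arbitrary choice.
const webhookBody = `{"text": "[{{ .Severity }}] {{ .AlertPolicyName }} at {{ .AlertTime.Format "2006 Jan 02 15:04:05" }}: {{ .AlertDetail | escape }}"}`

// render executes the template against one alert and returns the rendered body.
func render(a Alert) (string, error) {
	tmpl, err := template.New("webhook").
		Funcs(template.FuncMap{"escape": escape}).
		Parse(webhookBody)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, a); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// sampleAlert returns made-up data whose detail contains quotes and a newline.
func sampleAlert() Alert {
	return Alert{
		AlertPolicyName:     "instance-offline",
		Severity:            "High",
		TriggerGatewayGroup: "default",
		AlertTime:           time.Date(2024, time.March, 5, 14, 30, 0, 0, time.UTC),
		AlertDetail:         "node \"dp-1\" offline\nno heartbeat",
	}
}

func main() {
	out, err := render(sampleAlert())
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Without the `escape` step, the quotes and the newline in the sample detail would produce an invalid JSON body, which is why the table recommends it for webhook payloads.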
Review alert history
Alert events and the notifications delivered for them are retained for 30 days. To inspect them, select Alert from the side navigation bar, then click History. Each record includes the policy that fired, severity, trigger time, gateway groups involved, and the alert detail that was sent.
Next steps
- Monitor Metrics — refine alert thresholds against the metrics they evaluate.
- Distributed Tracing — investigate the root cause of a fired alert.
- Centralized Logging — correlate alert events with the corresponding access and error log entries.