Enforce AI Guardrails and Protect PII
This guide shows how to implement layered AI safety controls with API7 AI Gateway using ai-prompt-guard, ai-aws-content-moderation, and ai-request-rewrite.
Overview
Guardrails are most effective when enforced at the gateway layer, where policies are centralized and applied consistently across applications. A practical defense-in-depth model uses three layers:
- Prompt filtering to block prompt injection and disallowed instructions before model invocation.
- Content moderation to detect harmful content categories and reject high-risk requests.
- PII redaction to mask sensitive data before requests are sent to LLM providers.
Prerequisites
- Install Docker.
- Install cURL to send requests to the services for validation.
- Have a running API7 Enterprise Gateway instance. See the Getting Started Guide for setup instructions.
Prompt Protection
Use ai-prompt-guard to apply PCRE-based allow and deny patterns. In this example, the plugin checks user messages only (match_all_roles: false) and scans only the latest message (match_all_conversation_history: false). When a deny pattern matches, the request is rejected with HTTP 400.
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "ai-guardrails-prompt-protection",
"service_id": "$SERVICE_ID",
"paths": ["/ai/chat"],
"plugins": {
"ai-prompt-guard": {
"allow_patterns": ["(?i)^(what|how|why|explain|summarize|translate)\\b"],
"deny_patterns": ["(?i)(ignore\\s+all\\s+previous\\s+instructions|reveal\\s+system\\s+prompt|bypass\\s+guardrails)"],
"match_all_roles": false,
"match_all_conversation_history": false
},
"ai-proxy": {
"provider": "openai",
"auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
"options": { "model": "gpt-4o" }
}
}
}'
❶ allow_patterns defines accepted prompt shape using PCRE syntax.
❷ deny_patterns blocks known injection and policy-bypass phrases.
❸ match_all_roles: false and match_all_conversation_history: false scope matching to the latest user message.
services:
- name: AI Prompt Protection
routes:
- uris:
- /ai/chat
name: ai-guardrails-prompt-protection
plugins:
ai-prompt-guard:
allow_patterns:
- (?i)^(what|how|why|explain|summarize|translate)\b
deny_patterns:
- (?i)(ignore\s+all\s+previous\s+instructions|reveal\s+system\s+prompt|bypass\s+guardrails)
match_all_roles: false
match_all_conversation_history: false
ai-proxy:
provider: openai
auth:
header:
Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
options:
model: gpt-4o
❶ allow_patterns defines accepted prompt shape using PCRE syntax.
❷ deny_patterns blocks known injection and policy-bypass phrases.
❸ match_all_roles: false and match_all_conversation_history: false scope matching to the latest user message.
adc sync -f adc.yaml
For the full configuration reference, see ai-prompt-guard.
Content Moderation
Content moderation adds a second layer of filtering for harmful or abusive text.
AWS Comprehend Integration
Use ai-aws-content-moderation to score six moderation categories with 0-1 thresholds. Requests above configured thresholds can be blocked with a configurable rejection status code and message.
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "ai-guardrails-content-moderation",
"service_id": "$SERVICE_ID",
"paths": ["/ai/chat"],
"plugins": {
"ai-aws-content-moderation": {
"comprehend": {
"access_key_id": "'"$AWS_ACCESS_KEY_ID"'",
"secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'",
"region": "us-east-1"
},
"moderation_categories": {
"PROFANITY": 0.5,
"HATE_SPEECH": 0.5,
"INSULT": 0.5,
"HARASSMENT_OR_ABUSE": 0.5,
"SEXUAL": 0.5,
"VIOLENCE_OR_THREAT": 0.5
},
"moderation_threshold": 0.5
},
"ai-proxy": {
"provider": "openai",
"auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
"options": { "model": "gpt-4o" }
}
}
}'
❶ comprehend provides AWS credentials for the Comprehend API. access_key_id, secret_access_key, and region are required.
❷ Configure per-category thresholds for PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, and VIOLENCE_OR_THREAT.
❸ moderation_threshold defines the global block threshold applied to moderation scores.
services:
- name: AI Content Moderation
routes:
- uris:
- /ai/chat
name: ai-guardrails-content-moderation
plugins:
ai-aws-content-moderation:
comprehend:
access_key_id: your-aws-access-key-id
secret_access_key: your-aws-secret-access-key
region: us-east-1
moderation_categories:
PROFANITY: 0.5
HATE_SPEECH: 0.5
INSULT: 0.5
HARASSMENT_OR_ABUSE: 0.5
SEXUAL: 0.5
VIOLENCE_OR_THREAT: 0.5
moderation_threshold: 0.5
ai-proxy:
provider: openai
auth:
header:
Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
options:
model: gpt-4o
❶ comprehend provides AWS credentials for the Comprehend API. access_key_id, secret_access_key, and region are required.
❷ Configure per-category thresholds for PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, and VIOLENCE_OR_THREAT.
❸ moderation_threshold defines the global block threshold applied to moderation scores.
adc sync -f adc.yaml
For the full configuration reference, see ai-aws-content-moderation.
Custom Moderation Services
If you use a custom moderation stack, you can call a dedicated moderation model with ai-request-rewrite and reject or sanitize content before forwarding to the primary LLM route. This approach is useful when you need custom taxonomy, language coverage, or organization-specific policies.
PII Redaction
PII protection helps prevent accidental exposure of names, phone numbers, account identifiers, and other sensitive fields to external LLM providers. Gateway-side redaction also helps support compliance controls aligned with GDPR, HIPAA, and SOC 2.
Request-Side PII Masking
Use ai-request-rewrite to send incoming prompts to a separate model that detects and masks PII before the request reaches the primary LLM.
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "ai-guardrails-pii-request",
"service_id": "$SERVICE_ID",
"paths": ["/ai/chat"],
"plugins": {
"ai-request-rewrite": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4o"
},
"prompt": "Detect and redact PII in the incoming user text. Replace emails with [REDACTED_EMAIL], phone numbers with [REDACTED_PHONE], payment card numbers with [REDACTED_CARD], and government identifiers with [REDACTED_ID]. Return only sanitized text."
},
"ai-proxy": {
"provider": "openai",
"auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
"options": { "model": "gpt-4o" }
}
}
}'
❶ provider sets the model backend used for rewrite decisions.
❷ auth configures credentials for the rewrite model call.
❸ options.model selects the rewrite model; prompt defines masking instructions.
services:
- name: AI Request-Side PII Masking
routes:
- uris:
- /ai/chat
name: ai-guardrails-pii-request
plugins:
ai-request-rewrite:
provider: openai
auth:
header:
Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
options:
model: gpt-4o
prompt: >-
Detect and redact PII in the incoming user text. Replace emails with
[REDACTED_EMAIL], phone numbers with [REDACTED_PHONE], payment card
numbers with [REDACTED_CARD], and government identifiers with
[REDACTED_ID]. Return only sanitized text.
ai-proxy:
provider: openai
auth:
header:
Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
options:
model: gpt-4o
❶ provider sets the model backend used for rewrite decisions.
❷ auth configures credentials for the rewrite model call.
❸ options.model selects the rewrite model; prompt defines masking instructions.
adc sync -f adc.yaml
Response-Side PII Filtering
The same ai-request-rewrite pattern can be applied to sanitize model output before it is returned to clients. Use a response-focused rewrite prompt to mask generated PII (for example, names, phone numbers, and IDs that appear in model responses).
{
"ai-request-rewrite": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer YOUR_API_KEY"
}
},
"options": {
"model": "gpt-4o"
},
"prompt": "Review generated text and mask any detected PII before returning it to clients."
}
}
For the full configuration reference, see ai-request-rewrite.
Combining Guardrails
In most deployments, combine the three controls on the same route. A practical execution order is:
- Start with
ai-prompt-guardto reject obvious prompt injection attempts early. - Apply
ai-request-rewriteto sanitize request content and remove PII. - Use
ai-aws-content-moderationto score and block harmful content. - Finally,
ai-proxyforwards approved traffic to the target LLM.
Plugin execution order is determined by plugin priority (phase and internal ordering), not by the order they appear in the configuration. The order listed above reflects the expected runtime behavior based on each plugin's assigned priority.
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "ai-guardrails-combined",
"service_id": "$SERVICE_ID",
"paths": ["/ai/chat"],
"plugins": {
"ai-prompt-guard": {
"allow_patterns": ["(?i)^.{1,4000}$"],
"deny_patterns": ["(?i)(ignore\\s+all\\s+previous\\s+instructions|reveal\\s+system\\s+prompt|developer\\s+mode)"],
"match_all_roles": false,
"match_all_conversation_history": false
},
"ai-request-rewrite": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4o"
},
"prompt": "Redact PII from user content before forwarding to the target model."
},
"ai-aws-content-moderation": {
"comprehend": {
"access_key_id": "'"$AWS_ACCESS_KEY_ID"'",
"secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'",
"region": "us-east-1"
},
"moderation_categories": {
"PROFANITY": 0.5,
"HATE_SPEECH": 0.5,
"INSULT": 0.5,
"HARASSMENT_OR_ABUSE": 0.5,
"SEXUAL": 0.5,
"VIOLENCE_OR_THREAT": 0.5
},
"moderation_threshold": 0.5
},
"ai-proxy": {
"provider": "openai",
"auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
"options": { "model": "gpt-4o" }
}
}
}'
❶ ai-prompt-guard performs first-pass prompt filtering and rejects deny-pattern matches with HTTP 400.
❷ ai-request-rewrite sanitizes prompt content and masks PII before model invocation.
❸ ai-aws-content-moderation provides AWS Comprehend credentials in the comprehend block and enforces toxicity and abuse thresholds before forwarding traffic.
services:
- name: AI Combined Guardrails
routes:
- uris:
- /ai/chat
name: ai-guardrails-combined
plugins:
ai-prompt-guard:
allow_patterns:
- (?i)^.{1,4000}$
deny_patterns:
- (?i)(ignore\s+all\s+previous\s+instructions|reveal\s+system\s+prompt|developer\s+mode)
match_all_roles: false
match_all_conversation_history: false
ai-request-rewrite:
provider: openai
auth:
header:
Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
options:
model: gpt-4o
prompt: Redact PII from user content before forwarding to the target model.
ai-aws-content-moderation:
comprehend:
access_key_id: your-aws-access-key-id
secret_access_key: your-aws-secret-access-key
region: us-east-1
moderation_categories:
PROFANITY: 0.5
HATE_SPEECH: 0.5
INSULT: 0.5
HARASSMENT_OR_ABUSE: 0.5
SEXUAL: 0.5
VIOLENCE_OR_THREAT: 0.5
moderation_threshold: 0.5
ai-proxy:
provider: openai
auth:
header:
Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
options:
model: gpt-4o
❶ ai-prompt-guard performs first-pass prompt filtering and rejects deny-pattern matches with HTTP 400.
❷ ai-request-rewrite sanitizes prompt content and masks PII before model invocation.
❸ ai-aws-content-moderation provides AWS Comprehend credentials in the comprehend block and enforces toxicity and abuse thresholds before forwarding traffic.
adc sync -f adc.yaml
Next Steps
- Prompt Engineering and Templating — Standardize prompts before safety checks.
- Token-Based Rate Limiting and Quota Management — Add budget controls to protected routes.
- AI Observability and Cost Tracking — Monitor moderation outcomes, token usage, and latency.