Version: 3.10.x

Enforce AI Guardrails and Protect PII

This guide shows how to implement layered AI safety controls with API7 AI Gateway using ai-prompt-guard, ai-aws-content-moderation, and ai-request-rewrite.

Overview

Guardrails are most effective when enforced at the gateway layer, where policies are centralized and applied consistently across applications. A practical defense-in-depth model uses three layers:

Prompt filtering to block prompt injection and disallowed instructions before model invocation.
Content moderation to detect harmful content categories and reject high-risk requests.
PII redaction to mask sensitive data before requests are sent to LLM providers.

Prerequisites

Install Docker.
Install cURL to send requests to the services for validation.
Have a running API7 Gateway instance.

Create a token from the Dashboard and save it to an environment variable:

export API_KEY=your-dashboard-token   # replace with your Dashboard token

Replace {gateway_group_id} with your gateway group ID. Use default if you are following the quickstart.
If you are following the Admin API examples, create or reuse a service in API7 Gateway. If you do not have one yet, follow Create or Reuse a Service, then save its ID to an environment variable:
```
export SERVICE_ID=your-service-id         # replace with your service ID
```

Prompt Protection

Use ai-prompt-guard to apply PCRE-based allow and deny patterns. In this example, the plugin checks user messages only (match_all_roles: false) and scans only the latest message (match_all_conversation_history: false). When a deny pattern matches, the request is rejected with HTTP 400.

Admin API
ADC

curl -k "https://localhost:7443/apisix/admin/routes?gateway_group_id={gateway_group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -d '{
    "id": "ai-guardrails-prompt-protection",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/ai/chat"],
    "plugins": {
      "ai-prompt-guard": {
        "allow_patterns": ["(?i)^(what|how|why|explain|summarize|translate)\\b"],
        "deny_patterns": ["(?i)(ignore\\s+all\\s+previous\\s+instructions|reveal\\s+system\\s+prompt|bypass\\s+guardrails)"],
        "match_all_roles": false,
        "match_all_conversation_history": false
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
        "options": { "model": "gpt-4o" }
      }
    }
  }'

❶ allow_patterns defines accepted prompt shape using PCRE syntax.

❷ deny_patterns blocks known injection and policy-bypass phrases.

❸ match_all_roles: false and match_all_conversation_history: false scope matching to the latest user message.

adc.yaml
services:
  - name: AI Prompt Protection
    routes:
      - uris:
          - /ai/chat
        name: ai-guardrails-prompt-protection
        plugins:
          ai-prompt-guard:
            allow_patterns:
              - (?i)^(what|how|why|explain|summarize|translate)\b
            deny_patterns:
              - (?i)(ignore\s+all\s+previous\s+instructions|reveal\s+system\s+prompt|bypass\s+guardrails)
            match_all_roles: false
            match_all_conversation_history: false
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-4o

❶ allow_patterns defines accepted prompt shape using PCRE syntax.

❷ deny_patterns blocks known injection and policy-bypass phrases.

❸ match_all_roles: false and match_all_conversation_history: false scope matching to the latest user message.

adc sync -f adc.yaml

For the full configuration reference, see ai-prompt-guard.

Content Moderation

Content moderation adds a second layer of filtering for harmful or abusive text.

AWS Comprehend Integration

Use ai-aws-content-moderation to score six moderation categories with 0-1 thresholds. You can set per-category thresholds in moderation_categories together with an overall toxicity threshold in moderation_threshold. Requests that exceed a category threshold or the overall threshold can be blocked with a configurable rejection status code and message.

Admin API
ADC

curl -k "https://localhost:7443/apisix/admin/routes?gateway_group_id={gateway_group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -d '{
    "id": "ai-guardrails-content-moderation",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/ai/chat"],
    "plugins": {
      "ai-aws-content-moderation": {
        "comprehend": {
          "access_key_id": "'"$AWS_ACCESS_KEY_ID"'",
          "secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'",
          "region": "us-east-1"
        },
        "moderation_categories": {
          "PROFANITY": 0.5,
          "HATE_SPEECH": 0.5,
          "INSULT": 0.5,
          "HARASSMENT_OR_ABUSE": 0.5,
          "SEXUAL": 0.5,
          "VIOLENCE_OR_THREAT": 0.5
        },
        "moderation_threshold": 0.5
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
        "options": { "model": "gpt-4o" }
      }
    }
  }'

❶ comprehend provides AWS credentials for the Comprehend API. access_key_id, secret_access_key, and region are required.

❷ Configure per-category thresholds for PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, and VIOLENCE_OR_THREAT.

❸ moderation_threshold defines the overall toxicity threshold. Use moderation_categories to enforce specific category thresholds in addition to the global threshold.

adc.yaml
services:
  - name: AI Content Moderation
    routes:
      - uris:
          - /ai/chat
        name: ai-guardrails-content-moderation
        plugins:
          ai-aws-content-moderation:
            comprehend:
              access_key_id: ${AWS_ACCESS_KEY_ID}
              secret_access_key: ${AWS_SECRET_ACCESS_KEY}
              region: us-east-1
            moderation_categories:
              PROFANITY: 0.5
              HATE_SPEECH: 0.5
              INSULT: 0.5
              HARASSMENT_OR_ABUSE: 0.5
              SEXUAL: 0.5
              VIOLENCE_OR_THREAT: 0.5
            moderation_threshold: 0.5
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-4o

❶ comprehend provides AWS credentials for the Comprehend API. access_key_id, secret_access_key, and region are required.

❷ Configure per-category thresholds for PROFANITY, HATE_SPEECH, INSULT, HARASSMENT_OR_ABUSE, SEXUAL, and VIOLENCE_OR_THREAT.

❸ moderation_threshold defines the overall toxicity threshold. Use moderation_categories to enforce specific category thresholds in addition to the global threshold.

adc sync -f adc.yaml

For the full configuration reference, see ai-aws-content-moderation.

Custom Moderation Services

If you use a custom moderation stack, you can call a dedicated moderation model with ai-request-rewrite and reject or sanitize content before forwarding to the primary LLM route. This approach is useful when you need custom taxonomy, language coverage, or organization-specific policies.

PII Redaction

PII protection helps prevent accidental exposure of names, phone numbers, account identifiers, and other sensitive fields to external LLM providers. Gateway-side redaction also helps support compliance controls aligned with GDPR, HIPAA, and SOC 2.

Request-Side PII Masking

Use ai-request-rewrite to send incoming prompts to a separate model that detects and masks PII before the request reaches the primary LLM.

Admin API
ADC

curl -k "https://localhost:7443/apisix/admin/routes?gateway_group_id={gateway_group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -d '{
    "id": "ai-guardrails-pii-request",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/ai/chat"],
    "plugins": {
      "ai-request-rewrite": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4o"
        },
        "prompt": "Detect and redact PII in the incoming user text. Replace emails with [REDACTED_EMAIL], phone numbers with [REDACTED_PHONE], payment card numbers with [REDACTED_CARD], and government identifiers with [REDACTED_ID]. Return only sanitized text."
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
        "options": { "model": "gpt-4o" }
      }
    }
  }'

❶ provider sets the model backend used for rewrite decisions.

❷ auth configures credentials for the rewrite model call.

❸ options.model selects the rewrite model; prompt defines masking instructions.

adc.yaml
services:
  - name: AI Request-Side PII Masking
    routes:
      - uris:
          - /ai/chat
        name: ai-guardrails-pii-request
        plugins:
          ai-request-rewrite:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-4o
            prompt: >-
              Detect and redact PII in the incoming user text. Replace emails with
              [REDACTED_EMAIL], phone numbers with [REDACTED_PHONE], payment card
              numbers with [REDACTED_CARD], and government identifiers with
              [REDACTED_ID]. Return only sanitized text.
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-4o

❶ provider sets the model backend used for rewrite decisions.

❷ auth configures credentials for the rewrite model call.

❸ options.model selects the rewrite model; prompt defines masking instructions.

adc sync -f adc.yaml

Response-Side PII Filtering

The same ai-request-rewrite pattern can be applied to sanitize model output before it is returned to clients. Use a response-focused rewrite prompt to mask generated PII (for example, names, phone numbers, and IDs that appear in model responses).

{
  "ai-request-rewrite": {
    "provider": "openai",
    "auth": {
      "header": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    },
    "options": {
      "model": "gpt-4o"
    },
    "prompt": "Review generated text and mask any detected PII before returning it to clients."
  }
}

For the full configuration reference, see ai-request-rewrite.

Combining Guardrails

In most deployments, combine the three controls on the same route. A practical execution order is:

Start with ai-prompt-guard to reject obvious prompt injection attempts early.
Apply ai-request-rewrite to sanitize request content and remove PII.
Use ai-aws-content-moderation to score and block harmful content.
Finally, ai-proxy forwards approved traffic to the target LLM.

note

Plugin execution order is determined by plugin priority (phase and internal ordering), not by the order they appear in the configuration. The order listed above reflects the expected runtime behavior based on each plugin's assigned priority.

Admin API
ADC

curl -k "https://localhost:7443/apisix/admin/routes?gateway_group_id={gateway_group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -d '{
    "id": "ai-guardrails-combined",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/ai/chat"],
    "plugins": {
      "ai-prompt-guard": {
        "allow_patterns": ["(?i)^.{1,4000}$"],
        "deny_patterns": ["(?i)(ignore\\s+all\\s+previous\\s+instructions|reveal\\s+system\\s+prompt|developer\\s+mode)"],
        "match_all_roles": false,
        "match_all_conversation_history": false
      },
      "ai-request-rewrite": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4o"
        },
        "prompt": "Redact PII from user content before forwarding to the target model."
      },
      "ai-aws-content-moderation": {
        "comprehend": {
          "access_key_id": "'"$AWS_ACCESS_KEY_ID"'",
          "secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'",
          "region": "us-east-1"
        },
        "moderation_categories": {
          "PROFANITY": 0.5,
          "HATE_SPEECH": 0.5,
          "INSULT": 0.5,
          "HARASSMENT_OR_ABUSE": 0.5,
          "SEXUAL": 0.5,
          "VIOLENCE_OR_THREAT": 0.5
        },
        "moderation_threshold": 0.5
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
        "options": { "model": "gpt-4o" }
      }
    }
  }'

❶ ai-prompt-guard performs first-pass prompt filtering and rejects deny-pattern matches with HTTP 400.

❷ ai-request-rewrite sanitizes prompt content and masks PII before model invocation.

❸ ai-aws-content-moderation provides AWS Comprehend credentials in the comprehend block and enforces toxicity and abuse thresholds before forwarding traffic.

adc.yaml
services:
  - name: AI Combined Guardrails
    routes:
      - uris:
          - /ai/chat
        name: ai-guardrails-combined
        plugins:
          ai-prompt-guard:
            allow_patterns:
              - (?i)^.{1,4000}$
            deny_patterns:
              - (?i)(ignore\s+all\s+previous\s+instructions|reveal\s+system\s+prompt|developer\s+mode)
            match_all_roles: false
            match_all_conversation_history: false
          ai-request-rewrite:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-4o
            prompt: Redact PII from user content before forwarding to the target model.
          ai-aws-content-moderation:
            comprehend:
              access_key_id: ${AWS_ACCESS_KEY_ID}
              secret_access_key: ${AWS_SECRET_ACCESS_KEY}
              region: us-east-1
            moderation_categories:
              PROFANITY: 0.5
              HATE_SPEECH: 0.5
              INSULT: 0.5
              HARASSMENT_OR_ABUSE: 0.5
              SEXUAL: 0.5
              VIOLENCE_OR_THREAT: 0.5
            moderation_threshold: 0.5
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-4o

❶ ai-prompt-guard performs first-pass prompt filtering and rejects deny-pattern matches with HTTP 400.

❷ ai-request-rewrite sanitizes prompt content and masks PII before model invocation.

❸ ai-aws-content-moderation provides AWS Comprehend credentials in the comprehend block and enforces toxicity and abuse thresholds before forwarding traffic.

adc sync -f adc.yaml

Next Steps

Prompt Engineering and Templating — Standardize prompts before safety checks.
Token-Based Rate Limiting and Quota Management — Add budget controls to protected routes.
AI Observability and Cost Tracking — Monitor moderation outcomes, token usage, and latency.

Overview​

Prerequisites​

Prompt Protection​

Content Moderation​

AWS Comprehend Integration​

Custom Moderation Services​

PII Redaction​

Request-Side PII Masking​

Response-Side PII Filtering​

Combining Guardrails​

Next Steps​