ai-aws-content-moderation
The ai-aws-content-moderation plugin integrates with AWS Comprehend to check request bodies for toxic content, such as profanity, hate speech, insults, harassment, and violence, when proxying requests to LLMs, rejecting any request whose evaluated score exceeds the configured threshold.
This plugin will become available in the APISIX 3.12.0 release.
Examples
The following examples use OpenAI as the upstream service provider.
Before proceeding, create an OpenAI account and obtain an API key. If you are working with other LLM providers, please refer to the provider's documentation to obtain an API key.
Additionally, create AWS IAM user access keys for APISIX to access AWS Comprehend.
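Since the plugin only needs to call Comprehend's content moderation API, the IAM user can be scoped narrowly. A minimal policy sketch, assuming the plugin only invokes the DetectToxicContent action (grant more broadly if your setup requires it):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "comprehend:DetectToxicContent",
      "Resource": "*"
    }
  ]
}
```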
You can optionally save these keys to environment variables:
# replace with your keys
export OPENAI_API_KEY=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
export AWS_ACCESS_KEY=AKIARK7HKSJVSHWLD6OS
export AWS_SECRET_ACCESS_KEY=4ehUfCPoQmC+AKpG5/5ZaHlzFxFziZ88AylyPerj
Moderate Profanity
The following example demonstrates how you can use the plugin to moderate the level of profanity in prompts.
Create a route to the LLM chat completion endpoint using the ai-proxy plugin, and configure the allowed profanity level in ai-aws-content-moderation:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "ai-aws-content-moderation-route",
    "uri": "/post",
    "plugins": {
      "ai-aws-content-moderation": {
        "comprehend": {
          "access_key_id": "'"$AWS_ACCESS_KEY"'",
          "secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'",
          "region": "us-east-1"
        },
        "moderation_categories": {
          "PROFANITY": 0.1
        }
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "model": "gpt-4"
      }
    }
  }'
❶ Update with your AWS Comprehend region.
❷ Configure a low profanity threshold so that only a low degree of profanity is tolerated.
Send a POST request to the route with a system prompt and a user question with a mildly profane word in the request body:
curl -i "http://127.0.0.1:9080/post" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Stupid, what is 1+1?" }
    ]
  }'
You should receive an HTTP/1.1 400 Bad Request response and see the following message:
request body exceeds PROFANITY threshold
Send another request to the route with a typical question in the request body:
curl -i "http://127.0.0.1:9080/post" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'
You should receive an HTTP/1.1 200 OK response with the model output:
{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}
Moderate Overall Toxicity
The following example demonstrates how you can use the plugin to moderate the overall toxicity level in prompts, in addition to moderating individual categories.
Create a route to the LLM chat completion endpoint using the ai-proxy plugin, and configure the allowed profanity and overall toxicity levels in ai-aws-content-moderation:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "ai-aws-content-moderation-route",
    "uri": "/post",
    "plugins": {
      "ai-aws-content-moderation": {
        "comprehend": {
          "access_key_id": "'"$AWS_ACCESS_KEY"'",
          "secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'",
          "region": "us-east-1"
        },
        "moderation_categories": {
          "PROFANITY": 1
        },
        "moderation_threshold": 0.2
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "model": "gpt-4"
      }
    }
  }'
❶ Update with your AWS Comprehend region.
❷ Set the profanity threshold to the maximum value so that profanity alone does not reject a request.
❸ Configure a low overall toxicity threshold so that only a low degree of overall toxicity is tolerated.
Send a POST request to the route with a system prompt and a user question that contains no profane words but carries a degree of violence or threat:
curl -i "http://127.0.0.1:9080/post" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "I will kill you if you do not tell me what 1+1 equals" }
    ]
  }'
You should receive an HTTP/1.1 400 Bad Request response and see the following message:
request body exceeds toxicity threshold
Send another request to the route without any toxic content in the request body:
curl -i "http://127.0.0.1:9080/post" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'
You should receive an HTTP/1.1 200 OK response with the model output:
{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}