# ai-aliyun-content-moderation
The `ai-aliyun-content-moderation` plugin integrates with Aliyun to check request bodies for risks, such as profanity, hate speech, insults, harassment, violence, and more, when proxying requests to LLMs, and rejects requests whose evaluated risk level exceeds the configured threshold.
Please ensure that the `access_key_secret` is correctly configured in the plugin. If it is misconfigured, all requests will bypass the plugin and be forwarded directly to the LLM upstream, and you will see a `Specified signature is not matched with our calculation` error from the plugin in the gateway's error log.
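If you suspect a misconfiguration, you can search the error log for this message. The log path below is an assumption for a local installation; adjust it to wherever your gateway writes its error log:

```shell
# search the gateway error log for the signature mismatch reported by the plugin
# (log path is an assumption; adjust to your deployment)
grep "Specified signature is not matched with our calculation" /usr/local/apisix/logs/error.log
```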
The `ai-aliyun-content-moderation` plugin should be used with either the `ai-proxy` or the `ai-proxy-multi` plugin for proxying LLM requests.
This plugin is currently only available in API7 Enterprise and will become available in APISIX in the 3.12.0 release.
## Examples
The following examples use OpenAI as the upstream service provider. Before proceeding, create an OpenAI account and obtain an API key. If you are working with other LLM providers, refer to the provider's documentation to obtain an API key.
Additionally, create an Aliyun account and obtain the endpoint, region ID, access key ID, and access key secret.
You can optionally save this information to environment variables:
```shell
# replace with your data
export OPENAI_API_KEY=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
export ALIYUN_ENDPOINT=https://api7-docs.cn-shanghai.aliyuncs.com
export ALIYUN_REGION_ID=cn-shanghai
export ALIYUN_ACCESS_KEY_ID=LTAI5yXKZP77gR3BQQM9WJnA
export ALIYUN_ACCESS_KEY_SECRET=hT2YpkqLs9FIjh3dyznBw7RMux5OKv
```
### Moderate Request Content Toxicity
The following example demonstrates how you can use the plugin to moderate content toxicity in requests and customize the rejection code and message.
Create a route to the LLM chat completion endpoint using the `ai-proxy` plugin and configure the integration details as well as the deny code and message in the `ai-aliyun-content-moderation` plugin:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-aliyun-content-moderation-route",
"uri": "/anything",
"plugins": {
"ai-aliyun-content-moderation": {
"endpoint": "'"$ALIYUN_ENDPOINT"'",
"region_id": "'"$ALIYUN_REGION_ID"'",
"access_key_id": "'"$ALIYUN_ACCESS_KEY_ID"'",
"access_key_secret": "'"$ALIYUN_ACCESS_KEY_SECRET"'",
"deny_code": 400,
"deny_message": "Request contains forbidden content, such as hate speech or violence."
},
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"model": "gpt-4"
}
}
}'
In the `ai-aliyun-content-moderation` plugin configuration:

- `deny_code` configures the rejection HTTP status code.
- `deny_message` configures the rejection message.
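Optionally, you can fetch the route back from the Admin API (same address and key as above) to confirm the plugin configuration was stored as expected:

```shell
# fetch the route to confirm the plugin configuration
curl "http://127.0.0.1:9180/apisix/admin/routes/ai-aliyun-content-moderation-route" \
  -H "X-API-KEY: ${ADMIN_API_KEY}"
```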
Send a POST request to the route with a system prompt and a user question with a profane word in the request body:
curl -i "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "Stupid, what is 1+1?" }
]
}'
You should receive an `HTTP/1.1 400 Bad Request` response and see the following message:
```json
{
  "object": "chat.completion",
  "usage": {
    "total_tokens": 0,
    "prompt_tokens": 0,
    "completion_tokens": 0
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Request contains forbidden content, such as hate speech or violence."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "model": "from-security-guard",
  "id": "c9466bbf-e010-469d-949a-a10f25525964"
}
```
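If you are scripting these checks, one minimal way to assert on the configured `deny_code` is to inspect only the status code of the response. The snippet below assumes the same route and a similarly flagged prompt:

```shell
# send a flagged prompt and print only the HTTP status code (expected: the configured deny_code, 400)
status=$(curl -s -o /dev/null -w '%{http_code}' "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Stupid, what is 1+1?"}]}')
echo "received status: ${status}"
```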
Send another request to the route with a typical question in the request body:
curl -i "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'
You should receive an `HTTP/1.1 200 OK` response with the model output:
```json
{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}
```
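To extract only the assistant's reply from the JSON response, you can pipe the output through `jq` (assuming `jq` is installed locally):

```shell
# print only the assistant's reply from the chat completion response
curl -s "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }' | jq -r '.choices[0].message.content'
```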
### Adjust Risk Level Threshold
The following example demonstrates how you can adjust the risk level threshold, which determines whether a request or response should be allowed through.
Create a route to the LLM chat completion endpoint using the `ai-proxy` plugin and configure the `risk_level_bar` in `ai-aliyun-content-moderation` to be `high`:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-aliyun-content-moderation-route",
"uri": "/anything",
"plugins": {
"ai-aliyun-content-moderation": {
"endpoint": "'"$ALIYUN_ENDPOINT"'",
"region_id": "'"$ALIYUN_REGION_ID"'",
"access_key_id": "'"$ALIYUN_ACCESS_KEY_ID"'",
"access_key_secret": "'"$ALIYUN_ACCESS_KEY_SECRET"'",
"deny_code": 400,
"deny_message": "Request contains forbidden content, such as hate speech or violence.",
"risk_level_bar": "high"
},
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"model": "gpt-4"
}
}
}'
Send a POST request to the route with a system prompt and a user question with a profane word in the request body:
curl -i "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "Stupid, what is 1+1?" }
]
}'
You should receive an `HTTP/1.1 400 Bad Request` response and see the following message:
```json
{
  "object": "chat.completion",
  "usage": {
    "total_tokens": 0,
    "prompt_tokens": 0,
    "completion_tokens": 0
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Request contains forbidden content, such as hate speech or violence."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "model": "from-security-guard",
  "id": "c9466bbf-e010-469d-949a-a10f25525964"
}
```
Update the `risk_level_bar` in the plugin to `max`:
curl "http://127.0.0.1:9180/apisix/admin/routes/ai-aliyun-content-moderation-route" -X PATCH \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"plugins": {
"ai-aliyun-content-moderation": {
"risk_level_bar": "max"
}
}
}'
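If you want to confirm that the patch merged with the existing plugin configuration rather than replacing it, you can fetch the route back and filter for the threshold; the `grep` pattern below is just one quick way to pick it out of the JSON:

```shell
# confirm the updated threshold in the stored route configuration
curl -s "http://127.0.0.1:9180/apisix/admin/routes/ai-aliyun-content-moderation-route" \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  | grep -o '"risk_level_bar"[^,}]*'
```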
Send the same request to the route:
curl -i "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "Stupid, what is 1+1?" }
]
}'
You should receive an `HTTP/1.1 200 OK` response with the model output:
```json
{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}
```
This is because the word "stupid" has a risk level of high
, which is lower than the configured threshold of max
. To see the Aliyun moderation outcome, you can update the gateway's log level to debug
as such:
```yaml
nginx_config:
  error_log_level: debug
```
Reload the gateway for configuration changes to take effect.
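For example, if you are running open-source APISIX from the CLI, a reload similar to the following applies the change; adjust the procedure for API7 Enterprise or containerized deployments:

```shell
# regenerate the NGINX configuration and reload the gateway
apisix reload
```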
For example, for the request above, you should see a debug log entry similar to the following:
```json
{
  "RequestId": "29F7AD19-074B-54AC-B240-B297AD96883F",
  "Message": "OK",
  "Data": {
    ...,
    "RiskLevel": "high",
    "Result": [
      {
        "RiskWords": "are&you&stupid",
        ...
      }
    ]
  },
  "Code": 200
}
```
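To locate such entries without reading the full log, you can filter the error log for the moderation response fields; the log path below is again an assumption for a local installation:

```shell
# show the most recent Aliyun moderation outcome recorded in the debug log
grep "RiskLevel" /usr/local/apisix/logs/error.log | tail -n 1
```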