
Version: 3.11.0

Implement Prompt Guardrails

Prompt guardrails provide additional safeguards to protect user privacy, prevent unintended or harmful model behaviors, discourage hallucinated responses, and stay compliant with responsible AI ethical standards when working with large language models (LLMs).

In this document, you will learn how to implement prompt guardrails to redact sensitive information, as well as to discourage undesired output and hallucinations.

Prerequisite(s)

Redact Sensitive Information (PII)

In this section, you will learn how to use API7 Enterprise's data-mask plugin to implement an input guardrail that removes or replaces sensitive information in the request body before the request is forwarded to the upstream LLM service.

If you are integrating APISIX with the OpenAI chat completions API, the request body sent to the endpoint should follow the format below:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "system prompt goes here"
    },
    {
      "role": "user",
      "content": "user-defined prompt goes here"
    },
    ...
  ]
}

Suppose you would like to mask all email addresses before the request is forwarded to the upstream service. Enable the data-mask plugin in the dashboard and use the following configuration in the JSON Editor:

{
  "request": [
    {
      "action": "regex",
      "type": "body",
      "body_format": "json",
      "name": "$.messages[*].content",
      "regex": "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b",
      "value": "xxxxxx@email.com"
    }
  ]
}

❶ action: Configure a data masking action that matches sensitive information using a regular expression.

❷ type: Specify that the sensitive information is located in the request body.

❸ body_format: Specify the encoding of the request body.

❹ name: Use JSON path syntax to target the content of all messages in the request body.

❺ regex: Configure a regular expression that matches common email address formats.

❻ value: Replace each matched email address with xxxxxx@email.com.
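To see what the rule above does to a request before it leaves the gateway, the following Python sketch applies the same pattern and replacement value to every content field under messages. It is a client-side approximation under stated assumptions: Python's re module stands in for the gateway's regex engine, and mask_message_contents is a hypothetical helper, not part of the data-mask plugin.

```python
import copy
import re

# Assumption: Python's `re` stands in for the gateway's regex engine. For this
# pattern the two should agree, but this is not the data-mask plugin's code.
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
MASK_VALUE = "xxxxxx@email.com"


def mask_message_contents(body: dict) -> dict:
    """Return a copy of a chat completions body with email addresses masked.

    Mirrors the $.messages[*].content target: the content string of every
    message has each regex match replaced with MASK_VALUE.
    """
    masked = copy.deepcopy(body)  # leave the caller's body untouched
    for message in masked.get("messages", []):
        content = message.get("content")
        if isinstance(content, str):
            message["content"] = EMAIL_PATTERN.sub(MASK_VALUE, content)
    return masked


request_body = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "Extract all PII in text below:"},
        {"role": "user", "content": "You may write to Jobs at sjobs@apple.com."},
    ],
}
print(mask_message_contents(request_body)["messages"][1]["content"])
# prints: You may write to Jobs at xxxxxx@email.com.
```

Only the email address is replaced; the rest of each message, including any names, is forwarded unchanged.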

Send a sample request that asks the model to extract any PII from the text:

curl "http://127.0.0.1:9080/v1/chat/completions" -X POST \
  -H "Content-Type: application/json" \
  -H "Host: api.openai.com:443" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "system",
        "content": "Extract all PII in text below:"
      },
      {
        "role": "user",
        "content": "Steve Jobs co-founded Apple in 1976 and reimagined the personal computer, making technology accessible for everyday users. You may write to Jobs at sjobs@apple.com."
      }
    ]
  }'

❶ System message: Configure a system prompt that asks the model to extract all PII, which makes the result easy to verify.

❷ User message: Include two pieces of PII, a name and an email address.
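If you prefer to send the same test request from code instead of curl, the sketch below builds an equivalent POST request with Python's standard urllib.request module. It assumes the gateway address and Host header from the curl example; build_chat_request and send are hypothetical helpers, and the send call is left commented out because it needs a running gateway.

```python
import json
import urllib.request

# Assumptions: the gateway listens on 127.0.0.1:9080 and routes on the Host
# header, as in the curl example. build_chat_request and send are
# hypothetical helpers for illustration only.
GATEWAY_URL = "http://127.0.0.1:9080/v1/chat/completions"


def build_chat_request(system_prompt: str, user_prompt: str) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    body = {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Host": "api.openai.com:443"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request through the gateway and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


request = build_chat_request(
    "Extract all PII in text below:",
    "You may write to Jobs at sjobs@apple.com.",
)
# response = send(request)  # requires a running gateway
```

The client still sends the raw email address; masking happens at the gateway, so the client code does not need to change when the data-mask rule changes.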

You should receive a response showing the email address has been redacted:

{
  ...,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "- Name: Steve Jobs\n- Email: xxxxxx@email.com"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}
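To check the redaction in an automated test rather than by eye, you can scan the assistant messages for the original sensitive values. A minimal sketch, assuming the response has already been decoded into a dict shaped like the one above; leaked_values and the sample response are illustrative only. The name is still reported because the rule above masks only email addresses.

```python
def leaked_values(response: dict, sensitive_values: list[str]) -> list[str]:
    """Return each sensitive value that appears verbatim in any assistant message."""
    contents = [
        (choice.get("message") or {}).get("content") or ""
        for choice in response.get("choices", [])
    ]
    return [value for value in sensitive_values if any(value in text for text in contents)]


# Illustrative response in which the model only ever saw the masked text.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "- Name: Steve Jobs\n- Email: xxxxxx@email.com",
            },
            "finish_reason": "stop",
        }
    ]
}
print(leaked_values(sample_response, ["sjobs@apple.com", "Steve Jobs"]))
# prints: ['Steve Jobs']
```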

For more information on how to use the data-mask plugin, see the plugin doc.

Discourage Hallucinations and Undesired Output

Hallucinations are instances where the model generates information that is factually incorrect, misleading, or entirely fabricated, even though it may sound plausible or confident. There are several approaches to mitigating hallucinations, one of which is to pre-engineer system prompts. For example, you can configure the following system prompt:

Before you respond to the user message, rate on a scale of 0 to 10 how confident you are in your response. If your confidence level is lower than 8/10, respond with "Sorry I do not have an answer that I am confident with" and explain the reasoning. If your confidence level is greater than or equal to 8/10, you may return the response to the user.
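If you adopt a prompt like this, a client may want to detect the fallback reply and handle it separately, for example by routing the question to a human. The Python sketch below does this with a normalized substring match on the first choice. It assumes the model reproduces the fallback sentence, which an LLM is not guaranteed to do, so treat it as a best-effort heuristic; is_low_confidence is a hypothetical helper.

```python
import re

# Assumption: the model echoes the fallback sentence from the system prompt.
# LLMs do not always follow instructions verbatim, so this is a heuristic.
FALLBACK_PHRASE = "Sorry I do not have an answer that I am confident with"


def _normalize(text: str) -> str:
    # Lowercase, replace punctuation with spaces, and collapse whitespace so
    # that "Sorry, I do not have..." still matches the configured phrase.
    return " ".join(re.sub(r"[^a-z0-9\s]", " ", text.lower()).split())


def is_low_confidence(response: dict) -> bool:
    """Return True if the first choice contains the fallback phrase."""
    choices = response.get("choices") or []
    if not choices:
        return False
    content = (choices[0].get("message") or {}).get("content") or ""
    return _normalize(FALLBACK_PHRASE) in _normalize(content)


reply = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Sorry, I do not have an answer that I am confident with. "
                "The question depends on events after my training data.",
            }
        }
    ]
}
print(is_low_confidence(reply))
# prints: True
```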

You can also pre-engineer system prompts to discourage undesired output. For example, you may want responses to avoid quoting copyrighted content or referencing controversial sources. You can configure the following system prompt:

Provide all responses based on factual information, avoiding any quotes from copyrighted materials. Do not reference or include information from controversial or unreliable sources. Ensure that all content is original, non-derivative, and based on widely accepted, publicly available information.

See Configure Prompt Decorators to learn how you can configure these pre-engineered prompts, should you wish to adopt this approach.
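Whatever mechanism you use to configure them, the effect of these pre-engineered prompts on a request is that extra system messages are placed ahead of the user's messages before the request reaches the LLM. The following Python sketch emulates that on the client side, assuming the request body format shown earlier; prepend_system_prompts is a hypothetical helper and not the gateway's prompt decorator implementation.

```python
import copy

# Assumption: "decorating" a request means inserting system messages ahead of
# the caller's messages. This is a client-side emulation, not the plugin.
SOURCING_PROMPT = (
    "Provide all responses based on factual information, avoiding any quotes "
    "from copyrighted materials. Do not reference or include information from "
    "controversial or unreliable sources. Ensure that all content is original, "
    "non-derivative, and based on widely accepted, publicly available information."
)


def prepend_system_prompts(body: dict, prompts: list[str]) -> dict:
    """Return a copy of body with one system message per prompt placed first."""
    decorated = copy.deepcopy(body)  # do not mutate the caller's request
    system_messages = [{"role": "system", "content": prompt} for prompt in prompts]
    decorated["messages"] = system_messages + decorated.get("messages", [])
    return decorated


user_request = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Who co-founded Apple?"}],
}
decorated = prepend_system_prompts(user_request, [SOURCING_PROMPT])
print([message["role"] for message in decorated["messages"]])
# prints: ['system', 'user']
```

Copying the body first keeps the original request reusable, for example when comparing responses with and without the extra prompts.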

Next Steps

You have now learned how to implement a few prompt guardrails in APISIX for additional safeguards when integrating with LLM service providers.

There are other types of guardrails, such as denied topics and content filters, as well as different ways to implement them. Some of these approaches may not be suitable for implementation at the API gateway, so review additional resources and explore the strategies that best fit your business needs.



Copyright © APISEVEN Ltd. 2019 – 2024. Apache, Apache APISIX, APISIX, and associated open source project names are trademarks of the Apache Software Foundation.