Pre-Define Prompt Templates
When working with large language models (LLMs), administrators may prefer to pre-configure a prompt template that accepts user inputs in designated fields, which allows the service to be re-used across the organization.
In this document, you will learn how to configure a prompt template in APISIX using the ai-prompt-template plugin, so that users can interact with the model by filling in the designated fields in a "fill in the blank" fashion. While the document uses OpenAI as the sample upstream service, the procedure can easily be adapted to other LLM service providers.
Prerequisite(s)
- Install Docker.
- Install cURL to send requests to the services for validation.
- Follow the Getting Started Tutorial to start a new APISIX instance in Docker.
Obtain an OpenAI API Key
Create an OpenAI account and an API key before proceeding. You can optionally save the key to an environment variable as follows:
export OPENAI_API_KEY=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 # replace with your API key
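Optionally, you can confirm that the key is valid by querying OpenAI's models endpoint directly before configuring APISIX; a valid key returns a JSON list of available models, while an invalid key returns a 401 error:
curl "https://api.openai.com/v1/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"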
Create a Route
Create a route to the OpenAI API endpoint with a sample prompt template that accepts a user-defined prompt and answers in the specified complexity:
- Admin API
- Ingress Controller
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-prompt-template-route",
"uri": "/anything",
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
},
"ai-prompt-template": {
"templates": [
{
"name": "QnA with complexity",
"template": {
"model": "gpt-4",
"messages": [
{
"role": "system",
"content": "Answer in {{complexity}}."
},
{
"role": "user",
"content": "Explain {{prompt}}."
}
]
}
}
]
}
}
}'
❶ Name the template set. Requests to the route should reference this name in the template_name field of the request body.
❷ Specify the model identifier.
❸ Configure a prompt that obtains the user-defined answer complexity from the request body key complexity.
❹ Configure a prompt that obtains the user-defined question from the request body key prompt.
- Gateway API
- APISIX CRD
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: ingress-apisix
  name: ai-prompt-decor-plugin-config
spec:
  plugins:
    - name: ai-proxy
      config:
        provider: openai
        auth:
          header:
            Authorization: "Bearer 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"
        options:
          model: gpt-4
    - name: ai-prompt-template
      config:
        templates:
          - name: QnA with complexity
            template:
              model: gpt-4
              messages:
                - role: system
                  content: "Answer in {{complexity}}."
                - role: user
                  content: "Explain {{prompt}}."
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: ingress-apisix
  name: ai-prompt-decorator-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-prompt-decor-plugin-config
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: ingress-apisix
  name: ai-prompt-decorator-route
spec:
  ingressClassName: apisix
  http:
    - name: ai-prompt-decorator-route
      match:
        paths:
          - /anything
      plugins:
        - name: ai-proxy
          enable: true
          config:
            provider: openai
            auth:
              header:
                Authorization: "Bearer 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"
            options:
              model: gpt-4
        - name: ai-prompt-template
          enable: true
          config:
            templates:
              - name: QnA with complexity
                template:
                  model: gpt-4
                  messages:
                    - role: system
                      content: "Answer in {{complexity}}."
                    - role: user
                      content: "Explain {{prompt}}."
❶ Name the template set. Requests to the route should reference this name in the template_name field of the request body.
❷ Specify the model identifier.
❸ Configure a prompt that obtains the user-defined answer complexity from the request body key complexity.
❹ Configure a prompt that obtains the user-defined question from the request body key prompt.
Apply the configuration to your cluster:
kubectl apply -f ai-prompt-template-route.yaml
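Optionally, verify that the resources were created in your cluster. Depending on which configuration you applied, you can, for example, run:
kubectl get httproute ai-prompt-decorator-route -n ingress-apisix
# or, if you applied the APISIX CRD configuration:
kubectl get apisixroute ai-prompt-decorator-route -n ingress-apisix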
Verify
The route can now be re-used to answer a variety of questions at different user-specified complexity levels.
Send a POST request to the route with a sample question and desired answer complexity in the request body:
curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"template_name": "QnA with complexity",
"complexity": "brief",
"prompt": "quick sort"
}'
You should receive a response similar to the following:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Quick sort is a highly efficient sorting algorithm that uses a divide-and-conquer approach to arrange elements in a list or array in order. Here’s a brief explanation:\n\n1. **Choose a Pivot**: Select an element from the list as a 'pivot'. Common methods include choosing the first element, the last element, the middle element, or a random element.\n\n2. **Partitioning**: Rearrange the elements in the list such that all elements less than the pivot are moved before it, and all elements greater than the pivot are moved after it. The pivot is now in its final position.\n\n3. **Recursively Apply**: Recursively apply the same process to the sub-lists of elements to the left and right of the pivot.\n\nThe base case of the recursion is lists of size zero or one, which are already sorted.\n\nQuick sort has an average-case time complexity of O(n log n), making it suitable for large datasets. However, its worst-case time complexity is O(n^2), which occurs when the smallest or largest element is always chosen as the pivot. This can be mitigated by using good pivot selection strategies or randomization.",
        "role": "assistant"
      }
    }
  ],
  "created": 1723194057,
  "id": "chatcmpl-9uFmTYN4tfwaXZjyOQwcp0t5law4x",
  "model": "gpt-4o-2024-05-13",
  "object": "chat.completion",
  "system_fingerprint": "fp_abc28019ad",
  "usage": {
    "completion_tokens": 234,
    "prompt_tokens": 18,
    "total_tokens": 252
  }
}
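Because both complexity and prompt are template variables, the same route can serve other questions at other complexity levels without any configuration change. For example, the request below (with arbitrary sample values) asks for a more detailed answer on a different topic:
curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "template_name": "QnA with complexity",
    "complexity": "great technical detail",
    "prompt": "merge sort"
  }'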
Next Steps
You have now learned how to pre-define prompt templates in APISIX when integrating with LLM service providers, such that the same route can be re-used to take different user inputs and serve a variety of purposes.
If you would like to integrate with OpenAI's streaming API, you can use the proxy-buffering plugin to disable NGINX's proxy_buffering directive so that server-sent events (SSE) are not buffered.
In addition, you can integrate more capabilities that APISIX offers, such as rate limiting and caching, to improve system availability and user experience.
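As an illustration, here is a minimal sketch of adding rate limiting to the same route with the limit-count plugin. The threshold values are arbitrary, and this assumes the PATCH merges the new plugin into the route's existing configuration; alternatively, re-send the full route definition with limit-count included alongside the AI plugins:
curl "http://127.0.0.1:9180/apisix/admin/routes/ai-prompt-template-route" -X PATCH \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "plugins": {
      "limit-count": {
        "count": 30,
        "time_window": 60,
        "rejected_code": 429,
        "key": "remote_addr"
      }
    }
  }'
This limits each client IP to 30 requests per 60 seconds and returns HTTP 429 once the quota is exceeded.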