ai-proxy

The ai-proxy plugin simplifies access to LLM and embedding models by transforming plugin configurations into the designated request format. It supports integration with OpenAI, DeepSeek, and other OpenAI-compatible APIs.

In addition, the plugin supports logging LLM request information in the access log, such as token usage, model, time to first response, and more.

Examples

The examples below demonstrate how you can configure ai-proxy for different scenarios.

Proxy to OpenAI

The following example demonstrates how you can configure the API key, model, and other parameters in the ai-proxy plugin and configure the plugin on a route to proxy user prompts to OpenAI.

Obtain the OpenAI API key and save it to an environment variable:

export OPENAI_API_KEY=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   # replace with your API key

Create a route and configure the ai-proxy plugin as follows:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-proxy-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options":{
"model": "gpt-4"
}
}
}
}'

❶ Specify the provider to be openai.

❷ Attach the OpenAI API key in the Authorization header.

❸ Specify the name of the model.
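
Optionally, verify that the route has been created by fetching it back from the Admin API. The route ID ai-proxy-route matches the one configured above:

curl "http://127.0.0.1:9180/apisix/admin/routes/ai-proxy-route" -X GET \
  -H "X-API-KEY: ${ADMIN_API_KEY}"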

Send a POST request to the route with a system prompt and a sample user question in the request body:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-H "Host: api.openai.com" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}
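
OpenAI-compatible chat completion APIs also generally support streaming responses. As a sketch, assuming the upstream model supports server-sent events, you can set stream to true in the request body; streamed requests are the ones reported as ai_stream by the request_type access log variable described later in this document:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -H "Host: api.openai.com" \
  -d '{
    "stream": true,
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'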

Proxy to DeepSeek

The following example demonstrates how you can configure the ai-proxy plugin to proxy requests to DeepSeek.

Obtain the DeepSeek API key and save it to an environment variable:

export DEEPSEEK_API_KEY=sk-5e99f3e26abc40e75d80009a90e66   # replace with your API key

Create a route and configure the ai-proxy plugin as follows:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-proxy-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "deepseek",
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
}
}
}'

❶ Specify the provider to be deepseek, so that the plugin will proxy requests to https://api.deepseek.com/chat/completions.

❷ Attach the DeepSeek API key in the Authorization header.

❸ Specify the name of the model.

Send a POST request to the route with a sample question in the request body:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "system",
"content": "You are an AI assistant that helps people find information."
},
{
"role": "user",
"content": "Write me a 50-word introduction for Apache APISIX."
}
]
}'

You should receive a response similar to the following:

{
...
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Apache APISIX is a dynamic, real-time, high-performance API gateway and cloud-native platform. It provides rich traffic management features like load balancing, dynamic upstream, canary release, circuit breaking, authentication, observability, and more. Designed for microservices and serverless architectures, APISIX ensures scalability, security, and seamless integration with modern DevOps workflows."
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

Proxy to Azure OpenAI

The following example demonstrates how you can configure the ai-proxy plugin to proxy requests to other LLM services, such as Azure OpenAI.

Obtain the Azure OpenAI API key and save it to an environment variable:

export AZ_OPENAI_API_KEY=57cha9ee8e8a89a12c0aha174f180f4   # replace with your API key

Create a route and configure the ai-proxy plugin as follows:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-proxy-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai-compatible",
"auth": {
"header": {
"api-key": "'"$AZ_OPENAI_API_KEY"'"
}
},
"options":{
"model": "gpt-4"
},
"override": {
"endpoint": "https://api7-auzre-openai.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"
}
}
}
}'

❶ Specify the provider to be openai-compatible, so that the plugin will proxy requests to the custom endpoint in override.

❷ Attach the Azure OpenAI API key in the api-key header.

❸ Specify the name of the model.

❹ Override with the Azure OpenAI endpoint.

Send a POST request to the route with a sample question in the request body:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "system",
"content": "You are an AI assistant that helps people find information."
},
{
"role": "user",
"content": "Write me a 50-word introduction for Apache APISIX."
}
],
"max_tokens": 800,
"temperature": 0.7,
"frequency_penalty": 0,
"presence_penalty": 0,
"top_p": 0.95,
"stop": null
}'

You should receive a response similar to the following:

{
"choices": [
{
...,
"message": {
"content": "Apache APISIX is a modern, cloud-native API gateway built to handle high-performance and low-latency use cases. It offers a wide range of features, including load balancing, rate limiting, authentication, and dynamic routing, making it an ideal choice for microservices and cloud-native architectures.",
"role": "assistant"
}
}
],
...
}
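
Where the credential is attached depends on what the LLM service expects: in this example the key is sent in the api-key request header rather than the Authorization header. If your provider expects the key as a query parameter instead, the plugin's auth object also accepts a query field in recent versions; the sketch below assumes auth.query is available and that the provider reads an api-key query parameter. It is meant as a drop-in replacement for the auth object inside the quoted request body above, so the shell variable expansion works the same way:

"auth": {
  "query": {
    "api-key": "'"$AZ_OPENAI_API_KEY"'"
  }
}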

Proxy to Embedding Models

The following example demonstrates how you can configure the ai-proxy plugin to proxy requests to embedding models. This example will use the OpenAI embedding model endpoint.

Obtain the OpenAI API key and save it to an environment variable:

export OPENAI_API_KEY=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   # replace with your API key

Create a route and configure the ai-proxy plugin as follows:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-proxy-route",
"uri": "/embeddings",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options":{
"model": "text-embedding-3-small",
"encoding_format": "float"
},
"override": {
"endpoint": "https://api.openai.com/v1/embeddings"
}
}
}
}'

❶ Specify the provider to be openai, so that the plugin will, by default, proxy requests to https://api.openai.com/v1/chat/completions.

❷ Attach the OpenAI API key in the Authorization header.

❸ Specify the name of the embedding model.

❹ Add an additional parameter encoding_format to configure the returned embedding vectors as lists of floating-point numbers.

❺ Override the default endpoint with the embedding API endpoint.

Send a POST request to the route with an input string:

curl "http://127.0.0.1:9080/embeddings" -X POST \
-H "Content-Type: application/json" \
-d '{
"input": "hello world"
}'

You should receive a response similar to the following:

{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [
-0.0067144386,
-0.039197803,
0.034177095,
0.028763203,
-0.024785956,
-0.04201061,
...
],
}
],
"model": "text-embedding-3-small",
"usage": {
"prompt_tokens": 2,
"total_tokens": 2
}
}
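
The OpenAI embeddings endpoint also accepts an array of input strings, returning one embedding object per input, so you can batch multiple texts in a single request:

curl "http://127.0.0.1:9080/embeddings" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["hello world", "hello APISIX"]
  }'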

Proxy to Selected Model using Request Body Parameter

The following example demonstrates how you can proxy requests to different models on the same URI, based on the model specified in the user request. You will be using the post_arg.* variable to fetch the value of a request body parameter.

note

The usage in this example currently only works in API7 Enterprise and will become available in APISIX 3.13.0.

This example uses OpenAI and DeepSeek as the LLM services. Obtain the OpenAI and DeepSeek API keys and save them to environment variables:

# replace with your API key
export OPENAI_API_KEY=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
export DEEPSEEK_API_KEY=sk-5e99f3e26abc40e75d80009a90e66

Create a route to the OpenAI API with the ai-proxy plugin:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-proxy-openai-route",
"uri": "/anything",
"methods": ["POST"],
"vars": [[ "post_arg.model", "==", "openai" ]],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
}
}
}'

❶ Set the route URI to be /anything.

❷ Match the route to requests where the body parameter model is set to openai.

Create another route on /anything to the DeepSeek API with the ai-proxy plugin:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-proxy-deepseek-route",
"uri": "/anything",
"methods": ["POST"],
"vars": [[ "post_arg.model", "==", "deepseek" ]],
"plugins": {
"ai-proxy": {
"provider": "deepseek",
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
}
}
}'

❶ Set the route URI to be /anything, same as the previous route.

❷ Match the route to requests where the body parameter model is set to deepseek.

Send a POST request to the route with model set to openai:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"model": "openai",
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

Send a POST request to the route with model set to deepseek:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek",
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The sum of 1 and 1 is 2. This is a basic arithmetic operation where you combine two units to get a total of two units."
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

You can also configure post_arg.* to fetch nested request body parameters. For instance, if the request format is:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"model": {
"name": "openai"
},
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You can configure the vars on the route to be [[ "post_arg.model.name", "==", "openai" ]].
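
For example, to match the request above, you can update the vars of the OpenAI route created earlier in this section:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "ai-proxy-openai-route",
    "uri": "/anything",
    "methods": ["POST"],
    "vars": [[ "post_arg.model.name", "==", "openai" ]],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4"
        }
      }
    }
  }'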

For more information on the expressions, see APISIX Expressions.

Include LLM Information in Access Log

The following example demonstrates how you can log LLM request information in the gateway's access log for analytics and auditing. The following variables are available:

  • request_type: Type of request, where the value could be traditional_http, ai_chat, or ai_stream.
  • llm_time_to_first_token: Duration from sending the request to receiving the first token from the LLM service, in milliseconds.
  • llm_model: LLM model name forwarded to the upstream LLM service.
  • request_llm_model: LLM model name specified in the request.
  • llm_prompt_tokens: Number of tokens in the prompt.
  • llm_completion_tokens: Number of tokens in the chat completion generated by the LLM service.

note

The usage in this example currently only works in API7 Enterprise and will become available in APISIX 3.13.0.

Update the access log format in your configuration file to include additional LLM related variables:

conf/config.yaml
nginx_config:
  http:
    access_log_format: "$remote_addr - $remote_user [$time_local] $http_host \"$request_line\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\" \"$apisix_request_id\" \"$request_type\" \"$llm_time_to_first_token\" \"$llm_model\" \"$request_llm_model\" \"$llm_prompt_tokens\" \"$llm_completion_tokens\""

Reload the gateway for configuration changes to take effect.
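
How you reload depends on your deployment. For a standard installation, the apisix CLI provides a reload command; in containerized deployments, restart the gateway container instead:

apisix reload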

Now, create a route following the Proxy to OpenAI example and send a request like this:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5",
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

Since the model configured in ai-proxy is gpt-4, the request will be forwarded to the GPT-4 model, and you should receive a response similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null,
"annotations": []
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 23,
"completion_tokens": 8,
"total_tokens": 31,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
...
},
"service_tier": "default",
"system_fingerprint": null
}

In the gateway's access log, you should see a log entry similar to the following:

192.168.215.1 - - [21/Mar/2025:04:28:03 +0000] api.openai.com "POST /anything HTTP/1.1" 200 804 2.858 "-" "curl/8.6.0" - - - "http://api.openai.com" "5c5e0b95f8d303cb81e4dc456a4b12d9" "ai_chat" "2858" "gpt-4" "gpt-3.5" "23" "8"

The access log entry shows that the request type is ai_chat, the time to first token is 2858 milliseconds, the LLM model the request was forwarded to is gpt-4, the LLM model specified in the request is gpt-3.5, the prompt token usage is 23, and the completion token usage is 8.

Send Request Log to Logger

The following example demonstrates how you can log request and response information, including the LLM model, token usage, and payloads, and push it to a logger. Before proceeding, you should first set up a logger, such as Kafka. See kafka-logger for more information.

note

The usage in this example currently only works in API7 Enterprise and will become available in APISIX 3.13.0.

Create a route to your LLM service and configure logging details as follows:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-proxy-openai-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
},
"logging": {
"summaries": true,
"payloads": true
}
},
"kafka-logger": {
"brokers": [
{
"host": "127.0.0.1",
"port": 9092
}
],
"kafka_topic": "test2",
"key": "key1",
"batch_max_size": 1
}
}
}
}'

❶ Log the request LLM model, duration, and request and response tokens.

❷ Log the request and response payloads.

❸ Update with your Kafka address.

❹ Update with your Kafka topic.

❺ Update with your Kafka key.

❻ Set to 1 to send the log entry immediately.

Send a POST request to the route:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

In the Kafka topic, you should also see a log entry corresponding to the request with the LLM summary and request/response payload.
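
As a quick check, if you are running a standard Apache Kafka distribution, you can tail the topic with the console consumer bundled with Kafka. The broker address and topic below match the route configuration above:

kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic test2 --from-beginning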
