Implement RAG at the Gateway Layer
This guide shows how to configure Retrieval-Augmented Generation (RAG) in API7 AI Gateway so requests are enriched with context from your knowledge base before reaching the LLM.
Overview
Current limitation: RAG in API7 AI Gateway is Azure-only today. You must use Azure OpenAI for embeddings and Azure AI Search for vector search. Support for additional providers is planned but not implemented.
RAG augments LLM prompts with relevant context retrieved at request time from your vector knowledge base. Implementing this at the gateway layer centralizes enrichment logic and avoids application-side RAG orchestration in every service.
Architecture flow:
- Client sends a chat request to API7 AI Gateway.
- Gateway uses ai-rag to generate embeddings and run vector search against Azure AI Search.
- Gateway injects retrieved context into the prompt.
- Gateway forwards the enriched request to Azure OpenAI through ai-proxy.
- Client receives a grounded answer.
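The flow above can be sketched in a few lines of Python. This is a conceptual illustration only: the embedding and search functions below are stand-ins I made up for this sketch, not real API7, Azure OpenAI, or Azure AI Search APIs.

```python
# Conceptual sketch of gateway-side RAG enrichment. embed() and
# vector_search() are hypothetical stand-ins for the Azure OpenAI
# embeddings call and the Azure AI Search query that ai-rag performs.

def embed(text: str) -> list[float]:
    # Stand-in embedding: not a real model, just a deterministic vector.
    return [float(len(word)) for word in text.split()]

def vector_search(vector: list[float], top_k: int = 3) -> list[str]:
    # Stand-in retrieval: returns the top_k "chunks" from a tiny corpus.
    corpus = [
        "API7 AI Gateway proxies and governs LLM traffic.",
        "ai-rag enriches prompts with retrieved context.",
        "ai-proxy forwards requests to Azure OpenAI.",
    ]
    return corpus[:top_k]

def enrich(messages: list[dict], query: str) -> list[dict]:
    # Inject retrieved context ahead of the user's messages, mirroring
    # what the gateway does before forwarding the request to the LLM.
    context = "\n".join(vector_search(embed(query)))
    return [{"role": "system", "content": f"Context:\n{context}"}] + messages

messages = [{"role": "user", "content": "What does API7 AI Gateway do?"}]
enriched = enrich(messages, "API7 AI Gateway capabilities")
print(enriched[0]["content"])
```

The key point is ordering: retrieval happens first, and the original messages are forwarded unchanged after the injected context, so the client never has to know retrieval occurred.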
Prerequisites
- API7 Gateway with the ai-proxy and ai-rag plugins available.
- Azure OpenAI resource and deployment for your generation model.
- Azure OpenAI access for embeddings.
- Azure AI Search service with an index populated from your knowledge base.
Configure the RAG Plugin
Configure ai-rag and ai-proxy on the same route so retrieval and generation happen in one request path.
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "rag-azure",
"service_id": "$SERVICE_ID",
"paths": ["/v1/chat/completions"],
"plugins": {
"ai-proxy": {
"provider": "azure-openai",
"auth": {
"header": {
"api-key": "YOUR_AZURE_OPENAI_KEY"
}
},
"options": {
"model": "gpt-4o-mini"
},
"override": {
"endpoint": "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT/chat/completions?api-version=2024-10-21"
}
},
"ai-rag": {
"embeddings_provider": {
"azure_openai": {
"endpoint": "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/text-embedding-3-large/embeddings?api-version=2023-05-15",
"api_key": "YOUR_AZURE_OPENAI_KEY"
}
},
"vector_search_provider": {
"azure_ai_search": {
"endpoint": "https://YOUR-SEARCH.search.windows.net/indexes/YOUR-INDEX/docs/search?api-version=2024-07-01",
"api_key": "YOUR_AZURE_SEARCH_KEY"
}
}
}
}
}'
❶ ai-proxy handles generation and must target provider: "azure-openai" on this route.
❷ ai-rag is configured with Azure-only backends: Azure OpenAI for embeddings (embeddings_provider) and Azure AI Search for vector retrieval (vector_search_provider).
❸ Specify the full Azure OpenAI endpoint, including your resource name, deployment name, and API version.
services:
- name: RAG Azure Service
routes:
- uris:
- /v1/chat/completions
name: rag-azure
plugins:
ai-proxy:
provider: azure-openai
auth:
header:
api-key: "YOUR_AZURE_OPENAI_KEY"
options:
model: gpt-4o-mini
override:
endpoint: https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT/chat/completions?api-version=2024-10-21
ai-rag:
embeddings_provider:
azure_openai:
endpoint: https://YOUR-RESOURCE.openai.azure.com/openai/deployments/text-embedding-3-large/embeddings?api-version=2023-05-15
api_key: YOUR_AZURE_OPENAI_KEY
vector_search_provider:
azure_ai_search:
endpoint: https://YOUR-SEARCH.search.windows.net/indexes/YOUR-INDEX/docs/search?api-version=2024-07-01
api_key: YOUR_AZURE_SEARCH_KEY
❶ ai-proxy handles generation and must target provider: "azure-openai" on this route.
❷ ai-rag is configured with Azure-only backends: Azure OpenAI for embeddings (embeddings_provider) and Azure AI Search for vector retrieval (vector_search_provider).
❸ Specify the full Azure OpenAI endpoint, including your resource name, deployment name, and API version.
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
For the full configuration reference, see ai-rag and ai-proxy.
Validate the Configuration
Send a request that includes the ai_rag field. The request must provide vector_search.fields and embeddings.input.
curl "http://127.0.0.1:9080/v1/chat/completions" -X POST \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "Based on our internal docs, what are the main capabilities of API7 AI Gateway?"
}
],
"ai_rag": {
"vector_search": {
"fields": {
"content": "content",
"title": "title",
"url": "url"
},
"top_k": 3
},
"embeddings": {
"input": "API7 AI Gateway capabilities overview"
}
}
}'
❶ ai_rag.vector_search.fields maps your Azure AI Search document fields used during retrieval.
❷ ai_rag.embeddings.input is the text embedded for vector search and is required for retrieval.
❸ Use a question that depends on your indexed knowledge base so you can confirm grounding behavior.
You should receive a standard chat completion response with an answer grounded in your indexed documents. Compared with a request without ai_rag, the answer should be more specific and aligned with your internal knowledge base.
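The grounded answer can be read out of the response like any standard chat completion. The payload below is a fabricated example of the response shape, shown only to illustrate where the answer lives:

```python
import json

# Fabricated example of a chat completion response body; real responses
# contain additional fields (id, usage, finish_reason, and so on).
response_body = json.dumps({
    "choices": [
        {"message": {"role": "assistant",
                     "content": "API7 AI Gateway provides ..."}}
    ]
})

# The grounded answer is in the first choice's message content.
answer = json.loads(response_body)["choices"][0]["message"]["content"]
print(answer)
```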
Best Practices
- Keep your knowledge base and index up to date. Stale documents reduce answer quality.
- Tune top_k for your data shape. Too low can miss context; too high can dilute prompts.
- Monitor token usage closely. Added context increases prompt tokens and cost.
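To get a feel for the token-cost trade-off, you can roughly estimate the prompt overhead added per top_k. The 4-characters-per-token ratio below is a common rule of thumb, not a tokenizer; real counts depend on your model and content.

```python
# Rough estimate of extra prompt tokens added by retrieved context,
# using the ~4 chars/token heuristic (an approximation, not a tokenizer).

def estimate_added_tokens(chunks: list[str]) -> int:
    return sum(len(chunk) // 4 for chunk in chunks)

chunk = "x" * 2000  # a retrieved chunk of roughly 500 tokens
for top_k in (1, 3, 10):
    added = estimate_added_tokens([chunk] * top_k)
    print(f"top_k={top_k}: ~{added} extra prompt tokens per request")
```

Multiplying this overhead by request volume and your model's per-token price makes the cost of raising top_k concrete before you change it.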
Next Steps
- Track token usage and costs to monitor RAG overhead.
- Apply token-based budgets to control spend.
- Review the plugin references: ai-rag and ai-proxy.