

Implement RAG at the Gateway Layer

This guide shows how to configure Retrieval-Augmented Generation (RAG) in API7 AI Gateway so requests are enriched with context from your knowledge base before reaching the LLM.

Overview

Current limitation: RAG in API7 AI Gateway is Azure-only today. You must use Azure OpenAI for embeddings and Azure AI Search for vector search. Support for additional providers is planned but not implemented.

RAG augments LLM prompts with relevant context retrieved at request time from your vector knowledge base. Implementing this at the gateway layer centralizes enrichment logic and avoids application-side RAG orchestration in every service.

Architecture flow:

  1. Client sends a chat request to API7 AI Gateway.
  2. Gateway uses ai-rag to generate embeddings and run vector search against Azure AI Search.
  3. Gateway injects retrieved context into the prompt.
  4. Gateway forwards the enriched request to Azure OpenAI through ai-proxy.
  5. Client receives a grounded answer.
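The flow above can be sketched end to end. This is an illustrative Python sketch of what the gateway does on your behalf, not gateway code: `embed_text`, `vector_search`, and `call_llm` are hypothetical stubs standing in for the Azure OpenAI embeddings call, the Azure AI Search query, and the ai-proxy forwarding step.

```python
# Illustrative sketch of the gateway-side RAG flow. All three helper
# functions are hypothetical stubs, not real API7 or Azure client calls.

def embed_text(text: str) -> list[float]:
    # Stand-in for the Azure OpenAI embeddings request made by ai-rag.
    return [float(len(text))]

def vector_search(embedding: list[float], top_k: int = 3) -> list[str]:
    # Stand-in for the Azure AI Search query made by ai-rag.
    return ["API7 AI Gateway supports RAG, rate limiting, and routing."]

def call_llm(messages: list[dict]) -> str:
    # Stand-in for the chat completion forwarded through ai-proxy.
    return f"Answer grounded in {len(messages)} message(s)."

def handle_chat_request(user_question: str, rag_input: str) -> str:
    # Steps 1-2: embed the retrieval input and run vector search.
    chunks = vector_search(embed_text(rag_input))
    # Step 3: inject retrieved context ahead of the user's question.
    messages = [
        {"role": "system", "content": "Context:\n" + "\n".join(chunks)},
        {"role": "user", "content": user_question},
    ]
    # Step 4: forward the enriched request to the generation model.
    return call_llm(messages)

print(handle_chat_request(
    "What are the main capabilities of API7 AI Gateway?",
    "API7 AI Gateway capabilities overview",
))
```

The key point the sketch captures is that the client sends one request; embedding, retrieval, and prompt enrichment all happen inside the gateway before the LLM is called.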

Prerequisites

  • API7 Gateway with ai-proxy and ai-rag plugins available.
  • Azure OpenAI resource with a deployment for your generation model.
  • Azure OpenAI embedding model deployment (for example, text-embedding-3-large).
  • Azure AI Search service with an index populated from your knowledge base.

Configure the RAG Plugin

Configure ai-rag and ai-proxy on the same route so retrieval and generation happen in one request path.

curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "id": "rag-azure",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/v1/chat/completions"],
    "plugins": {
      "ai-proxy": {
        "provider": "azure-openai",
        "auth": {
          "header": {
            "api-key": "YOUR_AZURE_OPENAI_KEY"
          }
        },
        "options": {
          "model": "gpt-4o-mini"
        },
        "override": {
          "endpoint": "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT/chat/completions?api-version=2024-10-21"
        }
      },
      "ai-rag": {
        "embeddings_provider": {
          "azure_openai": {
            "endpoint": "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/text-embedding-3-large/embeddings?api-version=2023-05-15",
            "api_key": "YOUR_AZURE_OPENAI_KEY"
          }
        },
        "vector_search_provider": {
          "azure_ai_search": {
            "endpoint": "https://YOUR-SEARCH.search.windows.net/indexes/YOUR-INDEX/docs/search?api-version=2024-07-01",
            "api_key": "YOUR_AZURE_SEARCH_KEY"
          }
        }
      }
    }
  }'

ai-proxy handles generation and must target provider: "azure-openai" on this route.

ai-rag is configured with Azure-only backends: Azure OpenAI for embeddings (embeddings_provider) and Azure AI Search for vector retrieval (vector_search_provider).

Specify the full Azure OpenAI endpoint, including your resource name, deployment name, and API version.
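The endpoint URL is assembled from three parts you must substitute. A minimal sketch, using the same placeholder values as the configuration above:

```python
# The Azure OpenAI chat completions endpoint is built from three parts.
# The values below are the same placeholders used in the route config.
resource = "YOUR-RESOURCE"        # your Azure OpenAI resource name
deployment = "YOUR-DEPLOYMENT"    # your model deployment name
api_version = "2024-10-21"        # the API version you target

endpoint = (
    f"https://{resource}.openai.azure.com"
    f"/openai/deployments/{deployment}"
    f"/chat/completions?api-version={api_version}"
)
print(endpoint)
```

Swap in your real resource and deployment names; the path shape and the `api-version` query parameter stay the same.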

For the full configuration reference, see ai-rag and ai-proxy.

Validate the Configuration

Send a request that includes the ai_rag field. The request must provide vector_search.fields and embeddings.input.

curl "http://127.0.0.1:9080/v1/chat/completions" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Based on our internal docs, what are the main capabilities of API7 AI Gateway?"
      }
    ],
    "ai_rag": {
      "vector_search": {
        "fields": {
          "content": "content",
          "title": "title",
          "url": "url"
        },
        "top_k": 3
      },
      "embeddings": {
        "input": "API7 AI Gateway capabilities overview"
      }
    }
  }'

ai_rag.vector_search.fields specifies which Azure AI Search document fields to use during retrieval.

ai_rag.embeddings.input is the text embedded for vector search and is required for retrieval.

Use a question that depends on your indexed knowledge base so you can confirm grounding behavior.

You should receive a standard chat completion response with an answer grounded in your indexed documents. Compared with a request without ai_rag, the answer should be more specific and aligned with your internal knowledge base.

Best Practices

  • Keep your knowledge base and index up to date. Stale documents reduce answer quality.
  • Tune top_k for your data shape. Too low can miss context; too high can dilute prompts.
  • Monitor token usage closely. Added context increases prompt tokens and cost.
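The interaction between top_k and token cost can be estimated with back-of-the-envelope arithmetic. The numbers below (average chunk size, characters per token) are illustrative assumptions, not Azure OpenAI measurements:

```python
# Rough estimate of prompt tokens added by retrieved context.
# avg_chunk_chars and chars_per_token are illustrative assumptions;
# measure your own index and tokenizer for real numbers.

def added_prompt_tokens(top_k: int, avg_chunk_chars: int = 1200,
                        chars_per_token: float = 4.0) -> int:
    # Each of the top_k retrieved chunks is injected into the prompt.
    return int(top_k * avg_chunk_chars / chars_per_token)

for k in (1, 3, 5, 10):
    print(f"top_k={k}: ~{added_prompt_tokens(k)} extra prompt tokens")
```

Under these assumptions, raising top_k from 3 to 10 roughly triples the added prompt tokens per request, which compounds quickly at high traffic.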
