
Version: 3.9.x

Implement RAG at the Gateway Layer

This guide shows how to configure Retrieval-Augmented Generation (RAG) in API7 AI Gateway so requests are enriched with context from your knowledge base before reaching the LLM.

Overview

Current limitation: RAG in API7 AI Gateway is Azure-only today. You must use Azure OpenAI for embeddings and Azure AI Search for vector search. Support for additional providers is planned but not implemented.

RAG augments LLM prompts with relevant context retrieved at request time from your vector knowledge base. Implementing this at the gateway layer centralizes enrichment logic and avoids application-side RAG orchestration in every service.

Architecture flow:

  1. Client sends a chat request to API7 AI Gateway.
  2. Gateway uses ai-rag to generate an embedding of the query via Azure OpenAI and run a vector search against Azure AI Search.
  3. Gateway injects retrieved context into the prompt.
  4. Gateway forwards the enriched request to Azure OpenAI through ai-proxy.
  5. Client receives a grounded answer.
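
Conceptually, the enrichment in step 3 turns the client's original messages into a prompt that carries the retrieved passages. The sketch below shows what the forwarded request body might look like; the exact injection format is determined by the ai-rag plugin and may differ between versions:

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {
      "role": "user",
      "content": "<retrieved passage 1>\n<retrieved passage 2>\n<retrieved passage 3>\n\nBased on our internal docs, what are the main capabilities of API7 AI Gateway?"
    }
  ]
}
```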

Prerequisites

  • Install Docker.

  • Install cURL to send validation requests to the services.

  • Have a running API7 Gateway instance with the ai-proxy and ai-rag plugins available.

  • Create a token from the Dashboard and save it to an environment variable:

    export API_KEY=your-dashboard-token   # replace with your Dashboard token
  • Replace {gateway_group_id} with your gateway group ID. Use default if you are following the quickstart.

  • If you are following the Admin API examples, create or reuse a service in API7 Gateway. If you do not have one yet, follow Create or Reuse a Service, then save its ID to an environment variable:

    export SERVICE_ID=your-service-id         # replace with your service ID
  • An Azure OpenAI resource with a deployment for your generation model (for example, gpt-4o-mini).

  • An Azure OpenAI deployment of an embeddings model (for example, text-embedding-3-large).

  • Azure AI Search service with an index populated from your knowledge base.

Configure the RAG Plugin

Configure ai-rag and ai-proxy on the same route so retrieval and generation happen in one request path.

curl -k "https://localhost:7443/apisix/admin/routes?gateway_group_id={gateway_group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -d '{
    "id": "rag-azure",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/v1/chat/completions"],
    "plugins": {
      "ai-proxy": {
        "provider": "azure-openai",
        "auth": {
          "header": {
            "api-key": "YOUR_AZURE_OPENAI_KEY"
          }
        },
        "options": {
          "model": "gpt-4o-mini"
        },
        "override": {
          "endpoint": "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT/chat/completions?api-version=2024-10-21"
        }
      },
      "ai-rag": {
        "embeddings_provider": {
          "azure_openai": {
            "endpoint": "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/text-embedding-3-large/embeddings?api-version=2023-05-15",
            "api_key": "YOUR_AZURE_OPENAI_KEY"
          }
        },
        "vector_search_provider": {
          "azure_ai_search": {
            "endpoint": "https://YOUR-SEARCH.search.windows.net/indexes/YOUR-INDEX/docs/search?api-version=2024-07-01",
            "api_key": "YOUR_AZURE_SEARCH_KEY"
          }
        }
      }
    }
  }'

ai-proxy handles generation and must set provider to "azure-openai" on this route.

ai-rag is configured with Azure-only backends: Azure OpenAI for embeddings (embeddings_provider) and Azure AI Search for vector retrieval (vector_search_provider).

The endpoint fields must be full Azure URLs, including your resource name, deployment name, and API version.

For the full configuration reference, see ai-rag and ai-proxy.
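
After applying the configuration, you can confirm the route exists by reading it back through the same Admin API (this assumes the Admin API address, token, and gateway group from the prerequisites):

```shell
# Fetch the route by the ID used above and check that both plugins are present
curl -k "https://localhost:7443/apisix/admin/routes/rag-azure?gateway_group_id={gateway_group_id}" \
  -H "X-API-KEY: ${API_KEY}"
```

The response should echo the route definition with both ai-proxy and ai-rag configured.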

Validate the Configuration

Send a request that includes the ai_rag field. The request must provide vector_search.fields and embeddings.input.

curl "http://127.0.0.1:9080/v1/chat/completions" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Based on our internal docs, what are the main capabilities of API7 AI Gateway?"
      }
    ],
    "ai_rag": {
      "vector_search": {
        "fields": {
          "content": "content",
          "title": "title",
          "url": "url"
        },
        "top_k": 3
      },
      "embeddings": {
        "input": "API7 AI Gateway capabilities overview"
      }
    }
  }'

ai_rag.vector_search.fields maps the Azure AI Search document fields used during retrieval.

ai_rag.embeddings.input is the text embedded for vector search and is required for retrieval.

Use a question that depends on your indexed knowledge base so you can confirm grounding behavior.

You should receive a standard chat completion response with an answer grounded in your indexed documents. Compared with a request without ai_rag, the answer should be more specific and aligned with your internal knowledge base.
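
To observe the grounding effect directly, send the same question without the ai_rag field and compare the two answers:

```shell
# Same route and question, but no ai_rag field: retrieval is skipped
curl "http://127.0.0.1:9080/v1/chat/completions" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Based on our internal docs, what are the main capabilities of API7 AI Gateway?"
      }
    ]
  }'
```

Without retrieval, the model answers only from its training data, so the response is typically more generic and may not reflect your internal documentation.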

Best Practices

  • Keep your knowledge base and index up to date. Stale documents reduce answer quality.
  • Tune top_k for your data shape. Too low can miss context; too high can dilute prompts.
  • Monitor token usage closely. Added context increases prompt tokens and cost.
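
For the token-usage point above, one lightweight check is to inspect the usage object returned with each completion (this assumes jq is installed and that the response follows the standard chat completion format; prompt_tokens grows with the injected context):

```shell
# Compare prompt_tokens for the same question with and without ai_rag
curl -s "http://127.0.0.1:9080/v1/chat/completions" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What are the main capabilities of API7 AI Gateway?"}],
    "ai_rag": {
      "vector_search": {"fields": {"content": "content"}, "top_k": 3},
      "embeddings": {"input": "API7 AI Gateway capabilities overview"}
    }
  }' | jq '.usage'
```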
