Route Enterprise AI Traffic to Vertex AI
Vertex AI provides access to Google's Gemini models through Google Cloud's enterprise infrastructure with service account authentication, regional endpoints, and enterprise SLAs. This guide shows how to route traffic to Vertex AI through API7 Gateway using the ai-proxy plugin.
Prerequisites
- Install Docker.
- Install cURL to send requests to the services for validation.
- Have a running API7 Enterprise Gateway instance.
- Have a Google Cloud project with the Vertex AI API enabled.
- Obtain the Admin API key. Save it to an environment variable:
  export ADMIN_API_KEY=your-admin-api-key # replace with your API key
- Obtain the ID of the service you want to configure. Save it to an environment variable:
  export SERVICE_ID=your-service-id # replace with your service ID
Configure GCP Authentication
Create a service account and JSON key by following the Google Cloud service account documentation. Ensure the service account has the Vertex AI User role.
Save the service account JSON to an environment variable:
export GCP_SERVICE_ACCOUNT_JSON="$(cat /path/to/service-account.json)"
The gateway automatically handles OAuth2 token generation and caching from the service account credentials.
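Before handing the key to the gateway, it can help to confirm that the exported value really is a service account key. The following is a minimal sketch, not part of API7 Gateway: the helper name `check_service_account_json` is hypothetical, and the field list only covers the entries a standard Google Cloud service account key file is known to contain.

```python
import json

# Fields present in a standard Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(raw: str) -> list[str]:
    """Return a list of problems found in the key content (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    # Other credential types (for example user credentials) are not service account keys.
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']!r}")
    return problems
```

For example, calling `check_service_account_json(os.environ["GCP_SERVICE_ACCOUNT_JSON"])` after the export above should return an empty list for a usable key.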
Configure the AI Proxy for Vertex AI
Create a route with the ai-proxy plugin, using either the Admin API or ADC.
With the Admin API, send the following request:
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "id": "vertex-ai-route",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/vertex-ai"],
    "plugins": {
      "ai-proxy": {
        "provider": "vertex-ai",
        "provider_conf": {
          "project_id": "your-gcp-project-id",
          "region": "us-central1"
        },
        "auth": {
          "gcp": {
            "service_account_json": "'"$GCP_SERVICE_ACCOUNT_JSON"'"
          }
        },
        "options": {
          "model": "google/gemini-2.5-flash"
        }
      }
    }
  }'
❶ Set the provider to vertex-ai and configure the GCP project_id and region. The gateway constructs the correct regional endpoint automatically.
❷ Provide the GCP service account JSON. The gateway generates and caches OAuth2 tokens automatically.
❸ Set the model. Vertex AI model names use the google/ prefix.
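The service account file is itself JSON, so its double quotes must be escaped when it is embedded as a string value in the request body. One way to avoid hand-escaping is to build the body with a JSON serializer. The sketch below assumes, as the request above suggests, that `service_account_json` takes the key file content as a single string. The helper name `build_route_body` is hypothetical.

```python
import json


def build_route_body(service_id, project_id, region, service_account_json, model):
    """Build the Admin API route body as a JSON string.

    json.dumps escapes the quotes and backslashes inside the service
    account content, so the result stays valid JSON.
    """
    body = {
        "id": "vertex-ai-route",
        "service_id": service_id,
        "paths": ["/vertex-ai"],
        "plugins": {
            "ai-proxy": {
                "provider": "vertex-ai",
                "provider_conf": {"project_id": project_id, "region": region},
                # Assumption: the field holds the key file content as one string.
                "auth": {"gcp": {"service_account_json": service_account_json}},
                "options": {"model": model},
            }
        },
    }
    return json.dumps(body)
```

The returned string can be written to a file such as `route.json` and sent with `curl -d @route.json` in place of the inline body.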
With ADC, save the following configuration to adc.yaml:
services:
  - name: Vertex AI Service
    routes:
      - uris:
          - /vertex-ai
        name: vertex-ai-route
        plugins:
          ai-proxy:
            provider: vertex-ai
            provider_conf:
              project_id: your-gcp-project-id
              region: us-central1
            auth:
              gcp:
                service_account_json: |
                  {
                    ...
                  }
            options:
              model: google/gemini-2.5-flash
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
Alternatively, you can use the override.endpoint field to specify the full Vertex AI endpoint directly instead of using provider_conf.
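For orientation, the sketch below formats the kind of URL a full endpoint might look like. It assumes the target is Vertex AI's OpenAI-compatible chat completions API in a regional location. This page does not state the exact URL the gateway builds, so treat the pattern as an assumption and check it against the Google Cloud documentation before setting override.endpoint. The helper name `vertex_openai_endpoint` is hypothetical.

```python
def vertex_openai_endpoint(project_id: str, region: str) -> str:
    """Format a regional OpenAI-compatible chat completions URL for Vertex AI.

    Assumption: the regional host is "<region>-aiplatform.googleapis.com"
    and the OpenAI-compatible API lives under ".../endpoints/openapi".
    """
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{region}/endpoints/openapi/chat/completions"
    )
```

With the values used in this guide, `vertex_openai_endpoint("your-gcp-project-id", "us-central1")` yields a URL on the `us-central1-aiplatform.googleapis.com` host.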
Validate the Configuration
Send a chat completion request:
curl "http://127.0.0.1:9080/vertex-ai" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician." },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'
You should receive a response similar to the following:
{
  "object": "chat.completion",
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1 + 1 = 2\n"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 8,
    "total_tokens": 19
  }
}
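The same check can be scripted. The following sketch uses only the Python standard library to send the request and pull the assistant text and token count out of a response shaped like the one above. The function names `send_chat` and `extract_reply` are hypothetical, and the code assumes the gateway is reachable at the address used in the cURL example.

```python
import json
import urllib.request


def send_chat(url, messages, timeout=30):
    """POST a chat completion request to the gateway and return the parsed JSON."""
    payload = json.dumps({"messages": messages}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_reply(response):
    """Return (assistant_text, total_tokens) from a chat completion response."""
    text = response["choices"][0]["message"]["content"]
    # "usage" may be absent, in which case the token count is None.
    total_tokens = response.get("usage", {}).get("total_tokens")
    return text, total_tokens
```

A typical call would be `extract_reply(send_chat("http://127.0.0.1:9080/vertex-ai", [{"role": "user", "content": "What is 1+1?"}]))`.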
To enable streaming responses, set "stream": true in the request body. Use the proxy-buffering plugin to disable NGINX proxy_buffering so that server-sent events (SSE) reach the client as they are produced instead of being buffered.
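On the client side, a streamed response has to be read event by event. The sketch below assumes the stream follows the OpenAI-style convention of `data: <json>` lines carrying `choices[].delta.content` fragments, ended by `data: [DONE]`. This page does not show the streamed format, so verify it against an actual response. The function name `iter_stream_text` is hypothetical.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # Some chunks carry only a role or a finish reason and no text.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments with `"".join(...)` reconstructs the full assistant message.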
Next Steps
You have learned how to route traffic to Vertex AI through API7 Gateway. See the Vertex AI documentation and Gemini models for more details.
- Multi-LLM Routing and Fallback: Route across regions or fail over to other providers.
- Google Gemini: Use the Gemini AI Studio API for lighter workloads.