Route Enterprise AI Traffic to Vertex AI
Vertex AI provides access to Google's Gemini models through Google Cloud's enterprise infrastructure with service account authentication, regional endpoints, and enterprise SLAs. This guide shows how to route traffic to Vertex AI through API7 Gateway using the ai-proxy plugin.
Prerequisites
- Install Docker.
- Install cURL to send requests to the services for validation.
- Install jq to compact the service account JSON for shell usage.
- Have a running API7 Gateway instance.
- Have a Google Cloud project with the Vertex AI API enabled.
- Create a token from the Dashboard and save it to an environment variable:
  export API_KEY=your-dashboard-token # replace with your Dashboard token
- Replace {gateway_group_id} with your gateway group ID. Use default if you are following the quickstart.
- If you are following the Admin API examples, create or reuse a service in API7 Gateway. If you do not have one yet, follow Create or Reuse a Service, then save its ID to an environment variable:
  export SERVICE_ID=your-service-id # replace with your service ID
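Before continuing, it can help to confirm that the command-line tools listed above are on your PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name, not part of API7 Gateway.

```shell
# Hypothetical helper: prints the name of every argument that is not
# found on PATH. Prints nothing when all tools are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage:
# missing_tools docker curl jq
```

An empty result means all three tools were found.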
Configure GCP Authentication
Create a service account and JSON key by following the Google Cloud service account documentation. Ensure the service account has the Vertex AI User role.
Save the service account JSON to an environment variable as a compact single-line JSON string:
export GCP_SERVICE_ACCOUNT_JSON="$(jq -c . /path/to/service-account.json)"
The jq -c flag keeps the JSON on one line, which avoids shell quoting issues when you reuse the variable in Admin API or ADC examples. The gateway automatically handles OAuth2 token generation and caching from the service account credentials.
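A quick sanity check on the exported value can catch a wrong file path or a truncated key early. The sketch below assumes the key file has the usual service account layout with `type`, `client_email`, and `private_key` fields; `check_service_account_json` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds only when the argument is single-line JSON
# that looks like a service account key (assumed fields: type,
# client_email, private_key).
check_service_account_json() {
  # Reject values that contain a newline.
  lines="$(printf '%s' "$1" | wc -l | tr -d ' ')"
  [ "$lines" -eq 0 ] || return 1
  # jq -e exits non-zero when the expression is false or the input is not JSON.
  printf '%s' "$1" | jq -e '
    .type == "service_account"
    and (.client_email | type == "string")
    and (.private_key | type == "string")
  ' >/dev/null 2>&1
}

# Usage:
# check_service_account_json "$GCP_SERVICE_ACCOUNT_JSON" && echo "looks valid"
```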
Configure the AI Proxy for Vertex AI
Create a route with the ai-proxy plugin:
The examples below show the Admin API request first, followed by the equivalent ADC configuration.
curl -k "https://localhost:7443/apisix/admin/routes?gateway_group_id={gateway_group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -d '{
    "id": "vertex-ai-route",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/vertex-ai"],
    "plugins": {
      "ai-proxy": {
        "provider": "vertex-ai",
        "provider_conf": {
          "project_id": "your-gcp-project-id",
          "region": "us-central1"
        },
        "auth": {
          "gcp": {
            "service_account_json": "'"$GCP_SERVICE_ACCOUNT_JSON"'"
          }
        },
        "options": {
          "model": "google/gemini-2.5-flash"
        }
      }
    }
  }'
❶ Set the provider to vertex-ai and configure the GCP project_id and region. The gateway constructs the correct regional endpoint automatically.
❷ Provide the GCP service account JSON. The gateway generates and caches OAuth2 tokens automatically.
❸ Set the model. Vertex AI model names use the google/ prefix.
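The service account JSON itself contains double quotes, so splicing it into a quoted JSON string by hand can produce an invalid request body. One way to sidestep this is to let jq assemble the body, since `--arg` passes each value as a string and escapes it. This is a sketch that assumes `service_account_json` is expected as a JSON string; `build_vertex_route_body` is a hypothetical helper name.

```shell
# Hypothetical helper: prints the route body as compact JSON.
# $1 = service ID, $2 = service account JSON (single line), $3 = GCP project ID
build_vertex_route_body() {
  jq -n -c \
    --arg service_id "$1" \
    --arg sa_json "$2" \
    --arg project_id "$3" \
    '{
      id: "vertex-ai-route",
      service_id: $service_id,
      paths: ["/vertex-ai"],
      plugins: {
        "ai-proxy": {
          provider: "vertex-ai",
          provider_conf: { project_id: $project_id, region: "us-central1" },
          auth: { gcp: { service_account_json: $sa_json } },
          options: { model: "google/gemini-2.5-flash" }
        }
      }
    }'
}

# Usage (commented out because it needs a running gateway):
# build_vertex_route_body "$SERVICE_ID" "$GCP_SERVICE_ACCOUNT_JSON" "your-gcp-project-id" |
#   curl -k "https://localhost:7443/apisix/admin/routes?gateway_group_id={gateway_group_id}" -X PUT \
#     -H "X-API-KEY: ${API_KEY}" -d @-
```

Passing `-d @-` makes curl read the request body from standard input, so the credentials never need to be quoted on the command line.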
ADC configuration (adc.yaml):
services:
  - name: Vertex AI Service
    routes:
      - uris:
          - /vertex-ai
        name: vertex-ai-route
        plugins:
          ai-proxy:
            provider: vertex-ai
            provider_conf:
              project_id: your-gcp-project-id
              region: us-central1
            auth:
              gcp:
                service_account_json: ${GCP_SERVICE_ACCOUNT_JSON}
            options:
              model: google/gemini-2.5-flash
❶ Set the provider to vertex-ai and configure the GCP project_id and region. The gateway constructs the correct regional endpoint automatically.
❷ Provide the GCP service account JSON. The gateway generates and caches OAuth2 tokens automatically.
❸ Set the model. Vertex AI model names use the google/ prefix.
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
Alternatively, you can use the override.endpoint field to specify the full Vertex AI endpoint directly instead of using provider_conf.
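As an illustration of what a full endpoint might look like, the sketch below builds a regional URL in the form Google documents for the OpenAI-compatible Vertex AI chat completions API. Whether `override.endpoint` expects exactly this URL is an assumption, and `vertex_chat_endpoint` is a hypothetical helper name; verify the value against the plugin reference before using it.

```shell
# Hypothetical helper: prints a regional Vertex AI chat completions URL.
# The URL shape is an assumption based on Google's OpenAI-compatible
# endpoint format; it is not taken from the API7 documentation.
# $1 = GCP project ID, $2 = region
vertex_chat_endpoint() {
  printf 'https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s/endpoints/openapi/chat/completions\n' \
    "$2" "$1" "$2"
}

# Usage:
# vertex_chat_endpoint your-gcp-project-id us-central1
```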
Validate the Configuration
Send a chat completion request:
curl "http://127.0.0.1:9080/vertex-ai" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician." },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'
You should receive a response similar to the following:
{
  "object": "chat.completion",
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1 + 1 = 2\n"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 8,
    "total_tokens": 19
  }
}
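When scripting against the route, you may only need the assistant text or the token count rather than the full body. The following sketch pulls both out with jq, using the field names from the sample response above; `completion_text` and `completion_total_tokens` are hypothetical helper names.

```shell
# Hypothetical helpers: each reads a chat completion response on stdin.
completion_text() {
  # First choice's assistant message, printed as raw text.
  jq -r '.choices[0].message.content'
}

completion_total_tokens() {
  # Total tokens reported in the usage block.
  jq -r '.usage.total_tokens'
}

# Usage (commented out because it needs a running gateway):
# curl -s "http://127.0.0.1:9080/vertex-ai" -X POST \
#   -H "Content-Type: application/json" \
#   -d '{"messages": [{"role": "user", "content": "What is 1+1?"}]}' | completion_text
```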
To enable streaming responses, set "stream": true in the request body. Use the proxy-buffering plugin to disable NGINX proxy_buffering so that server-sent events (SSE) are delivered as they arrive instead of being buffered.
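A streamed response arrives as a series of `data:` lines rather than a single JSON document. The sketch below reassembles the text from such a stream, assuming the gateway returns OpenAI-style chunks with the text under `choices[0].delta.content` and a final `data: [DONE]` line; `sse_to_text` is a hypothetical helper name.

```shell
# Hypothetical helper: reads an SSE stream on stdin and prints the
# concatenated assistant text. Assumes OpenAI-style chunk payloads.
sse_to_text() {
  # Strip carriage returns, keep only "data:" lines, and drop the prefix.
  tr -d '\r' | sed -n 's/^data: *//p' |
    while IFS= read -r chunk; do
      # Stop at the end-of-stream sentinel.
      [ "$chunk" = "[DONE]" ] && break
      # -j prints raw text without a trailing newline; chunks that carry
      # no content produce no output.
      printf '%s' "$chunk" | jq -j '.choices[0].delta.content // empty'
    done
  echo
}

# Usage (commented out because it needs a running gateway); curl -N
# disables curl's own output buffering:
# curl -N -s "http://127.0.0.1:9080/vertex-ai" -X POST \
#   -H "Content-Type: application/json" \
#   -d '{"stream": true, "messages": [{"role": "user", "content": "What is 1+1?"}]}' | sse_to_text
```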
Next Steps
You have learned how to route traffic to Vertex AI through API7 Gateway. See the Vertex AI documentation and Gemini models for more details.
- Multi-LLM Routing and Fallback: Route across regions or fail over to other providers.
- Google Gemini: Use the Gemini AI Studio API for lighter workloads.