Route Enterprise AI Traffic to Vertex AI

Vertex AI provides access to Google's Gemini models through Google Cloud's enterprise infrastructure with service account authentication, regional endpoints, and enterprise SLAs. This guide shows how to route traffic to Vertex AI through API7 Gateway using the ai-proxy plugin.

Prerequisites

  • Install Docker.

  • Install cURL to send requests to the services for validation.

  • Have a running API7 Enterprise Gateway instance.

  • Have a Google Cloud project with the Vertex AI API enabled.

  • Obtain the Admin API key. Save it to an environment variable:

    export ADMIN_API_KEY=your-admin-api-key   # replace with your API key

  • Obtain the ID of the service you want to configure. Save it to an environment variable:

    export SERVICE_ID=your-service-id         # replace with your service ID
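An unset variable is easy to miss, because the shell silently substitutes an empty string into later commands. The following is a minimal sketch of a pre-flight check; the helper name `require_env` is hypothetical and not part of API7 Gateway.

```shell
# require_env NAME...: return non-zero if any named variable is unset or empty.
# Hypothetical helper for illustration only.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup that also works in POSIX sh (no bash-only ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, run `require_env ADMIN_API_KEY SERVICE_ID || exit 1` before sending any Admin API request.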

Configure GCP Authentication

Create a service account and a JSON key by following the Google Cloud service account documentation. Ensure the service account has the Vertex AI User role (roles/aiplatform.user).

Save the service account JSON to an environment variable:

export GCP_SERVICE_ACCOUNT_JSON="$(cat /path/to/service-account.json)"

The gateway automatically handles OAuth2 token generation and caching from the service account credentials.
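The route configuration later in this guide places the service account JSON inside a JSON string, so its double quotes and backslashes need escaping first. The sketch below shows one way to do that with standard tools. The helper name `json_escape` is hypothetical, and it assumes the input is valid JSON, so raw newlines and tabs are only whitespace between tokens and can be dropped.

```shell
# json_escape TEXT: print TEXT escaped for use inside a JSON string literal.
# Assumption: TEXT is itself valid JSON, so raw newlines, carriage returns,
# and tabs can only be insignificant whitespace and are safe to delete.
json_escape() {
  printf '%s' "$1" \
    | tr -d '\n\r\t' \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}
```

For example, `json_escape "$GCP_SERVICE_ACCOUNT_JSON"` prints a single-line, escaped copy of the key that can be substituted between the double quotes of a JSON string value. If jq is installed, `jq -Rs .` produces an equivalent string, including the surrounding quotes.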

Configure the AI Proxy for Vertex AI

Create a route with the ai-proxy plugin:

curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "vertex-ai-route",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/vertex-ai"],
    "plugins": {
      "ai-proxy": {
        "provider": "vertex-ai",
        "provider_conf": {
          "project_id": "your-gcp-project-id",
          "region": "us-central1"
        },
        "auth": {
          "gcp": {
            "service_account_json": "'"$GCP_SERVICE_ACCOUNT_JSON"'"
          }
        },
        "options": {
          "model": "google/gemini-2.5-flash"
        }
      }
    }
  }'

❶ Set the provider to vertex-ai and configure the GCP project_id and region. The gateway constructs the correct regional endpoint automatically.

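❷ Provide the GCP service account JSON. The gateway generates and caches OAuth2 tokens automatically. Because the value is embedded in a JSON string, its double quotes and backslashes must be escaped and its line breaks removed before substitution; otherwise the request body is not valid JSON.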

❸ Set the model. Vertex AI model names use the google/ prefix.

Alternatively, you can use the override.endpoint field to specify the full Vertex AI endpoint directly instead of using provider_conf.
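As a rough illustration of what a full endpoint looks like, the sketch below builds the URL of Google's OpenAI-compatible chat completions endpoint for a given project and region. The helper name `vertex_chat_endpoint` is hypothetical, and it is an assumption that this is the URL the gateway derives from provider_conf; check the plugin reference for the exact shape of override.endpoint before relying on it.

```shell
# vertex_chat_endpoint PROJECT_ID REGION: print a regional Vertex AI
# OpenAI-compatible chat completions URL.
# Assumption: the gateway targets this endpoint; the source does not spell it out.
vertex_chat_endpoint() {
  project_id="$1"
  region="$2"
  printf 'https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s/endpoints/openapi/chat/completions\n' \
    "$region" "$project_id" "$region"
}
```

For example, `vertex_chat_endpoint your-gcp-project-id us-central1` prints the us-central1 endpoint for that project. The sketch covers regional endpoints only.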

Validate the Configuration

Send a chat completion request:

curl "http://127.0.0.1:9080/vertex-ai" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician." },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

You should receive a response similar to the following:

{
  "object": "chat.completion",
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1 + 1 = 2\n"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 8,
    "total_tokens": 19
  }
}

To enable streaming responses, set "stream": true in the request body. Use the proxy-buffering plugin to disable NGINX proxy_buffering so that server-sent events (SSE) are delivered as they arrive instead of being held in a buffer.
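A streaming request can be sketched as follows. The function names `sse_data` and `stream_chat` are hypothetical. The sketch assumes the gateway relays OpenAI-style SSE framing, where each chunk arrives on a line starting with `data: ` and the stream ends with `data: [DONE]`; verify this against your own gateway's output.

```shell
# sse_data: read an SSE stream on stdin and print each event's data payload,
# one per line. Stops at the "[DONE]" sentinel; blank separator lines and
# SSE comment lines are skipped.
sse_data() {
  tr -d '\r' | while IFS= read -r line; do
    case "$line" in
      "data: [DONE]") break ;;
      "data: "*) printf '%s\n' "${line#data: }" ;;
    esac
  done
}

# stream_chat: send a streaming request to the route created in this guide.
# -N turns off curl's output buffering so chunks are printed as they arrive.
stream_chat() {
  curl -sN "http://127.0.0.1:9080/vertex-ai" -X POST \
    -H "Content-Type: application/json" \
    -d '{"stream": true, "messages": [{"role": "user", "content": "What is 1+1?"}]}' \
    | sse_data
}
```

Calling `stream_chat` prints one JSON chunk per line as the model generates its answer. If the chunks only appear all at once at the end, buffering is still enabled somewhere between the client and the upstream.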

Next Steps

You have learned how to route traffic to Vertex AI through API7 Gateway. See the Vertex AI documentation and Gemini models for more details.
