Version: 3.10.x

Route Enterprise AI Traffic to Vertex AI

Vertex AI provides access to Google's Gemini models through Google Cloud's enterprise infrastructure with service account authentication, regional endpoints, and enterprise SLAs. This guide shows how to route traffic to Vertex AI through API7 Gateway using the ai-proxy plugin.

Prerequisites

Install Docker.
Install cURL to send requests to the services for validation.
Install jq to compact the service account JSON for shell usage.
Have a running API7 Gateway instance.
Have a Google Cloud project with the Vertex AI API enabled.

Create a token from the Dashboard and save it to an environment variable:

```shell
export API_KEY=your-dashboard-token   # replace with your Dashboard token
```

Replace {gateway_group_id} with your gateway group ID. Use default if you are following the quickstart.
If you are following the Admin API examples, create or reuse a service in API7 Gateway. If you do not have one yet, follow Create or Reuse a Service, then save its ID to an environment variable:
```
export SERVICE_ID=your-service-id         # replace with your service ID
```

Configure GCP Authentication

Create a service account and JSON key by following the Google Cloud service account documentation. Ensure the service account has the Vertex AI User role.
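If you use the gcloud CLI, the steps might look like the following sketch. The service account name api7-vertex-ai and the key path ./service-account.json are hypothetical placeholders. The block also enables the Vertex AI API (aiplatform.googleapis.com), which the prerequisites require. The Vertex AI User role corresponds to the role ID roles/aiplatform.user.

```shell
# Placeholder project ID and a hypothetical service account name.
PROJECT_ID="your-gcp-project-id"
SA_NAME="api7-vertex-ai"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Enable the Vertex AI API in the project.
gcloud services enable aiplatform.googleapis.com --project="$PROJECT_ID"

# Create the service account.
gcloud iam service-accounts create "$SA_NAME" \
  --project="$PROJECT_ID" \
  --display-name="API7 Gateway Vertex AI"

# Grant the Vertex AI User role to the service account.
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/aiplatform.user"

# Create a JSON key for the service account (hypothetical output path).
gcloud iam service-accounts keys create ./service-account.json \
  --iam-account="$SA_EMAIL"
```

The resulting key file is the one you compact with jq in the next step.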

Save the service account JSON to an environment variable as a compact single-line JSON string:

```shell
export GCP_SERVICE_ACCOUNT_JSON="$(jq -c . /path/to/service-account.json)"
```

The jq -c flag keeps the JSON on a single line, so the whole key fits in one environment variable that you can reuse in the Admin API or ADC examples. The gateway automatically generates OAuth2 access tokens from the service account credentials and caches them.

Configure the AI Proxy for Vertex AI

Create a route with the ai-proxy plugin:

Admin API

The service account JSON contains double quotes, so the command escapes it into a JSON string with jq -Rs before embedding it in the request body:

```shell
curl -k "https://localhost:7443/apisix/admin/routes?gateway_group_id={gateway_group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -d '{
    "id": "vertex-ai-route",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/vertex-ai"],
    "plugins": {
      "ai-proxy": {
        "provider": "vertex-ai",
        "provider_conf": {
          "project_id": "your-gcp-project-id",
          "region": "us-central1"
        },
        "auth": {
          "gcp": {
            "service_account_json": '"$(printf '%s' "$GCP_SERVICE_ACCOUNT_JSON" | jq -Rs .)"'
          }
        },
        "options": {
          "model": "google/gemini-2.5-flash"
        }
      }
    }
  }'
```

provider and provider_conf: Set the provider to vertex-ai and configure the GCP project_id and region. The gateway constructs the correct regional endpoint automatically.

auth.gcp.service_account_json: Provide the GCP service account JSON. The gateway generates and caches OAuth2 tokens automatically.

options.model: Set the model. Vertex AI model names use the google/ prefix.
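To avoid nested shell quoting altogether, you can also build the request body with jq, which escapes string values passed with --arg. The following sketch sends the same route definition as the Admin API command. The BODY variable name is hypothetical, and the placeholders are the same ones used above.

```shell
# Build the route body with jq. --arg passes each value as a JSON string,
# so the service account JSON is escaped without manual quoting.
BODY="$(jq -n \
  --arg service_id "$SERVICE_ID" \
  --arg sa_json "$GCP_SERVICE_ACCOUNT_JSON" \
  '{
    id: "vertex-ai-route",
    service_id: $service_id,
    paths: ["/vertex-ai"],
    plugins: {
      "ai-proxy": {
        provider: "vertex-ai",
        provider_conf: {
          project_id: "your-gcp-project-id",
          region: "us-central1"
        },
        auth: { gcp: { service_account_json: $sa_json } },
        options: { model: "google/gemini-2.5-flash" }
      }
    }
  }')"

# Send the same PUT request, reading the body from the variable.
curl -k "https://localhost:7443/apisix/admin/routes?gateway_group_id={gateway_group_id}" -X PUT \
  -H "X-API-KEY: ${API_KEY}" \
  -d "$BODY"
```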

ADC

Save the following configuration as adc.yaml:

```yaml
services:
  - name: Vertex AI Service
    routes:
      - uris:
          - /vertex-ai
        name: vertex-ai-route
        plugins:
          ai-proxy:
            provider: vertex-ai
            provider_conf:
              project_id: your-gcp-project-id
              region: us-central1
            auth:
              gcp:
                service_account_json: ${GCP_SERVICE_ACCOUNT_JSON}
            options:
              model: google/gemini-2.5-flash
```


Synchronize the configuration to API7 Gateway:

```shell
adc sync -f adc.yaml
```

Alternatively, you can use the override.endpoint field to specify the full Vertex AI endpoint directly instead of using provider_conf.
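As a sketch, the following builds an endpoint URL for the project and region used above and prints an ai-proxy override fragment. The URL format is an assumption based on Google Cloud's OpenAI-compatible Vertex AI chat completions endpoint. This page does not confirm that the gateway expects exactly this path in override.endpoint, so check both the Google Cloud and the plugin documentation before using it.

```shell
# Placeholder values: replace with your own project and region.
PROJECT_ID="your-gcp-project-id"
REGION="us-central1"

# Assumed format of Vertex AI's OpenAI-compatible chat completions URL.
VERTEX_ENDPOINT="https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/endpoints/openapi/chat/completions"

# Print an ai-proxy "override" fragment that points at the full endpoint.
jq -n --arg endpoint "$VERTEX_ENDPOINT" '{override: {endpoint: $endpoint}}'
```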

Validate the Configuration

Send a chat completion request:

```shell
curl "http://127.0.0.1:9080/vertex-ai" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician." },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'
```

You should receive a response similar to the following:

```json
{
  "object": "chat.completion",
  "model": "google/gemini-2.5-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1 + 1 = 2\n"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 8,
    "total_tokens": 19
  }
}
```
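To extract the reply text and token count from a response, a small jq helper is enough. The function name summarize_completion is hypothetical, and it assumes the OpenAI-style response shape shown above.

```shell
# Read a chat completion response on stdin and print the assistant reply
# (trailing newline removed) followed by the total token count.
summarize_completion() {
  jq -r '
    "reply: \(.choices[0].message.content | rtrimstr("\n"))",
    "total_tokens: \(.usage.total_tokens)"
  '
}
```

For example, add -s to the validation curl command and pipe its output into summarize_completion. For the sample response above, it prints reply: 1 + 1 = 2 and then total_tokens: 19.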

To enable streaming responses, set "stream": true in the request body. To prevent NGINX from buffering the server-sent events (SSE), use the proxy-buffering plugin to disable proxy_buffering.
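A streaming request might look like the following sketch. It reuses the validation request with "stream": true added. The STREAM_BODY variable name is hypothetical. curl's -N flag turns off curl's own output buffering so chunks print as they arrive. It does not affect buffering on the gateway, which the proxy-buffering plugin controls.

```shell
# Same messages as the validation request, with streaming enabled.
STREAM_BODY='{
  "stream": true,
  "messages": [
    { "role": "system", "content": "You are a mathematician." },
    { "role": "user", "content": "What is 1+1?" }
  ]
}'

# -N (--no-buffer) prints SSE chunks as soon as curl receives them.
curl -N "http://127.0.0.1:9080/vertex-ai" -X POST \
  -H "Content-Type: application/json" \
  -d "$STREAM_BODY"
```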

Next Steps

You have learned how to route traffic to Vertex AI through API7 Gateway. See the Vertex AI documentation and Gemini models for more details.

Multi-LLM Routing and Fallback — Route across regions or failover to other providers.
Google Gemini — Use the Gemini AI Studio API for lighter workloads.
