Proxy Vertex AI Requests
Vertex AI is Google Cloud's managed AI platform, which exposes Google's Gemini models through an OpenAI-compatible chat completions API.
This guide shows how to integrate APISIX with Vertex AI using the ai-proxy plugin. With provider set to vertex-ai, the plugin derives the upstream endpoint from the project_id and region you configure in provider_conf, so no custom endpoint is needed.
Prerequisite(s)
- Install Docker.
- Install cURL to send requests to the services for validation.
- Follow the Getting Started Tutorial to start a new APISIX instance in Docker or on Kubernetes.
- Have a Google Cloud project with Vertex AI API enabled.
Obtain a Vertex AI Service Account Key
Create a service account and JSON key by following the Google Cloud service account documentation. Ensure the service account has permissions to call Vertex AI (for example, Vertex AI User).
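If you prefer the command line, the setup looks roughly like the following gcloud sketch. The service account name apisix-vertex and the project ID evident-xxx are placeholders; substitute your own values:

# Enable the Vertex AI API in your project (placeholder project ID)
gcloud services enable aiplatform.googleapis.com --project evident-xxx
# Create a dedicated service account (placeholder name)
gcloud iam service-accounts create apisix-vertex --project evident-xxx
# Grant it the Vertex AI User role
gcloud projects add-iam-policy-binding evident-xxx \
  --member "serviceAccount:apisix-vertex@evident-xxx.iam.gserviceaccount.com" \
  --role "roles/aiplatform.user"
# Download a JSON key for the service account
gcloud iam service-accounts keys create /path/to/service-account.json \
  --iam-account apisix-vertex@evident-xxx.iam.gserviceaccount.com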
Optionally, save the service account JSON to an environment variable. If you plan to use the Admin API example below, serialize the key as a single escaped JSON string (here with jq) so that it can be embedded in the request body:
export GCP_SERVICE_ACCOUNT_JSON=$(jq -c 'tojson' /path/to/service-account.json)
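To confirm the variable holds a valid, properly escaped key, you can decode it and print the service account email (fromjson reverses the tojson encoding above):

echo "$GCP_SERVICE_ACCOUNT_JSON" | jq -r 'fromjson | .client_email'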
Create a Route to Vertex AI
Create a route with the ai-proxy plugin as such:
- Admin API
- ADC
- Ingress Controller
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT -d '{
"id": "vertex-ai-chat",
"uri": "/anything",
"plugins": {
"ai-proxy": {
"provider": "vertex-ai",
"provider_conf": {
"project_id": "evident-xxx",
"region": "us-central1"
},
"auth": {
"gcp": {
"service_account_json": "'"$GCP_SERVICE_ACCOUNT_JSON"'"
}
},
"options": {
"model": "google/gemini-2.5-flash"
}
}
}
}'
❶ Set the provider to vertex-ai and configure project_id and region.
❷ The escaped service account JSON is read from the GCP_SERVICE_ACCOUNT_JSON environment variable set earlier.
❸ Set a model supported by Vertex AI, for example google/gemini-2.5-flash.
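To confirm the route was created, you can fetch it back from the Admin API using the same admin key:

curl "http://127.0.0.1:9180/apisix/admin/routes/vertex-ai-chat" \
  -H "X-API-KEY: ${ADMIN_API_KEY}"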
services:
  - name: Vertex AI Service
    routes:
      - uris:
          - /anything
        name: vertex-ai-chat
        plugins:
          ai-proxy:
            provider: vertex-ai
            provider_conf:
              project_id: evident-xxx
              region: us-central1
            auth:
              gcp:
                service_account_json: |
                  {
                    ...
                  }
            options:
              model: google/gemini-2.5-flash
❶ Set the provider to vertex-ai and configure project_id and region.
❷ Replace with your service account JSON.
❸ Set a model supported by Vertex AI, for example google/gemini-2.5-flash.
Synchronize the configuration to APISIX:
adc sync -f adc.yaml
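If your ADC version supports it, you can also preview the pending changes against the running configuration before syncing:

adc diff -f adc.yaml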
Create a Kubernetes manifest file to configure a route:
- Gateway API
- APISIX CRD
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: ingress-apisix
  name: ai-proxy-plugin-config
spec:
  plugins:
    - name: ai-proxy
      config:
        provider: vertex-ai
        provider_conf:
          project_id: evident-xxx
          region: us-central1
        auth:
          gcp:
            service_account_json: |
              {
                ...
              }
        options:
          model: google/gemini-2.5-flash
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: ingress-apisix
  name: vertex-ai-chat
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-proxy-plugin-config
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: ingress-apisix
  name: vertex-ai-route
spec:
  ingressClassName: apisix
  http:
    - name: vertex-ai-route
      match:
        paths:
          - /anything
      plugins:
        - name: ai-proxy
          enable: true
          config:
            provider: vertex-ai
            provider_conf:
              project_id: evident-xxx
              region: us-central1
            auth:
              gcp:
                service_account_json: |
                  {
                    ...
                  }
            options:
              model: google/gemini-2.5-flash
❶ Set the provider to vertex-ai and configure project_id and region.
❷ Replace with your service account JSON.
❸ Set a model supported by Vertex AI, for example google/gemini-2.5-flash.
Apply the configuration to your cluster:
kubectl apply -f vertex-ai-route.yaml
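You can then check that the resources were accepted; the names below match the manifests above:

# Gateway API resources
kubectl -n ingress-apisix get httproute vertex-ai-chat
# or, for the APISIX CRD
kubectl -n ingress-apisix get apisixroute vertex-ai-route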
Verify
Send a POST request to the route with a system prompt and a sample user question in the request body:
curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'
You should receive a response similar to the following:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "1 + 1 = 2\n"
      },
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "completion_tokens": 8,
    "extra_properties": {
      "google": {
        "traffic_type": "ON_DEMAND"
      }
    },
    "total_tokens": 19,
    "prompt_tokens": 11
  },
  "object": "chat.completion",
  "model": "google/gemini-2.5-flash",
  ...
}
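To extract only the model's answer, you can pipe the same request through jq:

curl -s "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{ "messages": [{ "role": "user", "content": "What is 1+1?" }] }' \
  | jq -r '.choices[0].message.content'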
Next Steps
You have learned how to integrate APISIX with Vertex AI. See the Vertex AI documentation and Gemini models pages for more details.
If you would like to stream responses, enable streaming in your request and use the proxy-buffering plugin to disable NGINX proxy_buffering, so that server-sent events (SSE) are delivered to the client as they arrive instead of being buffered.
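For example, a streaming request sets stream to true in the OpenAI-compatible request body:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "stream": true,
    "messages": [
      { "role": "user", "content": "Count from 1 to 5" }
    ]
  }'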