Monitor AI Traffic and Track LLM Costs
This guide explains how to observe AI traffic and estimate LLM costs using built-in AI logging, APISIX context variables, and standard observability integrations.
Overview
AI observability is different from traditional API observability. For LLM workloads, you need token-level visibility, model attribution, and latency breakdowns such as time to first token.
With API7 AI Gateway, you can collect:
- Request and response model metadata.
- Prompt and completion token counts.
- End-to-end and upstream timing signals.
- Optional payload-level logs for request/response content.
The gateway does not calculate billing directly. Cost tracking is derived by mapping token usage to provider pricing.
Prerequisites
- Install Docker.
- Install cURL to send requests to the services for validation.
- Have a running API7 Enterprise Gateway instance. See the Getting Started Guide for setup instructions.
Built-in AI Logging
ai-proxy and ai-proxy-multi support a logging configuration with:
- `logging.summaries` (boolean): logs `request_model`, `model`, `duration`, `prompt_tokens`, `completion_tokens`, and `upstream_response_time`.
- `logging.payloads` (boolean): logs request messages, the stream flag, and response text content.
Enable logging at route scope (or in shared plugin policy where applicable):
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "ai-observability",
"service_id": "$SERVICE_ID",
"paths": ["/ai"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
"options": { "model": "gpt-4o" },
"logging": {
"summaries": true,
"payloads": false
}
}
}
}'
❶ Enable AI logging on the ai-proxy plugin.
❷ Log summary-level fields (model + token/timing metadata) for cost and performance analysis.
❸ Keep payload logging disabled by default to reduce sensitive content exposure.
services:
- name: AI Observability
routes:
- uris:
- /ai
name: ai-observability
plugins:
ai-proxy:
provider: openai
auth:
header:
Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
options:
model: gpt-4o
logging:
summaries: true
payloads: false
❶ Enable AI logging on the ai-proxy plugin.
❷ Log summary-level fields (model + token/timing metadata) for cost and performance analysis.
❸ Keep payload logging disabled by default to reduce sensitive content exposure.
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
For multi-model routes, apply the same logging fields in each ai-proxy-multi instance:
- Admin API
- ADC
curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
-H "X-API-KEY: $ADMIN_API_KEY" \
-d '{
"id": "ai-observability-multi",
"service_id": "$SERVICE_ID",
"paths": ["/ai-multi"],
"plugins": {
"ai-proxy-multi": {
"instances": [
{
"name": "openai-primary",
"provider": "openai",
"auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
"options": { "model": "gpt-4o-mini" },
"logging": {
"summaries": true,
"payloads": false
},
"weight": 1
}
]
}
}
}'
❶ Configure logging.summaries and logging.payloads per instance on ai-proxy-multi.
services:
- name: AI Observability Multi
routes:
- uris:
- /ai-multi
name: ai-observability-multi
plugins:
ai-proxy-multi:
instances:
- name: openai-primary
provider: openai
auth:
header:
Authorization: "Bearer sk-proj-xxxxxxxxxxxxxxxxxxxxxxxx"
options:
model: gpt-4o-mini
logging:
summaries: true
payloads: false
weight: 1
❶ Configure logging.summaries and logging.payloads per instance on ai-proxy-multi.
Synchronize the configuration to API7 Gateway:
adc sync -f adc.yaml
For the full configuration reference, see ai-proxy and ai-proxy-multi.
APISIX Variables for AI Traffic
The following APISIX context variables can be used in access logs, external logging plugins, and custom observability pipelines:
| Variable | Description |
|---|---|
| `llm_time_to_first_token` | Time to first token (TTFT) for streaming responses. |
| `llm_prompt_tokens` | Prompt token count reported by the LLM provider. |
| `llm_completion_tokens` | Completion token count reported by the LLM provider. |
| `llm_model` | Actual model that generated the response. |
| `request_llm_model` | Model requested by the client. |
| `llm_response_text` | Response text content captured in context. |
| `llm_raw_usage` | Raw usage object returned by the provider. |
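These variables can be referenced in a custom log format for an APISIX-style logging plugin (for example, `http-logger` configured through plugin metadata). The sketch below is illustrative; the field names on the left are arbitrary, and you should verify the exact metadata endpoint and variable support in your gateway version:

```json
{
  "log_format": {
    "model": "$llm_model",
    "request_model": "$request_llm_model",
    "prompt_tokens": "$llm_prompt_tokens",
    "completion_tokens": "$llm_completion_tokens",
    "ttft": "$llm_time_to_first_token",
    "latency": "$latency",
    "upstream_latency": "$upstream_latency"
  }
}
```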
Prometheus Integration
Use the standard prometheus plugin to export gateway metrics. For AI traffic, the confirmed built-in metric is:
`apisix_llm_active_connections` (gauge)
Example PromQL queries:
# Current active LLM connections across all routes
sum(apisix_llm_active_connections)
# Top routes by active LLM connections
topk(10, sum by (route) (apisix_llm_active_connections))
# Alert condition example: sustained high active connections
avg_over_time(sum(apisix_llm_active_connections)[5m:]) > 200
Adjust labels and thresholds to your deployment and traffic profile.
Logging AI Requests to External Systems
You can forward structured logs with AI metadata to systems such as ELK, Loki, or Splunk by combining APISIX logging plugins with AI context variables.
Example structured log payload:
{
"route_id": "27bcdea2-7586-47ad-9262-a3adf4b6699e",
"service_id": "d8402098-2f80-4e08-afab-21b9ebc62090",
"model": "deepseek-chat",
"request_model": "",
"prompt_tokens": 11,
"completion_tokens": 16,
"duration_ms": 3991,
"ttft_ms": 2159,
"upstream_response_time": 1133
}
- `route_id` and `service_id` are the gateway-assigned UUIDs for the route and service.
- `request_model` is empty when the client does not specify a model explicitly; the model is then determined by the plugin configuration.
- `duration_ms` is not a built-in AI log field; use `$latency` (total request time) and `$upstream_latency` (upstream response time) in your log format configuration to measure request duration.
Use summary logging for default production telemetry. Enable payload logging only when you need short-term debugging with appropriate data handling controls.
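As an illustration of how these summary fields combine in downstream analysis, the small Python helper below (not part of the gateway) derives token throughput from a log record shaped like the example above, assuming `duration_ms` and `ttft_ms` fields are present:

```python
def generation_rate(record: dict) -> float:
    """Tokens per second during the generation phase (after the first token)."""
    gen_ms = record["duration_ms"] - record["ttft_ms"]
    if gen_ms <= 0:
        return 0.0
    return record["completion_tokens"] / (gen_ms / 1000.0)

# Using the values from the example log payload above:
record = {"completion_tokens": 16, "duration_ms": 3991, "ttft_ms": 2159}
rate = generation_rate(record)  # ~8.73 tokens/s over the 1832 ms generation window
```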
Building a Cost Dashboard
API7 Gateway exposes token usage signals, while cost is calculated externally:
estimated_cost = (prompt_tokens × prompt_price_per_token) + (completion_tokens × completion_price_per_token)
Typical dashboard panels in Grafana:
- Cost by consumer (hourly/daily).
- Cost by model.
- Prompt vs. completion token split.
- Spend trend and budget threshold alerts.
Implementation approach:
- Export AI logs/metrics.
- Enrich records with provider pricing metadata.
- Compute estimated cost in your log or metrics pipeline.
- Visualize and alert in Grafana.
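The enrichment step above can be sketched in Python. The per-token prices here are placeholders, not real provider rates; substitute your provider's published pricing:

```python
# Hypothetical per-token prices in USD (placeholders, not real rates).
PRICING = {
    "gpt-4o": {"prompt": 2.50e-6, "completion": 10.00e-6},
    "gpt-4o-mini": {"prompt": 0.15e-6, "completion": 0.60e-6},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Apply the formula: prompt and completion tokens times their unit prices."""
    price = PRICING[model]
    return prompt_tokens * price["prompt"] + completion_tokens * price["completion"]

# Enrich a summary log record with an estimated cost field.
record = {"model": "gpt-4o", "prompt_tokens": 11, "completion_tokens": 16}
record["estimated_cost_usd"] = estimate_cost(
    record["model"], record["prompt_tokens"], record["completion_tokens"]
)
```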
For token governance controls, see Token-Based Rate Limiting and Quota Management.
OpenTelemetry Tracing
Use the standard opentelemetry plugin to trace AI request lifecycle latency across:
- Client request ingress.
- Gateway processing.
- Upstream LLM call.
- Response egress.
This helps isolate whether latency is caused by client-side behavior, gateway plugins, or upstream model/provider response time.
Next Steps
- Token-Based Rate Limiting and Quota Management — Enforce token budgets per route, consumer, and model.
- Multi-LLM Routing and Fallback — Improve reliability and optimize provider/model usage.
- For plugin details, see `ai-proxy`, `ai-proxy-multi`, `prometheus`, and `opentelemetry`.