
Monitor AI Traffic and Track LLM Costs

This guide explains how to observe AI traffic and estimate LLM costs using built-in AI logging, APISIX context variables, and standard observability integrations.

Overview

AI observability is different from traditional API observability. For LLM workloads, you need token-level visibility, model attribution, and latency breakdowns such as time to first token.

With API7 AI Gateway, you can collect:

  • Request and response model metadata.
  • Prompt and completion token counts.
  • End-to-end and upstream timing signals.
  • Optional payload-level logs for request/response content.

The gateway does not calculate billing directly. Cost tracking is derived by mapping token usage to provider pricing.

Prerequisites

  • Install Docker.
  • Install cURL to send requests to the services for validation.
  • Have a running API7 Enterprise Gateway instance. See the Getting Started Guide for setup instructions.

Built-in AI Logging

ai-proxy and ai-proxy-multi support a logging configuration with:

  • logging.summaries (boolean): logs request_model, model, duration, prompt_tokens, completion_tokens, and upstream_response_time.
  • logging.payloads (boolean): logs request messages, stream flag, and response text content.

Enable logging at route scope (or in shared plugin policy where applicable):

curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "id": "ai-observability",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/ai"],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
        "options": { "model": "gpt-4o" },
        "logging": {
          "summaries": true,
          "payloads": false
        }
      }
    }
  }'

❶ Enable AI logging on the ai-proxy plugin.

❷ Log summary-level fields (model + token/timing metadata) for cost and performance analysis.

❸ Keep payload logging disabled by default to reduce sensitive content exposure.

For multi-model routes, apply the same logging fields in each ai-proxy-multi instance:

curl "http://127.0.0.1:7080/apisix/admin/routes?gateway_group_id=default" -X PUT \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "id": "ai-observability-multi",
    "service_id": "'"$SERVICE_ID"'",
    "paths": ["/ai-multi"],
    "plugins": {
      "ai-proxy-multi": {
        "instances": [
          {
            "name": "openai-primary",
            "provider": "openai",
            "auth": { "header": { "Authorization": "Bearer '"$OPENAI_API_KEY"'" } },
            "options": { "model": "gpt-4o-mini" },
            "logging": {
              "summaries": true,
              "payloads": false
            },
            "weight": 1
          }
        ]
      }
    }
  }'

❶ Configure logging.summaries and logging.payloads per instance on ai-proxy-multi.

For the full configuration reference, see ai-proxy and ai-proxy-multi.

APISIX Variables for AI Traffic

The following APISIX context variables can be used in access logs, external logging plugins, and custom observability pipelines:

| Variable                  | Description                                          |
| ------------------------- | ---------------------------------------------------- |
| `llm_time_to_first_token` | Time to first token (TTFT) for streaming responses.  |
| `llm_prompt_tokens`       | Prompt token count reported by the LLM provider.     |
| `llm_completion_tokens`   | Completion token count reported by the LLM provider. |
| `llm_model`               | Actual model that generated the response.            |
| `request_llm_model`       | Model requested by the client.                       |
| `llm_response_text`       | Response text content captured in context.           |
| `llm_raw_usage`           | Raw usage object returned by the provider.           |
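As an illustration, these variables can be referenced in a custom log format. The sketch below sets a `log_format` for the `http-logger` plugin via plugin metadata, following the open-source APISIX admin API shape used in the examples above; the field names on the left are arbitrary, and exact metadata support may vary by deployment. Note the single quotes keep `$llm_model` and friends literal so the gateway, not the shell, resolves them.

```shell
curl "http://127.0.0.1:7080/apisix/admin/plugin_metadata/http-logger?gateway_group_id=default" -X PUT \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "log_format": {
      "model": "$llm_model",
      "request_model": "$request_llm_model",
      "prompt_tokens": "$llm_prompt_tokens",
      "completion_tokens": "$llm_completion_tokens",
      "ttft": "$llm_time_to_first_token"
    }
  }'
```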

Prometheus Integration

Use the standard prometheus plugin to export gateway metrics. For AI traffic, the built-in AI-specific metric is:

  • apisix_llm_active_connections (gauge)
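If metrics are not already enabled on the route, the prometheus plugin can be attached alongside ai-proxy. This is a sketch assuming the same admin API endpoint, gateway group, and route id used earlier; whether your deployment accepts a partial update via PATCH should be verified.

```shell
curl "http://127.0.0.1:7080/apisix/admin/routes/ai-observability?gateway_group_id=default" -X PATCH \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "plugins": {
      "prometheus": {}
    }
  }'
```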

Example PromQL queries:

# Current active LLM connections across all routes
sum(apisix_llm_active_connections)
# Top routes by active LLM connections
topk(10, sum by (route) (apisix_llm_active_connections))
# Alert condition example: sustained high active connections
avg_over_time(sum(apisix_llm_active_connections)[5m:1m]) > 200

Adjust labels and thresholds to your deployment and traffic profile.

Logging AI Requests to External Systems

You can forward structured logs with AI metadata to systems such as ELK, Loki, or Splunk by combining APISIX logging plugins with AI context variables.
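As a sketch, the open-source http-logger plugin can forward logs from the AI route to an HTTP collector. The collector URI below is a placeholder, and plugin availability and batching fields should be verified against your deployment.

```shell
curl "http://127.0.0.1:7080/apisix/admin/routes/ai-observability?gateway_group_id=default" -X PATCH \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "plugins": {
      "http-logger": {
        "uri": "http://log-collector.internal:9200/ai-logs",
        "batch_max_size": 100
      }
    }
  }'
```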

Example structured log payload:

{
  "route_id": "27bcdea2-7586-47ad-9262-a3adf4b6699e",
  "service_id": "d8402098-2f80-4e08-afab-21b9ebc62090",
  "model": "deepseek-chat",
  "request_model": "",
  "prompt_tokens": 11,
  "completion_tokens": 16,
  "duration_ms": 3991,
  "ttft_ms": 2159,
  "upstream_response_time": 1133
}
note
  • route_id and service_id are the gateway-assigned UUIDs for the route and service.
  • request_model is empty when the client does not specify a model explicitly; the model is then determined by the plugin configuration.
  • duration_ms is not a built-in AI log field — use $latency (total request time) and $upstream_latency (upstream response time) in your log format configuration to measure request duration.

Use summary logging for default production telemetry. Enable payload logging only when you need short-term debugging with appropriate data handling controls.

Building a Cost Dashboard

API7 Gateway exposes token usage signals, while cost is calculated externally:

estimated_cost = (prompt_tokens × prompt_price_per_token) + (completion_tokens × completion_price_per_token)
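For instance, taking the token counts from the sample log above (11 prompt tokens, 16 completion tokens) and hypothetical per-million-token prices, the formula can be evaluated in a log pipeline with a one-liner like:

```shell
# Hypothetical prices in USD per 1M tokens -- substitute your provider's actual rates
prompt_price=2.50
completion_price=10.00

awk -v pt=11 -v ct=16 -v pp="$prompt_price" -v cp="$completion_price" \
  'BEGIN { printf "estimated_cost_usd=%.8f\n", (pt * pp + ct * cp) / 1000000 }'
# -> estimated_cost_usd=0.00018750
```

In practice the same arithmetic would run per record in your log or metrics pipeline rather than by hand.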

Typical dashboard panels in Grafana:

  • Cost by consumer (hourly/daily).
  • Cost by model.
  • Prompt vs. completion token split.
  • Spend trend and budget threshold alerts.

Implementation approach:

  1. Export AI logs/metrics.
  2. Enrich records with provider pricing metadata.
  3. Compute estimated cost in your log or metrics pipeline.
  4. Visualize and alert in Grafana.

For token governance controls, see Token-Based Rate Limiting and Quota Management.

OpenTelemetry Tracing

Use the standard opentelemetry plugin to trace AI request lifecycle latency across:

  • Client request ingress.
  • Gateway processing.
  • Upstream LLM call.
  • Response egress.

This helps isolate whether latency is caused by client-side behavior, gateway plugins, or upstream model/provider response time.
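A minimal sketch of enabling tracing on the route created earlier, using the open-source opentelemetry plugin options; the collector endpoint itself is configured separately in the plugin's attributes, and sampler choices should match your tracing backend's expectations:

```shell
curl "http://127.0.0.1:7080/apisix/admin/routes/ai-observability?gateway_group_id=default" -X PATCH \
  -H "X-API-KEY: $ADMIN_API_KEY" \
  -d '{
    "plugins": {
      "opentelemetry": {
        "sampler": { "name": "always_on" }
      }
    }
  }'
```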
