Request Lifecycle and Hooks

At the heart of AISIX is a flexible request processing pipeline built on a system of Hooks. This section explains the journey of an AI request through the gateway and how hooks implement key features.

The Journey of a Request

An AI request passes through the following stages, in order, as it is processed by AISIX:

Client Request: A client sends an OpenAI-compatible API request to AISIX.
Authentication Middleware: Before the hook pipeline, authentication middleware validates the API key from the Authorization header. If the key is missing or invalid, the request is rejected with 401 Unauthorized.
Pre-Call Hooks: Before the request is sent to the upstream LLM provider, it passes through pre_call hooks that perform tasks like model validation and rate limiting.
Provider API Call: If all pre-call hooks pass, AISIX forwards the request to the upstream provider defined in the requested Model.
Post-Call Hooks: After receiving a response from the provider, the data passes through post_call hooks. These are used for tasks that depend on the provider's response, such as recording token usage for metrics.
Client Response: AISIX sends the processed response to the client.
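
The stages above can be sketched as a single function. This is a minimal illustration of the ordering only, not AISIX's implementation: `handle_request`, `Response`, the hook signatures, and the `Bearer` scheme for the Authorization header are all assumptions made for the example. The text also does not say whether post_call hooks run after a pre_call hook rejects a request; the sketch assumes they do not.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Response:
    status: int
    body: dict = field(default_factory=dict)


# Hypothetical signatures: a pre_call hook may return a Response to stop the
# request; a post_call hook only observes the request and provider response.
PreHook = Callable[[dict], Optional[Response]]
PostHook = Callable[[dict, Response], None]


def handle_request(
    headers: dict,
    request: dict,
    valid_keys: set,
    pre_hooks: list,
    call_provider: Callable[[dict], Response],
    post_hooks: list,
) -> Response:
    # Authentication middleware: runs before the hook pipeline.
    # Assumption: the key is sent as "Authorization: Bearer <key>".
    auth = headers.get("Authorization", "")
    api_key = auth[len("Bearer "):] if auth.startswith("Bearer ") else ""
    if api_key not in valid_keys:
        return Response(401, {"error": "Unauthorized"})

    # Pre-call hooks: the first hook that returns a response ends the request.
    for hook in pre_hooks:
        early = hook(request)
        if early is not None:
            # Assumption: post_call hooks are skipped on this path as well.
            return early

    # Provider API call, then post-call hooks on the provider's response.
    response = call_provider(request)
    for hook in post_hooks:
        hook(request, response)

    # Client response.
    return response
```

Passing the hooks and the provider call in as arguments keeps the sketch self-contained; a real gateway would hold them in its own configuration.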

Hooks: The Extensible Pipeline

Hooks are the building blocks of AISIX's request processing logic. They are modular components that execute at specific stages of the request lifecycle. This design makes the gateway extensible, allowing new functionalities to be added without altering the core proxying logic.
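
To make the modularity concrete, here is a small sketch of what a hook interface could look like. The `Hook` base class, the `AuditHook` example, and `run_post_call` are hypothetical names invented for illustration; they are not AISIX's actual extension API.

```python
from typing import Optional


class Hook:
    """Hypothetical base class: both stages default to no-ops."""

    def pre_call(self, request: dict) -> Optional[dict]:
        return None  # None means "continue to the next hook"

    def post_call(self, request: dict, response: dict) -> None:
        return None


class AuditHook(Hook):
    """An illustrative new feature, written without touching the proxy loop."""

    def __init__(self) -> None:
        self.seen_models: list = []

    def post_call(self, request: dict, response: dict) -> None:
        # Record which model each completed request asked for.
        self.seen_models.append(request.get("model"))


def run_post_call(hooks: list, request: dict, response: dict) -> None:
    # The core loop only knows the Hook interface, not any concrete hook.
    for hook in hooks:
        hook.post_call(request, response)


audit = AuditHook()
hooks = [Hook(), audit]  # registering the new hook is a one-line change
run_post_call(hooks, {"model": "example-model"}, {"choices": []})
```

Because the loop depends only on the shared interface, adding `AuditHook` requires no change to `run_post_call`.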

AISIX includes several built-in hooks that are enabled by default:

Default Hooks

Hook	Stage(s)	Description
`ValidateModelHook`	`pre_call`	Validates that the request contains a `model` field, that the model exists, and that the API key is authorized to access it. Returns `400 Bad Request` if the `model` field is missing or the model is not found, or `403 Forbidden` if access is denied.
`RateLimitHook`	`pre_call`, `post_call`	Enforces rate limits. In `pre_call`, it checks if the request count exceeds the limit. In `post_call`, it updates the token usage counters.
`MetricHook`	`post_call`	Collects and exposes metrics (e.g., token counts, request latency) for observability via Prometheus.
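
The status codes listed for `ValidateModelHook` amount to a short decision sequence. The function below is a sketch of that sequence only; `validate_model`, `known_models`, and `allowed_models` are hypothetical names, and treating an empty `model` value as missing is an assumption.

```python
from typing import Optional


def validate_model(
    request: dict,
    known_models: set,
    allowed_models: set,
) -> Optional[int]:
    """Return an HTTP status to reject with, or None to continue."""
    model = request.get("model")
    if not model:
        return 400  # model field is missing (or empty, by assumption)
    if model not in known_models:
        return 400  # model is not found
    if model not in allowed_models:
        return 403  # API key is not authorized for this model
    return None
```

Here `allowed_models` stands for the set of models the caller's API key may use, as resolved earlier in the request.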

Hook Stages

The hook system is divided into two stages:

pre_call: Runs before the request is sent to the upstream LLM provider. It is used for validation, authorization, and pre-emptive checks such as rate limiting. A pre_call hook can terminate the request by returning an immediate response, in which case subsequent hooks and the provider call are skipped.
post_call: Runs after a response is received from the provider. It is used for logging, metrics collection, and logic that needs to inspect the final response data.

This phased approach separates checks that can reject a request before it reaches the provider from bookkeeping that depends on the provider's response.
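
A hook that participates in both stages, as `RateLimitHook` does, shows why the split is useful: the request count is known before the provider call, while token usage is known only afterwards. The toy limiter below is a sketch under several assumptions: the class name is hypothetical, window resets and concurrency limits are omitted, the `429` status is not stated in this section, and token usage is read from the `usage.total_tokens` field of an OpenAI-style response.

```python
from typing import Optional


class SimpleRateLimit:
    """Toy two-stage limiter; window resets and concurrency are omitted."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self.request_count = 0
        self.tokens_used = 0

    def pre_call(self, request: dict) -> Optional[int]:
        # Known before the provider call: how many requests were admitted.
        if self.request_count >= self.max_requests:
            return 429  # assumed status code; not stated in this section
        self.request_count += 1
        return None

    def post_call(self, request: dict, response: dict) -> None:
        # Known only after the provider call: how many tokens were consumed.
        usage = response.get("usage", {})
        self.tokens_used += usage.get("total_tokens", 0)
```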

Related Docs

Rate Limiting: How the RateLimitHook enforces RPM, TPM, and concurrency limits
Authentication: How the authentication middleware and ValidateModelHook secure LLM access
Observability: How the MetricHook exposes LLM metrics to Prometheus
