Version: 3.17

ai-proxy

Description

The ai-proxy Plugin simplifies access to LLM and embedding models by transforming Plugin configurations into the designated request format. It supports integration with OpenAI, DeepSeek, Azure OpenAI, AIMLAPI, Anthropic, OpenRouter, Gemini, Vertex AI, Amazon Bedrock, and other OpenAI-compatible APIs.

The Plugin can also record LLM request information in the access log, such as token usage, model, and time to the first response. Logging plugins such as http-logger and kafka-logger consume the same log entries. The logging options do not affect error.log.

Request Format

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| messages | Array | True | An array of message objects. |
| messages.role | String | True | Role of the message (system, user, assistant). |
| messages.content | String | True | Content of the message. |

Bedrock Converse Request Format

When provider is set to bedrock, the Plugin expects requests in the Bedrock Converse API format. The request URI must end with /converse and the body must contain a messages array.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| messages | Array | True | An array of message objects. |
| messages.role | String | True | Role of the message (user, assistant). |
| messages.content | Array | True | An array of content blocks. Each block contains a text field (e.g., [{"text": "What is 1+1?"}]). |
| system | Array | False | Optional system prompt blocks (e.g., [{"text": "You are a helpful assistant."}]). |
| inferenceConfig | Object | False | Optional inference parameters such as maxTokens, temperature, topP, etc. |
| stream | Boolean | False | When true, the Plugin proxies the request to Bedrock's ConverseStream endpoint and forwards the response in AWS EventStream (application/vnd.amazon.eventstream) binary framing. The flag is consumed by the Plugin and is not forwarded to Bedrock. |
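
As a minimal sketch of the request shape described above, the following Python helper assembles a Converse-style body. The function name `build_converse_body` and its parameters are hypothetical and only illustrate how the documented fields fit together.

```python
import json


def build_converse_body(user_text, system_text=None, max_tokens=None, stream=False):
    """Assemble a request body in the Bedrock Converse shape (hypothetical helper)."""
    body = {
        "messages": [
            # Converse content is an array of blocks, not a plain string.
            {"role": "user", "content": [{"text": user_text}]},
        ]
    }
    if system_text is not None:
        body["system"] = [{"text": system_text}]
    if max_tokens is not None:
        body["inferenceConfig"] = {"maxTokens": max_tokens}
    if stream:
        # Consumed by the Plugin to select ConverseStream, per the table above.
        body["stream"] = True
    return body


print(json.dumps(build_converse_body("What is 1+1?", "You are a helpful assistant.", 256), indent=2))
```

The printed JSON can be used as the `-d` payload of a request to a Route whose URI ends with /converse.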

Attributes

| Name | Type | Required | Default | Valid values | Description |
| --- | --- | --- | --- | --- | --- |
| provider | string | True | | [openai, deepseek, azure-openai, aimlapi, anthropic, openrouter, gemini, vertex-ai, bedrock, openai-compatible] | LLM service provider. When set to openai, the Plugin will proxy the request to https://api.openai.com/v1/chat/completions. When set to deepseek, the Plugin will proxy the request to https://api.deepseek.com/chat/completions. When set to aimlapi, the Plugin uses the OpenAI-compatible driver and proxies the request to https://api.aimlapi.com/v1/chat/completions by default. When set to anthropic, the Plugin will proxy the request to https://api.anthropic.com/v1/chat/completions by default. When set to openrouter, the Plugin uses the OpenAI-compatible driver and proxies the request to https://openrouter.ai/api/v1/chat/completions by default. When set to gemini, the Plugin uses the OpenAI-compatible driver and proxies the request to https://generativelanguage.googleapis.com/v1beta/openai/chat/completions by default. When set to vertex-ai, the Plugin will proxy the request to https://aiplatform.googleapis.com by default and requires provider_conf or override. When set to bedrock, the Plugin will proxy the request to the AWS Bedrock Converse API (https://bedrock-runtime.<region>.amazonaws.com) and sign the request with AWS SigV4. When set to openai-compatible, the Plugin will proxy the request to the custom endpoint configured in override. |
| provider_conf | object | False | | | Configuration for the specific provider. Required when provider is set to vertex-ai and override is not configured. Required when provider is set to bedrock. |
| provider_conf.project_id | string | True | | | Google Cloud Project ID. |
| provider_conf.region | string | True (depending on provider) | | minLength = 1 (for Bedrock) | When provider is vertex-ai, this is the Google Cloud Region. When provider is bedrock, this is the AWS region used to construct the Bedrock endpoint and to sign the request with SigV4 (required, must be non-empty). |
| auth | object | True | | | Authentication configurations. |
| auth.header | object | False | | | Authentication headers. At least one of header or query must be configured. |
| auth.query | object | False | | | Authentication query parameters. At least one of header or query must be configured. |
| auth.gcp | object | False | | | Configuration for Google Cloud Platform (GCP) authentication. |
| auth.gcp.service_account_json | string | False | | | Content of the GCP service account JSON file. This can also be configured by setting the GCP_SERVICE_ACCOUNT environment variable. |
| auth.gcp.max_ttl | integer | False | | minimum = 1 | Maximum TTL (in seconds) for caching the GCP access token. |
| auth.gcp.expire_early_secs | integer | False | 60 | minimum = 0 | Seconds to expire the access token before its actual expiration time to avoid edge cases. |
| auth.aws | object | False | | | Configuration for AWS authentication. Required when provider is bedrock. |
| auth.aws.access_key_id | string | True | | minLength = 1 | AWS access key ID used for SigV4 signing. |
| auth.aws.secret_access_key | string | True | | minLength = 1 | AWS secret access key used for SigV4 signing. Stored encrypted. |
| auth.aws.session_token | string | False | | minLength = 1 | Optional AWS session token for temporary credentials (e.g., from STS or assumed roles). Stored encrypted. |
| options | object | False | | | Model configurations. In addition to model, you can configure additional parameters and they will be forwarded to the upstream LLM service in the request body. For instance, if you are working with OpenAI, you can configure additional parameters such as temperature, top_p, and stream. See your LLM provider's API documentation for more available options. |
| options.model | string | False | | | Name of the LLM model, such as gpt-4 or gpt-3.5. Refer to the LLM provider's API documentation for available models. When provider is bedrock and override.endpoint is not configured, model is required and may be a foundation model ID (e.g., anthropic.claude-3-5-sonnet-20240620-v1:0), a cross-region inference profile ID (e.g., us.anthropic.claude-3-5-sonnet-20240620-v1:0), or an application inference profile ARN (e.g., arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/abc123). |
| override | object | False | | | Override setting. |
| override.endpoint | string | False | | | Custom LLM provider endpoint, required when provider is openai-compatible. When provider is bedrock, this can be set to a custom Bedrock endpoint. If the override URL includes a path containing reserved characters (e.g., Bedrock inference profile ARNs containing : or /), those characters MUST be URL-encoded (: → %3A, / → %2F) so the model ID is preserved as a single path segment. |
| override.llm_options | object | False | | | Provider-aware LLM options. See Provider-aware max_tokens mapping. |
| override.llm_options.max_tokens | integer | False | | ≥ 1 | Maximum number of output tokens. APISIX automatically maps this to the provider-specific field name (e.g. max_completion_tokens for OpenAI Chat Completions, max_output_tokens for OpenAI Responses API, max_tokens for most other providers). Always force-overwrites the client value. |
| override.request_body | object | False | | | Per target-protocol request body overrides. Keys are target protocol names (openai-chat, openai-responses, openai-embeddings, anthropic-messages, bedrock-converse); values are partial request bodies that are deep-merged into the outgoing body (objects merged recursively, arrays and scalars replaced wholesale). See Per-protocol request body override. |
| override.request_body_force_override | boolean | False | false | | When false (default), client request body fields take priority and override.request_body values only fill in missing fields. When true, override.request_body values forcefully overwrite client fields. Does not affect override.llm_options, which always force-overwrites. |
| logging | object | False | | | Logging configurations. Does not affect error.log. |
| logging.summaries | boolean | False | false | | If true, logs request LLM model, duration, request, and response tokens. |
| logging.payloads | boolean | False | false | | If true, logs request and response payload. |
| timeout | integer | False | 30000 | 1 - 600000 | Request timeout in milliseconds when requesting the LLM service. |
| max_req_body_size | integer | False | 67108864 | >= 1 | Maximum request body size in bytes that the plugin reads into memory. Requests whose body exceeds this limit are rejected with 413. Prevents unbounded memory buffering of large request bodies. |
| keepalive | boolean | False | true | | If true, keeps the connection alive when requesting the LLM service. |
| keepalive_timeout | integer | False | 60000 | ≥ 1000 | Keepalive timeout in milliseconds when connecting to the LLM service. |
| keepalive_pool | integer | False | 30 | ≥ 1 | Keepalive pool size for the LLM service connection. |
| ssl_verify | boolean | False | true | | If true, verifies the LLM service's certificate. |
| streaming_flush_interval_ms | integer | False | 10 | ≥ 0 | Interval in milliseconds for the background flush thread. When > 0 (default: 10), a background timer calls ngx.flush(false) every N ms, batching output for bursty upstreams. When 0, the background thread is disabled and each chunk is flushed synchronously via ngx.flush(true), guaranteeing immediate client delivery. |

Provider-aware max_tokens mapping

LLM providers and API endpoints disagree on the field name used to cap the number of output tokens. Configuring override.llm_options.max_tokens lets you set a single value in APISIX and have it forwarded under the field name expected by each provider/endpoint. llm_options always force-overwrites the client value.

The table below shows, for each provider and target API endpoint, the upstream field name that APISIX rewrites max_tokens to. N/A means the provider does not expose that endpoint.

| Provider | OpenAI Chat Completions | OpenAI Responses API | Anthropic Messages |
| --- | --- | --- | --- |
| openai | max_completion_tokens ¹ | max_output_tokens | N/A |
| openai-compatible | max_tokens | max_output_tokens | N/A |
| azure-openai | max_tokens | N/A | N/A |
| deepseek | max_tokens | N/A | N/A |
| aimlapi | max_tokens | N/A | N/A |
| openrouter | max_tokens | N/A | N/A |
| gemini | max_completion_tokens | N/A | N/A |
| vertex-ai | max_completion_tokens | N/A | N/A |
| anthropic | max_tokens | N/A | max_tokens |

¹ When provider is openai and the target is the Chat Completions endpoint, APISIX always rewrites to max_completion_tokens and removes any max_tokens field from the request body, because OpenAI has deprecated max_tokens in favor of max_completion_tokens.
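
The mapping can be pictured as a lookup keyed by provider and target protocol. The Python sketch below is illustrative only and is not the Plugin's implementation. It assumes that the three table columns correspond to the protocol names openai-chat, openai-responses, and anthropic-messages listed under override.request_body. The names `MAX_TOKENS_FIELD` and `apply_max_tokens` are hypothetical.

```python
# Field names copied from the table above; combinations the table marks as
# unavailable are simply omitted.
_CHAT_ONLY = {"openai-chat": "max_tokens"}
MAX_TOKENS_FIELD = {
    "openai": {"openai-chat": "max_completion_tokens", "openai-responses": "max_output_tokens"},
    "openai-compatible": {"openai-chat": "max_tokens", "openai-responses": "max_output_tokens"},
    "azure-openai": _CHAT_ONLY,
    "deepseek": _CHAT_ONLY,
    "aimlapi": _CHAT_ONLY,
    "openrouter": _CHAT_ONLY,
    "gemini": {"openai-chat": "max_completion_tokens"},
    "vertex-ai": {"openai-chat": "max_completion_tokens"},
    "anthropic": {"openai-chat": "max_tokens", "anthropic-messages": "max_tokens"},
}


def apply_max_tokens(body, provider, protocol, max_tokens):
    """Return a copy of `body` with the cap stored under the mapped field name."""
    field = MAX_TOKENS_FIELD.get(provider, {}).get(protocol)
    if field is None:
        raise ValueError(f"{provider} does not expose {protocol}")
    out = dict(body)
    if provider == "openai" and protocol == "openai-chat":
        # Footnote 1: the deprecated max_tokens field is removed for OpenAI chat.
        out.pop("max_tokens", None)
    out[field] = max_tokens  # always overwrites whatever the client sent
    return out


print(apply_max_tokens({"max_tokens": 50, "messages": []}, "openai", "openai-chat", 256))
```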

Per-protocol request body override

override.request_body provides fine-grained, per-protocol control over the outgoing request body. Keys are target protocol names (openai-chat, openai-responses, openai-embeddings, anthropic-messages); values are partial JSON objects that are deep-merged into the outgoing body after protocol conversion.

Merge semantics:

  • Both sides are plain objects (string-keyed) → recursive merge.
  • Otherwise (scalar, array, type mismatch) → patch value replaces target value wholesale.

Priority between client request and override is controlled by override.request_body_force_override:

  • false (default): if the client request body already sets the field, it is preserved; the override value only fills in when the field is missing.
  • true: the override value forcefully overwrites the client field.

When both llm_options and request_body are configured, llm_options is applied first (always force), then request_body deep-merges on top. This means request_body can override fields set by llm_options.
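
The merge rules above can be sketched in a few lines of Python. This is an illustration of the documented semantics, not the Plugin's own code, and the function name `deep_merge` and the sample bodies are hypothetical.

```python
import copy


def deep_merge(target, patch, force):
    """Merge `patch` into a copy of `target` following the rules above."""
    result = copy.deepcopy(target)
    for key, value in patch.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            # Both sides are objects: merge recursively.
            result[key] = deep_merge(result[key], value, force)
        elif key not in result or force:
            # Scalars, arrays, and type mismatches are replaced wholesale,
            # but an existing client value survives unless force is set.
            result[key] = copy.deepcopy(value)
    return result


client_body = {"temperature": 0.2, "metadata": {"user": "alice"}, "stop": ["\n"]}
override_body = {"temperature": 0.7, "metadata": {"team": "search"}, "stop": ["END"], "top_p": 0.9}

print(deep_merge(client_body, override_body, force=False))
print(deep_merge(client_body, override_body, force=True))
```

With force disabled, the client's temperature and stop values are kept and only top_p and metadata.team are filled in. With force enabled, temperature and stop take the override values, while metadata is still merged key by key.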

Examples

The examples below demonstrate how you can configure ai-proxy for different scenarios.

note

You can fetch the admin_key from config.yaml and save it to an environment variable with the following command:

admin_key=$(yq '.deployment.admin.admin_key[0].key' conf/config.yaml | sed 's/"//g')

Proxy to OpenAI

The following example demonstrates how you can configure the API key, model, and other parameters in the ai-proxy Plugin and configure the Plugin on a Route to proxy user prompts to OpenAI.

Obtain the OpenAI API key and save it to an environment variable:

export OPENAI_API_KEY=<your-api-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options":{
"model": "gpt-4"
}
}
}
}'

Send a POST request to the Route with a system prompt and a sample user question in the request body:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-H "Host: api.openai.com" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}
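
The same request can be sent from application code. The following is a minimal Python sketch using only the standard library. It assumes the Route created above is reachable at http://127.0.0.1:9080/anything, and the helper names `build_chat_request` and `ask` are hypothetical.

```python
import json
import urllib.request

GATEWAY_URL = "http://127.0.0.1:9080/anything"  # Route URI from the example above


def build_chat_request(system_prompt, user_prompt, url=GATEWAY_URL):
    """Build a POST request carrying only the messages array.

    The API key and model are not included because the Plugin injects them.
    """
    body = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(system_prompt, user_prompt):
    """Send the request and return the assistant's reply text.

    Requires a running gateway with the Route configured.
    """
    request = build_chat_request(system_prompt, user_prompt)
    with urllib.request.urlopen(request, timeout=30) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Calling `ask("You are a mathematician", "What is 1+1?")` against a configured gateway should return the text of the first choice.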

Proxy to DeepSeek

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to DeepSeek.

Obtain the DeepSeek API key and save it to an environment variable:

export DEEPSEEK_API_KEY=<your-api-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "deepseek",
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
}
}
}'

Send a POST request to the Route with a sample question in the request body:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "system",
"content": "You are an AI assistant that helps people find information."
},
{
"role": "user",
"content": "Write me a 50-word introduction for Apache APISIX."
}
]
}'

You should receive a response similar to the following:

{
...
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Apache APISIX is a dynamic, real-time, high-performance API gateway and cloud-native platform. It provides rich traffic management features like load balancing, dynamic upstream, canary release, circuit breaking, authentication, observability, and more. Designed for microservices and serverless architectures, APISIX ensures scalability, security, and seamless integration with modern DevOps workflows."
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

Proxy to Azure OpenAI

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to other LLM services, such as Azure OpenAI.

Obtain the Azure OpenAI API key and save it to an environment variable:

export AZ_OPENAI_API_KEY=<your-api-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "azure-openai",
"auth": {
"header": {
"api-key": "'"$AZ_OPENAI_API_KEY"'"
}
},
"options":{
"model": "gpt-4"
},
"override": {
"endpoint": "https://api7-azure-openai.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"
}
}
}
}'

Send a POST request to the Route with a sample question in the request body:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "system",
"content": "You are an AI assistant that helps people find information."
},
{
"role": "user",
"content": "Write me a 50-word introduction for Apache APISIX."
}
],
"max_tokens": 800,
"temperature": 0.7,
"frequency_penalty": 0,
"presence_penalty": 0,
"top_p": 0.95,
"stop": null
}'

You should receive a response similar to the following:

{
"choices": [
{
...,
"message": {
"content": "Apache APISIX is a modern, cloud-native API gateway built to handle high-performance and low-latency use cases. It offers a wide range of features, including load balancing, rate limiting, authentication, and dynamic routing, making it an ideal choice for microservices and cloud-native architectures.",
"role": "assistant"
}
}
],
...
}

Proxy to Amazon Bedrock

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to Amazon Bedrock using the Converse API. The Plugin signs the upstream request using AWS SigV4 with the credentials configured in auth.aws.

Save your AWS credentials to environment variables:

export AWS_ACCESS_KEY_ID=<your-aws-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-route",
"uri": "/bedrock/converse",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "bedrock",
"auth": {
"aws": {
"access_key_id": "'"$AWS_ACCESS_KEY_ID"'",
"secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'"
}
},
"options": {
"model": "anthropic.claude-3-5-sonnet-20240620-v1:0"
},
"provider_conf": {
"region": "us-east-1"
}
}
}
}'

Send a POST request to the Route in Bedrock Converse format. Note that the URI must end with /converse:

curl "http://127.0.0.1:9080/bedrock/converse" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": [{"text": "What is 1+1?"}]}
],
"inferenceConfig": {"maxTokens": 256}
}'

You should receive a Bedrock Converse response similar to the following:

{
"output": {
"message": {
"role": "assistant",
"content": [
{"text": "1 + 1 = 2."}
]
}
},
"stopReason": "end_turn",
"usage": {
"inputTokens": 14,
"outputTokens": 9,
"totalTokens": 23
},
...
}

If you need to call an application inference profile by ARN through override.endpoint, the reserved characters in the ARN (: and /) must be URL-encoded as %3A and %2F, for example:

https://bedrock-runtime.us-east-1.amazonaws.com/model/arn%3Aaws%3Abedrock%3Aus-east-1%3A123456789012%3Aapplication-inference-profile%2Fabc123/converse

note

If auth.aws.session_token is set, it is used for temporary credentials (e.g., obtained from AWS STS or an assumed role) and will be added to the SigV4-signed request automatically. Both auth.aws.secret_access_key and auth.aws.session_token are stored encrypted.
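
The URL-encoded ARN endpoint shown above can be produced programmatically rather than by hand. A minimal Python sketch follows. The helper name `bedrock_converse_url` is hypothetical, and the URL layout is taken from the example endpoint in this section.

```python
from urllib.parse import quote


def bedrock_converse_url(region, model_id):
    """Build a Converse endpoint URL with the model ID as one path segment."""
    # safe="" makes quote() encode "/" as well, so ":" becomes %3A and "/" becomes %2F.
    encoded = quote(model_id, safe="")
    return f"https://bedrock-runtime.{region}.amazonaws.com/model/{encoded}/converse"


arn = "arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/abc123"
print(bedrock_converse_url("us-east-1", arn))
```

The printed value can be used as override.endpoint.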

Streaming with Bedrock ConverseStream

To enable streaming, send the same Converse request body with "stream": true. The Plugin routes the request to Bedrock's /model/<model>/converse-stream endpoint and forwards each AWS EventStream frame to the client unchanged. The response Content-Type is application/vnd.amazon.eventstream; clients must parse the binary framing themselves (most AWS SDKs do this automatically).

curl "http://127.0.0.1:9080/bedrock/converse" -X POST \
-H "Content-Type: application/json" \
--data '{
"stream": true,
"messages": [
{"role": "user", "content": [{"text": "What is 1+1?"}]}
]
}' --output -
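
For clients that cannot use an AWS SDK, the binary framing can be split by hand. The Python sketch below assumes the usual AWS EventStream layout: a 12-byte prelude holding the total length, the headers length, and a CRC32 of those first 8 bytes, followed by the headers, the payload, and a trailing CRC32 of the whole message. Treat that layout as an assumption to verify against the AWS documentation. The function name `iter_eventstream_payloads` is hypothetical, and header parsing is deliberately skipped.

```python
import struct
import zlib


def iter_eventstream_payloads(data):
    """Yield the payload bytes of each complete frame in `data`."""
    offset = 0
    while offset < len(data):
        # Prelude: total length, headers length, prelude CRC (big-endian uint32 each).
        total_len, headers_len, prelude_crc = struct.unpack_from(">III", data, offset)
        if zlib.crc32(data[offset:offset + 8]) != prelude_crc:
            raise ValueError("prelude CRC mismatch")
        if total_len < 16 or 12 + headers_len + 4 > total_len:
            raise ValueError("invalid frame lengths")
        frame = data[offset:offset + total_len]
        if len(frame) < total_len:
            raise ValueError("truncated frame")
        # The last four bytes hold a CRC32 of everything before them.
        (message_crc,) = struct.unpack_from(">I", frame, total_len - 4)
        if zlib.crc32(frame[:-4]) != message_crc:
            raise ValueError("message CRC mismatch")
        yield frame[12 + headers_len:total_len - 4]
        offset += total_len
```

Each yielded payload can then be decoded by the caller. A production client should prefer an AWS SDK parser and should buffer partial frames that arrive across network reads.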

Proxy to OpenAI Embedding Models

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to embedding models. This example will use the OpenAI embedding model endpoint.

Obtain the OpenAI API key and save it to an environment variable:

export OPENAI_API_KEY=<your-api-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-route",
"uri": "/embeddings",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options":{
"model": "text-embedding-3-small",
"encoding_format": "float"
},
"override": {
"endpoint": "https://api.openai.com/v1/embeddings"
}
}
}
}'

Send a POST request to the Route with an input string:

curl "http://127.0.0.1:9080/embeddings" -X POST \
-H "Content-Type: application/json" \
-d '{
"input": "hello world"
}'

You should receive a response similar to the following:

{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [
-0.0067144386,
-0.039197803,
0.034177095,
0.028763203,
-0.024785956,
-0.04201061,
...
]
}
],
"model": "text-embedding-3-small",
"usage": {
"prompt_tokens": 2,
"total_tokens": 2
}
}

Proxy to Anthropic

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to Anthropic's Claude API for chat completion.

Obtain an Anthropic API key and save it to an environment variable:

export ANTHROPIC_API_KEY=<your-api-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-anthropic-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "anthropic",
"auth": {
"header": {
"x-api-key": "'"$ANTHROPIC_API_KEY"'"
}
},
"options": {
"model": "claude-sonnet-4-20250514"
}
}
}
}'

The configuration above specifies anthropic as the provider and attaches the Anthropic API key in the x-api-key header.

Send a POST request to the Route with a system prompt and a sample user question in the request body:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "1+1 equals 2."
}
],
"model": "claude-sonnet-4-20250514",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 19,
"output_tokens": 11
}
}

Convert Anthropic Requests to OpenAI-Compatible Backend

The following example demonstrates how the ai-proxy Plugin can accept requests in the Anthropic Messages API format and automatically convert them to the OpenAI-compatible format before forwarding to any OpenAI-compatible backend (such as OpenAI, DeepSeek, or other compatible services). This is useful when client applications send Anthropic-formatted requests but you want to use a different LLM backend.

The protocol conversion is triggered automatically when the Route URI is set to /v1/messages (the Anthropic Messages API endpoint). The Plugin will convert Anthropic-formatted requests to OpenAI-compatible format and transform the responses back to Anthropic format.

Obtain an API key for your chosen OpenAI-compatible backend service and save it to an environment variable. This example uses OpenAI:

export BACKEND_API_KEY=<your-api-key>

Create a Route with the URI set to /v1/messages to trigger automatic Anthropic protocol conversion, and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-anthropic-convert-route",
"uri": "/v1/messages",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$BACKEND_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
}
}
}'

The backend provider can be any OpenAI-compatible provider, such as openai, deepseek, or others.

Send a POST request to the Route in Anthropic Messages API format:

curl "http://127.0.0.1:9080/v1/messages" -X POST \
-H "Content-Type: application/json" \
-H "x-api-key: ${BACKEND_API_KEY}" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "gpt-4",
"max_tokens": 1024,
"messages": [
{ "role": "user", "content": "What is 1+1?" }
]
}'

Although the request is sent in Anthropic format, it will be automatically converted to OpenAI format and forwarded to the backend. The response is converted back to Anthropic format:

{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "1+1 equals 2."
}
],
"model": "gpt-4",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 12,
"output_tokens": 8
}
}

The Plugin supports all features of the Anthropic Messages API, including streaming (SSE), system prompts, and tool use (function calling). The protocol conversion handles the bidirectional mapping between Anthropic and OpenAI formats transparently.
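
To give a feel for the request-side mapping, the following Python sketch converts a simple Anthropic Messages request into an OpenAI chat request. It is a deliberately reduced illustration and not the Plugin's conversion logic. It handles only a string system prompt and text content, and the function name `anthropic_to_openai` is hypothetical.

```python
def anthropic_to_openai(request):
    """Convert a text-only Anthropic Messages request to OpenAI chat format."""
    messages = []
    system = request.get("system")
    if isinstance(system, str) and system:
        # Anthropic keeps the system prompt at the top level; OpenAI uses a message.
        messages.append({"role": "system", "content": system})
    for message in request.get("messages", []):
        content = message["content"]
        if isinstance(content, list):
            # Anthropic content may be a list of blocks; keep only the text blocks.
            content = "".join(
                block.get("text", "") for block in content if block.get("type") == "text"
            )
        messages.append({"role": message["role"], "content": content})
    converted = {"model": request["model"], "messages": messages}
    if "max_tokens" in request:
        converted["max_tokens"] = request["max_tokens"]
    if request.get("stream"):
        converted["stream"] = True
    return converted


print(anthropic_to_openai({
    "model": "gpt-4",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "What is 1+1?"}],
}))
```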

Proxy to Selected Model using Request Body Parameter

The following example demonstrates how you can proxy requests on the same URI to different models, based on the model specified in the request body. It uses the post_arg.* variable to read the value of a request body parameter.

The example will use OpenAI and DeepSeek as the example LLM services. Obtain the OpenAI and DeepSeek API keys and save them to environment variables:

export OPENAI_API_KEY=<your-api-key>
export DEEPSEEK_API_KEY=<your-api-key>

Create a Route to the OpenAI API with the ai-proxy Plugin. The Route URI is /anything and it matches requests where the body parameter model is set to openai:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-openai-route",
"uri": "/anything",
"methods": ["POST"],
"vars": [[ "post_arg.model", "==", "openai" ]],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
}
}
}'

Create another Route /anything to the DeepSeek API with the ai-proxy Plugin. This Route matches requests where the body parameter model is set to deepseek:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-deepseek-route",
"uri": "/anything",
"methods": ["POST"],
"vars": [[ "post_arg.model", "==", "deepseek" ]],
"plugins": {
"ai-proxy": {
"provider": "deepseek",
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
}
}
}'

Send a POST request to the Route with model set to openai:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"model": "openai",
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

Send a POST request to the Route with model set to deepseek:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek",
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The sum of 1 and 1 is 2. This is a basic arithmetic operation where you combine two units to get a total of two units."
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

You can also use post_arg.* to fetch a nested request body parameter. For instance, if the request is sent as:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"model": {
"name": "openai"
},
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You can configure the vars on the Route to be [[ "post_arg.model.name", "==", "openai" ]].
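
Conceptually, a dotted post_arg name walks the JSON body one key at a time. The Python sketch below illustrates that lookup only. It is not how APISIX evaluates the variable internally, and the function name `post_arg` is chosen here for readability.

```python
import json


def post_arg(body_text, path):
    """Resolve a dotted path such as "model.name" in a JSON request body."""
    value = json.loads(body_text)
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # path does not exist, so a Route condition would not match
        value = value[key]
    return value


nested_body = '{"model": {"name": "openai"}, "messages": []}'
print(post_arg(nested_body, "model.name"))
```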

Send Request Log to Logger

The following example demonstrates how you can log request and response information, including LLM model, token, and payload, and push them to a logger. Before proceeding, you should first set up a logger, such as Kafka. See kafka-logger for more information.

Create a Route to your LLM service and configure logging details. Enable summaries to log request LLM model, duration, request and response tokens. Enable payloads to log request and response payload. Update the kafka-logger configuration with your Kafka address, topic, and key:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-openai-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
},
"logging": {
"summaries": true,
"payloads": true
}
},
"kafka-logger": {
"brokers": [
{
"host": "127.0.0.1",
"port": 9092
}
],
"kafka_topic": "test2",
"key": "key1",
"batch_max_size": 1
}
}
}'

Send a POST request to the Route:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

In the Kafka topic, you should also see a log entry corresponding to the request with the LLM summary and request/response payload.

Include LLM Information in Access Log

The following example demonstrates how you can log LLM request related information in the gateway's access log to improve analytics and audit. The following variables are available:

  • request_llm_model: LLM model name specified in the request.
  • request_type: Type of request, where the value could be traditional_http, ai_chat, or ai_stream.
  • llm_time_to_first_token: Duration from request sending to the first token received from the LLM service, in milliseconds.
  • llm_model: LLM model.
  • llm_prompt_tokens: Number of tokens in the prompt.
  • llm_completion_tokens: Number of tokens in the chat completion returned by the LLM service.

In addition, the following standard nginx upstream variables are automatically populated when ai-proxy sends requests via cosocket transport:

  • upstream_addr: Address of the upstream LLM service (e.g., api.openai.com:443).
  • upstream_status: HTTP status code returned by the upstream LLM service.
  • upstream_response_time: Total time spent receiving the response from the upstream LLM service, in seconds (e.g., 2.858).
  • upstream_connect_time: Time spent establishing the connection to the upstream LLM service, in seconds.
  • upstream_header_time: Time spent receiving the response headers from the upstream LLM service, in seconds.
  • upstream_response_length: Total number of bytes received from the upstream LLM service response body (e.g., 1024).
  • upstream_host: Hostname of the upstream LLM service as configured in the endpoint (e.g., api.openai.com).
  • upstream_scheme: Scheme used to connect to the upstream LLM service (e.g., https).
  • upstream_uri: Request URI path sent to the upstream LLM service (e.g., /v1/chat/completions).

Update the access log format in your configuration file to include additional LLM related variables:

conf/config.yaml
nginx_config:
  http:
    access_log_format: "$remote_addr - $remote_user [$time_local] $http_host \"$request_line\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\" \"$apisix_request_id\" \"$request_type\" \"$llm_time_to_first_token\" \"$llm_model\" \"$request_llm_model\" \"$llm_prompt_tokens\" \"$llm_completion_tokens\""

Reload APISIX for configuration changes to take effect.

Now if you create a Route and send a request following the Proxy to OpenAI example, you should receive a response similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null,
"annotations": []
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 23,
"completion_tokens": 8,
"total_tokens": 31,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
...
},
"service_tier": "default",
"system_fingerprint": null
}

In the gateway's access log, you should see a log entry similar to the following:

192.168.215.1 - - [21/Mar/2025:04:28:03 +0000] api.openai.com "POST /anything HTTP/1.1" 200 804 2.858 "-" "curl/8.6.0" api.openai.com:443 200 2.858 "https://api.openai.com/v1/chat/completions" "5c5e0b95f8d303cb81e4dc456a4b12d9" "ai_chat" "2858" "gpt-4" "gpt-4" "23" "8"

The access log entry shows the upstream address is api.openai.com:443 with status 200, the request type is ai_chat, APISIX upstream response time is 2.858 seconds, time to first token is 2858 milliseconds, requested LLM model is gpt-4, LLM model is gpt-4, prompt token usage is 23, and completion token usage is 8.
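
For analytics, the LLM fields can be pulled back out of such a line. The Python sketch below assumes the access log format configured above, where the six LLM variables are the last six quoted fields. The names `LLM_FIELDS` and `parse_llm_fields` are hypothetical.

```python
import shlex

# Order matches the tail of the access_log_format shown above.
LLM_FIELDS = (
    "request_type",
    "llm_time_to_first_token",
    "llm_model",
    "request_llm_model",
    "llm_prompt_tokens",
    "llm_completion_tokens",
)


def parse_llm_fields(line):
    """Return the trailing LLM variables of one access log line as a dict."""
    # shlex keeps each double-quoted field together as a single token.
    tokens = shlex.split(line)
    return dict(zip(LLM_FIELDS, tokens[-len(LLM_FIELDS):]))


sample = (
    '192.168.215.1 - - [21/Mar/2025:04:28:03 +0000] api.openai.com '
    '"POST /anything HTTP/1.1" 200 804 2.858 "-" "curl/8.6.0" api.openai.com:443 200 2.858 '
    '"https://api.openai.com/v1/chat/completions" "5c5e0b95f8d303cb81e4dc456a4b12d9" '
    '"ai_chat" "2858" "gpt-4" "gpt-4" "23" "8"'
)
print(parse_llm_fields(sample))
```

Note that shlex also treats single quotes as quoting characters, so lines whose user agent or referer contains an apostrophe would need a stricter parser.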