What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway

April 14, 2026 · 7 min read

Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways — including APISIX — have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done.

In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes.

Apache APISIX 3.16 addresses these challenges head-on with two powerful enhancements to the rate limiting plugins: multiple rules and variable support. Together, they transform rate limiting from static configuration into a dynamic, context-aware policy engine.

What Changed in APISIX 3.16

APISIX 3.16 introduces two complementary features across the limit-count, limit-conn, and ai-rate-limiting plugins:

| Feature | Description | Supported Plugins |
| --- | --- | --- |
| Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` |
| Variable support | Use APISIX variables (`$remote_addr`, `$http_*`, `$consumer_name`, etc.) in `key` and plugin-specific rate or threshold fields | `limit-count`, `limit-conn`, `ai-rate-limiting` |

Both features are fully backward compatible. Existing configurations continue to work without modification.

Multiple Rules: Beyond Single-Threshold Rate Limiting

The Problem

Consider a common requirement: limit an API to 10 requests per second and 500 requests per minute. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain.

The Solution

The new rules array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key.

{
  "uri": "/api/v1/*",
  "plugins": {
    "limit-count": {
      "rules": [
        {
          "count": 10,
          "time_window": 1,
          "key": "${remote_addr}_per_second",
          "header_prefix": "per-second"
        },
        {
          "count": 500,
          "time_window": 60,
          "key": "${remote_addr}_per_minute",
          "header_prefix": "per-minute"
        },
        {
          "count": 10000,
          "time_window": 86400,
          "key": "${remote_addr}_per_day",
          "header_prefix": "per-day"
        }
      ],
      "rejected_code": 429
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}

With this configuration, APISIX enforces all three limits simultaneously. A client hitting the per-second limit receives a 429 response with headers indicating which limit was exceeded:

X-Per-Second-RateLimit-Limit: 10
X-Per-Second-RateLimit-Remaining: 0
X-Per-Second-RateLimit-Reset: 1
X-Per-Minute-RateLimit-Limit: 500
X-Per-Minute-RateLimit-Remaining: 499
X-Per-Minute-RateLimit-Reset: 60

The header_prefix field lets clients distinguish which rule triggered the rejection — critical for debugging and client-side retry logic.
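
On the client side, the prefixed headers can drive retry logic. The following is a minimal Python sketch, assuming the header layout shown in the sample response above (`X-<Prefix>-RateLimit-<Limit|Remaining|Reset>`) and integer header values; the function names are illustrative and not part of any APISIX client library.

```python
import re

# Header layout assumed from the sample response above:
# X-<Prefix>-RateLimit-<Limit|Remaining|Reset>
_HEADER_RE = re.compile(r"^X-(.+)-RateLimit-(Limit|Remaining|Reset)$", re.IGNORECASE)


def group_rate_limit_headers(headers):
    """Group rate limit headers by rule prefix, e.g. {'per-second': {'limit': 10}}."""
    rules = {}
    for name, value in headers.items():
        match = _HEADER_RE.match(name)
        if match:
            prefix, field = match.group(1).lower(), match.group(2).lower()
            rules.setdefault(prefix, {})[field] = int(value)
    return rules


def retry_after_seconds(headers):
    """Return the longest reset time among exhausted rules, or None if no rule is exhausted."""
    resets = [
        fields.get("reset", 0)
        for fields in group_rate_limit_headers(headers).values()
        if fields.get("remaining") == 0
    ]
    return max(resets) if resets else None


sample = {
    "X-Per-Second-RateLimit-Limit": "10",
    "X-Per-Second-RateLimit-Remaining": "0",
    "X-Per-Second-RateLimit-Reset": "1",
    "X-Per-Minute-RateLimit-Limit": "500",
    "X-Per-Minute-RateLimit-Remaining": "499",
    "X-Per-Minute-RateLimit-Reset": "60",
}

print(group_rate_limit_headers(sample))
print(retry_after_seconds(sample))  # 1: only the per-second rule is exhausted
```

Waiting for the longest reset among the exhausted rules is one possible policy; a client could equally back off per rule.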

Variable Support: Context-Aware Rate Limiting

The Problem

Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier — leading to route explosion and configuration drift.

The Solution

Variable support lets you pull rate limiting parameters directly from the request context. The count, time_window, and key fields now accept APISIX variables.

Example 1: Per-Tier Rate Limiting via HTTP Header

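Suppose your authentication middleware injects an X-Rate-Quota header based on the user's subscription tier. Because ${http_x_rate_quota} reads the request header as APISIX receives it, the header should be set by a trusted component in front of the plugin rather than by the client. Pair limit-count with an auth plugin such as key-auth so that ${consumer_name} is available as the rate limit key: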

{
  "uri": "/api/v1/*",
  "plugins": {
    "key-auth": {},
    "limit-count": {
      "rules": [
        {
          "count": "${http_x_rate_quota ?? 100}",
          "time_window": 60,
          "key": "${consumer_name}"
        }
      ],
      "rejected_code": 429
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}

Now the same route handles all tiers:

| Tier | `X-Rate-Quota` Header | Effective Limit |
| --- | --- | --- |
| Free | 100 | 100 req/min |
| Pro | 1000 | 1,000 req/min |
| Enterprise | 50000 | 50,000 req/min |

One route. One plugin configuration. All tiers.
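
To make the fallback behavior concrete, here is a small Python sketch that mimics the `${name}` and `${name ?? default}` placeholder forms used in the configuration above. It is an illustration only, not APISIX's implementation: the `resolve` function is hypothetical, and treating an empty value as missing is an assumption.

```python
import re

# Matches ${name} and ${name ?? default}, the placeholder forms used in this article.
_PLACEHOLDER = re.compile(r"\$\{\s*(\w+)\s*(?:\?\?\s*(.*?)\s*)?\}")


def resolve(template, variables):
    """Substitute placeholders from `variables`, using the fallback when a value is missing."""

    def substitute(match):
        name, default = match.group(1), match.group(2)
        value = variables.get(name)
        if value is None or value == "":
            # Assumption: a missing variable without a fallback becomes an empty string.
            return default if default is not None else ""
        return str(value)

    return _PLACEHOLDER.sub(substitute, template)


# No X-Rate-Quota header: the fallback applies.
print(resolve("${http_x_rate_quota ?? 100}", {}))  # 100
# Pro tier: the header value wins over the fallback.
print(resolve("${http_x_rate_quota ?? 100}", {"http_x_rate_quota": "1000"}))  # 1000
# Keys can mix variables with literal text.
print(resolve("${consumer_name}_per_second", {"consumer_name": "alice"}))  # alice_per_second
```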

Example 2: Multi-Tenant Isolation with Variable Combination

For a multi-tenant SaaS API, you can combine variables to create isolated rate limit buckets per tenant per endpoint:

{
  "uri": "/api/v1/*",
  "plugins": {
    "limit-count": {
      "rules": [
        {
          "count": 1000,
          "time_window": 60,
          "key": "${http_x_tenant_id} ${uri}"
        }
      ],
      "rejected_code": 429
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}

Tenant A calling /api/v1/users and Tenant B calling the same endpoint get independent counters. Tenant A calling /api/v1/orders gets yet another counter. This creates a natural per-tenant-per-endpoint isolation without any route duplication.
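
The isolation follows directly from how the key is built: each distinct key string gets its own counter. The Python sketch below illustrates this with a simplified fixed-window counter. The `FixedWindowLimiter` class and `bucket_key` helper are hypothetical, the tenant names are made up, and the limit is lowered to 2 so the effect is visible in a few calls.

```python
import time


class FixedWindowLimiter:
    """Count requests per key in fixed windows; a simplified stand-in for a real limiter."""

    def __init__(self, count, time_window, clock=time.monotonic):
        self.count = count
        self.time_window = time_window
        self.clock = clock
        self._buckets = {}  # key -> (window_start, requests_used)

    def allow(self, key):
        now = self.clock()
        start, used = self._buckets.get(key, (now, 0))
        if now - start >= self.time_window:
            start, used = now, 0  # window expired, start a fresh one
        if used >= self.count:
            self._buckets[key] = (start, used)
            return False
        self._buckets[key] = (start, used + 1)
        return True


def bucket_key(tenant_id, uri):
    # Mirrors the "${http_x_tenant_id} ${uri}" key from the configuration above.
    return f"{tenant_id} {uri}"


limiter = FixedWindowLimiter(count=2, time_window=60)
for _ in range(3):
    # The third call for the same tenant and endpoint is refused.
    print(limiter.allow(bucket_key("tenant-a", "/api/v1/users")))

# Other tenants and other endpoints are unaffected.
print(limiter.allow(bucket_key("tenant-b", "/api/v1/users")))
print(limiter.allow(bucket_key("tenant-a", "/api/v1/orders")))
```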

Example 3: Dynamic Concurrent Connection Limits

The limit-conn plugin also supports rules and variables, enabling dynamic concurrency control. The example below uses key-auth so each consumer gets its own connection quota. A second rule keyed on ${http_host ?? global} applies a shared cap: all consumers calling the same Host share one counter, and requests without a Host value fall back to the literal key global:

{
  "uri": "/api/v1/inference",
  "plugins": {
    "key-auth": {},
    "limit-conn": {
      "default_conn_delay": 0.1,
      "rules": [
        {
          "conn": 5,
          "burst": 2,
          "key": "${consumer_name}"
        },
        {
          "conn": 100,
          "burst": 20,
          "key": "${http_host ?? global}"
        }
      ],
      "rejected_code": 503
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}

This limits each consumer to 5 concurrent connections (with a burst allowance of 2) while capping the shared total at 100 (with a burst allowance of 20), so no single consumer can monopolize backend capacity.
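
The interaction of `conn` and `burst` across two rules can be sketched in Python. This sketch assumes the commonly documented limit-conn behavior, where requests above `conn` but within `conn + burst` are delayed and anything beyond that is rejected; check the plugin documentation before relying on it. The `ConnRule` class, the helper functions, and the host `api.example.com` are hypothetical.

```python
class ConnRule:
    """One concurrency rule: up to `conn` pass, up to `conn + burst` are delayed."""

    def __init__(self, conn, burst):
        self.conn = conn
        self.burst = burst
        self.active = {}  # key -> number of in-flight requests

    def classify(self, key):
        in_flight = self.active.get(key, 0) + 1
        if in_flight > self.conn + self.burst:
            return "reject"
        return "delay" if in_flight > self.conn else "pass"


def acquire(rules_with_keys):
    """Admit a request only if no rule rejects it; returns 'pass', 'delay', or 'reject'."""
    verdicts = [rule.classify(key) for rule, key in rules_with_keys]
    if "reject" in verdicts:
        return "reject"  # a rejected request is not counted against any rule
    for rule, key in rules_with_keys:
        rule.active[key] = rule.active.get(key, 0) + 1
    return "delay" if "delay" in verdicts else "pass"


def release(rules_with_keys):
    """Call when an admitted request finishes so its slot becomes available again."""
    for rule, key in rules_with_keys:
        rule.active[key] -= 1


per_consumer = ConnRule(conn=5, burst=2)
shared = ConnRule(conn=100, burst=20)
request = [(per_consumer, "alice"), (shared, "api.example.com")]

# Eight overlapping requests from the same consumer:
# five pass, two are delayed, and the eighth is rejected.
results = [acquire(request) for _ in range(8)]
print(results)
```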

AI Rate Limiting: Token Budget Management

For AI gateway use cases, the ai-rate-limiting plugin works alongside ai-proxy to enforce token budgets at the gateway level. It combines multiple rules with variable support for fine-grained control:

{
  "uri": "/v1/chat/completions",
  "plugins": {
    "ai-rate-limiting": {
      "limit_strategy": "total_tokens",
      "rules": [
        {
          "count": 10000,
          "time_window": 60,
          "key": "${consumer_name}_per_minute",
          "header_prefix": "consumer"
        },
        {
          "count": 500000,
          "time_window": 86400,
          "key": "${consumer_name}_per_day",
          "header_prefix": "daily"
        },
        {
          "count": 1000000,
          "time_window": 60,
          "key": "${http_host ?? global}",
          "header_prefix": "global"
        }
      ],
      "rejected_code": 429
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}

This configuration enforces three simultaneous constraints:

Per-consumer burst: 10,000 tokens per minute per consumer
Per-consumer daily: 500,000 tokens per day per consumer
Global capacity: 1,000,000 tokens per minute across all consumers

As AI API costs scale directly with token usage, this kind of layered budget control is essential for production AI gateways.
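
Layered budgets of this kind can be modeled as several counters that must all have room before a request is admitted. The Python sketch below assumes that token usage is only known after the model responds, so budgets are checked before the request and charged afterward; that ordering, the `TokenBudget` class, and the helper functions are illustrative assumptions rather than the plugin's actual implementation.

```python
import time


class TokenBudget:
    """A fixed-window token budget for one rule."""

    def __init__(self, count, time_window, clock=time.monotonic):
        self.count = count
        self.time_window = time_window
        self.clock = clock
        self._usage = {}  # key -> (window_start, tokens_used)

    def _current(self, key):
        now = self.clock()
        start, used = self._usage.get(key, (now, 0))
        if now - start >= self.time_window:
            start, used = now, 0  # window expired, budget resets
        return start, used

    def has_budget(self, key):
        return self._current(key)[1] < self.count

    def charge(self, key, tokens):
        start, used = self._current(key)
        self._usage[key] = (start, used + tokens)


def admit(budgets_with_keys):
    """A request is admitted only if every rule still has budget left."""
    return all(budget.has_budget(key) for budget, key in budgets_with_keys)


def record_usage(budgets_with_keys, total_tokens):
    """Charge the tokens reported by the model against every rule."""
    for budget, key in budgets_with_keys:
        budget.charge(key, total_tokens)


per_minute = TokenBudget(10_000, 60)
per_day = TokenBudget(500_000, 86_400)
shared = TokenBudget(1_000_000, 60)


def rules_for(consumer, host="global"):
    # Keys mirror the three rules in the configuration above.
    return [
        (per_minute, f"{consumer}_per_minute"),
        (per_day, f"{consumer}_per_day"),
        (shared, host),
    ]


outcomes = []
for _ in range(3):
    allowed = admit(rules_for("alice"))
    outcomes.append(allowed)
    if allowed:
        record_usage(rules_for("alice"), total_tokens=6_000)

print(outcomes)                 # [True, True, False]
print(admit(rules_for("bob")))  # True: bob has his own per-consumer budgets
```

Because usage is charged after the fact in this sketch, the second request pushes the consumer past the per-minute budget (12,000 of 10,000 tokens) before the third is refused.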

Combining Multiple Rules with Variables

The real power emerges when you combine both features. Here is a complete example for an API platform with tiered pricing. It uses key-auth to identify consumers, reads per-consumer quotas from request headers, and maintains a shared global safety cap via ${http_host ?? global}:

{
  "uri": "/api/v1/*",
  "plugins": {
    "key-auth": {},
    "limit-count": {
      "rules": [
        {
          "count": "${http_x_burst_quota ?? 10}",
          "time_window": 1,
          "key": "${consumer_name}_per_second",
          "header_prefix": "burst"
        },
        {
          "count": "${http_x_sustained_quota ?? 500}",
          "time_window": 60,
          "key": "${consumer_name}_per_minute",
          "header_prefix": "sustained"
        },
        {
          "count": 100000,
          "time_window": 60,
          "key": "${http_host ?? global}",
          "header_prefix": "global"
        }
      ],
      "rejected_code": 429
    }
  },
  "upstream": {
    "type": "roundrobin",
    "nodes": {
      "127.0.0.1:1980": 1
    }
  }
}

The authentication layer sets per-consumer burst and sustained quotas via headers. APISIX enforces both per-consumer limits dynamically while maintaining a static global safety cap. No route duplication. No configuration drift between tiers.

What's Next

The limit-req plugin (leaky bucket algorithm) does not yet support the rules array (#13179). We welcome community contributions to bring it to feature parity.

We are also exploring deeper integration with external policy engines, enabling rate limiting quotas to be fetched from external key-value stores or policy services at runtime.

Getting Started

Upgrade to APISIX 3.16:

# Docker
docker pull apache/apisix:3.16.0

# Helm
helm repo update
helm upgrade apisix apisix/apisix --set image.tag=3.16.0

For details, see the documentation for the limit-count, limit-conn, and ai-rate-limiting plugins.
