API monetization is the practice of generating revenue from APIs by charging consumers for access, usage, or the value derived from API-powered integrations. Successful API monetization requires aligning a pricing model with how consumers perceive and extract value, backed by technical infrastructure for metering, rate limiting, and billing.

Why API Monetization Matters#

APIs have shifted from internal integration glue to standalone revenue channels. The global API management market continues to grow rapidly, driven by the platformization of business capabilities.

Companies like Stripe and Twilio have built multi-billion-dollar businesses where the API itself is the product. This pattern is maturing, with a growing number of enterprises monetizing external APIs.

The technical challenge is substantial: you need usage metering accurate to the individual request, rate limiting that enforces plan boundaries in real time, and billing integration that translates API consumption into invoices without manual reconciliation.

API Monetization Models#

Comparison Table#

| Model | Revenue Trigger | Best For | Pros | Cons |
|---|---|---|---|---|
| Free / Freemium | Conversion to paid | Developer adoption, market entry | Low barrier, viral growth | Revenue depends on conversion |
| Pay-Per-Call | Each API request | High-volume transactional APIs | Scales with usage, fair pricing | Unpredictable revenue |
| Subscription Tiers | Monthly/annual plan | Predictable workloads | Predictable revenue, simpler billing | Overprovision or underprovision risk |
| Revenue Sharing | Transaction value | Marketplace, payment APIs | Aligned incentives | Complex accounting |
| Transaction-Based | Per business event | Payment processing, shipping APIs | Value-aligned pricing | Requires event definition |

Free and Freemium#

The freemium model offers a free tier with usage caps (typically 1,000-10,000 requests per month) and charges for usage beyond that threshold. Freemium APIs generally see single-digit conversion rates from free to paid plans.

This model works best when the API has broad appeal, a natural expansion path (users start small and grow), and low marginal cost per request. Stripe's original developer onboarding followed this pattern: free to integrate, pay only when processing real transactions.

The risk is subsidizing non-converting users. Effective freemium models set free-tier limits low enough to demonstrate value but high enough to allow meaningful evaluation. Rate limiting at the gateway layer enforces these boundaries without application code changes.

Pay-Per-Call#

Pay-per-call charges a fixed price per API request, typically ranging from $0.001 to $0.05 per call depending on the API's value and computational cost. Infrastructure services price far lower: AWS API Gateway charges $3.50 per million REST API requests in its first pricing tier, or about $0.0000035 per call. Google Maps Platform charges $5 per 1,000 geocoding requests, or $0.005 per call.
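To make the per-unit arithmetic concrete, here is a minimal Python sketch that prorates a list price over a call count. It assumes simple linear pricing with no free allowance and no volume discounts (both real price lists above have tiers), and the function and constant names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def pay_per_call_bill(calls: int, price: Decimal, unit_size: int) -> Decimal:
    """Charge `price` per `unit_size` calls, prorated linearly and rounded to cents."""
    raw = Decimal(calls) * price / Decimal(unit_size)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# List prices quoted above, as (price in USD, number of calls the price covers).
AWS_REST_PER_MILLION = (Decimal("3.50"), 1_000_000)
MAPS_GEOCODING_PER_THOUSAND = (Decimal("5.00"), 1_000)

if __name__ == "__main__":
    print(pay_per_call_bill(10_000_000, *AWS_REST_PER_MILLION))     # 35.00
    print(pay_per_call_bill(20_000, *MAPS_GEOCODING_PER_THOUSAND))  # 100.00
```

Using `Decimal` rather than `float` keeps sub-cent per-call prices exact until the final rounding to cents.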

This model aligns cost directly with consumption and is perceived as fair by developers. However, it creates revenue unpredictability for the provider and cost anxiety for the consumer. Unpredictable costs are frequently cited as a top concern among API consumers on pay-per-call pricing.

Implementation requires precise request-level metering. Every API call must be counted, attributed to a consumer, and recorded for billing. API gateways with built-in request counting and consumer identification (via API keys or OAuth tokens) provide this metering layer.

Subscription Tiers#

Tiered subscription pricing offers predefined plans (e.g., Starter at $49/month for 50,000 calls, Professional at $199/month for 500,000 calls, Enterprise at custom pricing) with increasing rate limits, feature access, and support levels.

This is the most common API monetization model. Subscription tiers provide predictable revenue for the provider and predictable costs for the consumer.

The challenge is designing tiers that match actual usage patterns. If 80% of customers cluster in the cheapest tier and 5% need custom enterprise plans, the middle tiers generate minimal revenue. Usage analytics from the API gateway layer inform tier design by revealing actual consumption distributions.
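A minimal sketch of that analysis, assuming the example Starter and Professional tiers above and treating any usage beyond them as an Enterprise candidate. `Plan`, `cheapest_plan`, and `plan_distribution` are hypothetical names; a real analysis would read monthly totals from the gateway's usage data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_price_usd: int
    included_calls: int


# Example self-serve tiers from the text; Enterprise is custom-priced, so not listed.
PLANS = (
    Plan("Starter", 49, 50_000),
    Plan("Professional", 199, 500_000),
)


def cheapest_plan(monthly_calls: int) -> Optional[Plan]:
    """Cheapest listed plan whose quota covers the usage, or None if none does."""
    fitting = [p for p in PLANS if p.included_calls >= monthly_calls]
    return min(fitting, key=lambda p: p.monthly_price_usd, default=None)


def plan_distribution(monthly_usage: Iterable[int]) -> Counter:
    """Count how many consumers land in each tier, given their monthly call totals."""
    counts: Counter = Counter()
    for calls in monthly_usage:
        plan = cheapest_plan(calls)
        counts[plan.name if plan else "Enterprise"] += 1
    return counts


if __name__ == "__main__":
    print(plan_distribution([12_000, 48_000, 60_000, 2_000_000]))
```

If most consumers map to one tier, the tier boundaries are a candidate for redesign.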

Revenue Sharing#

Revenue sharing takes a percentage of the transaction value facilitated by the API. Stripe charges 2.9% + $0.30 per transaction. Shopify takes a revenue share from apps in its marketplace that use its APIs.
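A percentage-plus-fixed fee like the one quoted above is a one-line formula. The sketch below computes it with `Decimal` to avoid binary floating-point error. Rounding half-up to the cent is an assumption here, not a statement of how any particular processor rounds.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def processing_fee(amount: Decimal,
                   rate: Decimal = Decimal("0.029"),
                   fixed: Decimal = Decimal("0.30")) -> Decimal:
    """Percentage-plus-fixed fee on one transaction, rounded half-up to the cent."""
    return (amount * rate + fixed).quantize(CENT, rounding=ROUND_HALF_UP)


def net_to_merchant(amount: Decimal) -> Decimal:
    """What the API consumer keeps after the provider's share."""
    return amount - processing_fee(amount)


if __name__ == "__main__":
    for amount in (Decimal("10.00"), Decimal("100.00")):
        print(amount, processing_fee(amount), net_to_merchant(amount))
```

The fixed component dominates on small transactions (0.30 of a 0.59 fee on $10), which is one reason revenue-share pricing favors larger transaction values.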

This model aligns provider and consumer incentives because the provider earns more when the consumer's business grows. It works best for APIs that facilitate commerce, payments, or marketplace transactions where the transaction value is clearly attributable.

Revenue-sharing APIs tend to generate higher lifetime customer value compared to flat-rate subscription APIs, though they require more complex accounting and settlement infrastructure.

Transaction-Based#

Transaction-based pricing charges per business event rather than per raw API call. A shipping API might charge per label generated, a payment API per successful charge, or a KYC API per identity verification completed.

This approach captures value more accurately than request counting because a single business transaction may involve multiple API calls (initiate, validate, confirm, webhook). Twilio's pricing model exemplifies this: $0.0075 per SMS sent, regardless of how many API calls the integration makes to send that message.
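The sketch below shows the difference between raw request counts and billable events. It assumes a hypothetical workflow in which each business transaction carries a `transaction_id` and only a successful `confirm` step is billable; counting distinct transaction IDs means a retried confirmation is billed once.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ApiCall:
    transaction_id: str
    step: str       # e.g. "initiate", "validate", "confirm"
    status: int     # HTTP status returned to the caller


BILLABLE_STEP = "confirm"  # hypothetical definition of the billable business event


def billable_events(calls: Iterable[ApiCall]) -> int:
    """Distinct transactions with at least one successful billable step."""
    return len({
        c.transaction_id
        for c in calls
        if c.step == BILLABLE_STEP and 200 <= c.status < 300
    })


if __name__ == "__main__":
    calls = [
        ApiCall("t1", "initiate", 200), ApiCall("t1", "validate", 200),
        ApiCall("t1", "confirm", 200),
        ApiCall("t2", "initiate", 200), ApiCall("t2", "confirm", 503),
        ApiCall("t2", "confirm", 200),   # retry after a server error
        ApiCall("t3", "initiate", 200),  # abandoned, never confirmed
    ]
    print(len(calls), billable_events(calls))  # 7 raw calls, 2 billable events
```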

Implementation requires defining what constitutes a billable event and instrumenting the API to track those events separately from raw request counts. Transaction-based pricing tends to achieve higher gross margins than pay-per-call pricing for APIs with multi-step workflows, because the pricing unit better reflects the value delivered.

Building a Monetization Strategy#

Step 1: Identify the Value Unit#

Determine what unit of value consumers derive from your API. Is it a data record retrieved, a transaction processed, a message sent, or a computation performed? The pricing unit should map to this value unit, not to raw infrastructure metrics.

Step 2: Analyze Consumer Segments#

Different consumers extract different value. A startup making 5,000 API calls per month has different willingness to pay than an enterprise making 50 million. Segment by usage volume, use case, and organizational size. APIs with segment-specific pricing often generate more revenue than those with one-size-fits-all pricing.

Step 3: Set Pricing with Data#

Start with competitive analysis (what do comparable APIs charge?), then layer in your cost structure (infrastructure cost per request plus margin) and value-based pricing (what is the consumer's willingness to pay based on the value they derive?). The API gateway's usage analytics provide the data foundation for these calculations.

Step 4: Instrument Metering and Billing#

Technical metering must be accurate, real-time, and attributable to individual consumers. Billing integration must translate metered usage into invoices. These systems must handle edge cases: failed requests (do they count?), cached responses (do they count?), and burst traffic (how is it rated?).

Technical Requirements#

Usage Metering#

Every API request must be captured with consumer identity, endpoint, timestamp, response status, and response size. This data feeds both real-time enforcement (rate limiting) and batch processing (billing). Metering must operate at the gateway layer to capture all traffic regardless of backend implementation.

Most organizations implementing API monetization run metering at the API gateway rather than at the application level. Gateway-level metering is preferred because it provides a single, consistent measurement point.

Rate Limiting#

Rate limiting enforces plan boundaries in real time. A consumer on the Starter plan hitting their 50,000 monthly call limit must receive a clear 429 response with headers indicating their remaining quota and reset time. Rate limiting must be distributed (consistent across multiple gateway nodes), accurate (not approximate), and fast (sub-millisecond decision time).
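A minimal, single-process sketch of quota enforcement with a fixed-window counter, returning a 429 status and rate-limit headers once the quota is used up. The header names follow the common `X-RateLimit-*` convention, and the reset header here carries the seconds until the window ends (an assumption). This sketch is not distributed: a multi-node gateway would keep the counters in a shared store such as Redis.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class FixedWindowLimiter:
    limit: int
    window_seconds: int
    clock: Callable[[], float] = time.time
    # consumer -> (window_start, requests counted in that window)
    _counters: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def check(self, consumer: str) -> Tuple[int, Dict[str, str]]:
        now = int(self.clock())
        window_start = now - now % self.window_seconds
        start, count = self._counters.get(consumer, (window_start, 0))
        if start != window_start:
            start, count = window_start, 0  # a new window began: reset the counter
        if count >= self.limit:
            status = 429  # quota exhausted; the request is not counted
        else:
            count += 1
            status = 200
        self._counters[consumer] = (start, count)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - count),
            "X-RateLimit-Reset": str(start + self.window_seconds - now),
        }
        return status, headers
```

Injecting the clock makes window boundaries testable. A fixed window is simple, but a consumer can send up to twice the limit in a short span straddling a window boundary, which matters more for per-second limits than for monthly quotas.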

Usage Analytics#

Raw metering data must be aggregated into dashboards showing consumption trends per consumer, per endpoint, and per time period. These analytics inform tier design, identify upsell opportunities (consumers approaching their limit), and detect anomalies (sudden traffic spikes that may indicate abuse or integration errors).
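As one simple way to detect the anomalies mentioned above, the sketch flags consumers whose latest daily call count exceeds a multiple of their trailing average. The factor of 3 and the 7-day window are arbitrary assumptions to tune against your own traffic; consumers with too little history, or a zero baseline, are skipped.

```python
from statistics import mean
from typing import Dict, List


def detect_spikes(daily_calls: Dict[str, List[int]], factor: float = 3.0,
                  window: int = 7) -> List[str]:
    """Consumers whose most recent day exceeds `factor` x the mean of the
    previous `window` days."""
    flagged = []
    for consumer, series in sorted(daily_calls.items()):
        if len(series) < window + 1:
            continue  # not enough history to establish a baseline
        baseline = mean(series[-(window + 1):-1])
        # A zero baseline (dormant consumer) is skipped rather than flagged.
        if baseline > 0 and series[-1] > factor * baseline:
            flagged.append(consumer)
    return flagged
```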

Billing Integration#

Metering data must flow into a billing system (Stripe Billing, Chargebee, Recurly, or custom) that generates invoices, processes payments, and handles dunning (failed payment recovery). The integration between metering and billing must be reliable: undercounting loses revenue; overcounting erodes trust.
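One common way to keep metering and billing from drifting apart is to make usage delivery idempotent, so that a retried delivery cannot be counted twice. The sketch below is a hypothetical in-memory ledger keyed by batch ID; a production version would persist the totals and the seen-batch set atomically.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class UsageLedger:
    """Hypothetical billing-side ledger of call totals per (consumer, period)."""
    totals: Dict[Tuple[str, str], int] = field(default_factory=dict)
    seen_batches: Set[str] = field(default_factory=set)

    def apply_batch(self, batch_id: str, consumer: str, period: str, calls: int) -> bool:
        """Add a metered batch once; replays of the same batch_id are ignored."""
        if batch_id in self.seen_batches:
            return False  # duplicate delivery (e.g. a retry after a timeout)
        self.seen_batches.add(batch_id)
        key = (consumer, period)
        self.totals[key] = self.totals.get(key, 0) + calls
        return True
```

Retrying until acknowledged protects against undercounting, and deduplicating by batch ID protects against the overcounting those retries would otherwise cause.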

How Apache APISIX Supports API Monetization#

Apache APISIX provides the gateway-layer infrastructure required for API monetization: consumer management, rate limiting, authentication, and logging for metering.

Consumer Management#

APISIX's consumer abstraction represents an API consumer with associated credentials and plugin configurations. Each consumer can have different rate limits, authentication methods, and access policies. This maps directly to monetization tiers: create a consumer group per pricing plan, assign rate limits and quotas per group, and associate individual API keys or OAuth clients with their respective consumer.

Rate Limiting for Plan Enforcement#

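The limit-count plugin enforces request quotas per consumer over configurable time windows. A Starter plan consumer can be limited to 50,000 requests per month. Once the quota is exhausted, APISIX rejects requests with the configured rejected_code (set it to 429, since the plugin's default is 503), and the X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset response headers let clients track how close they are to the limit. The plugin supports Redis-backed distributed counting, ensuring consistent enforcement across multiple APISIX nodes.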
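As a hedged sketch of how a Starter plan might be expressed, the Python below builds Admin API request bodies for a consumer group carrying limit-count and for a consumer assigned to that group with key-auth. The group id `starter`, the Redis host, and the counter group name are placeholders, and the field names follow our reading of the limit-count and consumer-group documentation; verify them against your APISIX version before use. A 30-day time_window only approximates a monthly plan and does not align with calendar months.

```python
import json

THIRTY_DAYS = 30 * 24 * 60 * 60  # limit-count's time_window is expressed in seconds


def starter_group_payload(redis_host: str) -> dict:
    """Body for PUT /apisix/admin/consumer_groups/starter (placeholder group id)."""
    return {
        "plugins": {
            "limit-count": {
                "count": 50_000,
                "time_window": THIRTY_DAYS,
                "key_type": "var",
                "key": "consumer_name",    # count per consumer, not per client IP
                "rejected_code": 429,      # the plugin's default is 503
                "group": "starter_plan",   # placeholder: shared counter across routes
                "policy": "redis",         # keep counters consistent across nodes
                "redis_host": redis_host,
            }
        }
    }


def consumer_payload(username: str, api_key: str) -> dict:
    """Body for PUT /apisix/admin/consumers, attaching the consumer to the group."""
    return {
        "username": username,
        "group_id": "starter",
        "plugins": {"key-auth": {"key": api_key}},
    }


if __name__ == "__main__":
    print(json.dumps(starter_group_payload("redis.internal"), indent=2))
    print(json.dumps(consumer_payload("acme", "replace-with-a-secret"), indent=2))
```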

For more granular control, the limit-req plugin enforces requests-per-second limits to prevent burst abuse, while limit-conn controls concurrent connection counts. These three plugins together provide comprehensive traffic shaping aligned with monetization tiers.
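APISIX documents limit-req as a leaky-bucket limiter. The sketch below reproduces the core of a leaky-bucket check in Python under simplifying assumptions: it rejects excess requests immediately, whereas limit-req delays requests that fall within the burst unless its nodelay option is set, and it keeps state in memory for a single process.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class LeakyBucket:
    rate: float   # requests per second that drain from the bucket
    burst: float  # excess requests tolerated above the steady rate
    # key -> (current excess, timestamp of the last accepted request)
    _state: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def allow(self, key: str, now: float) -> bool:
        if key not in self._state:
            self._state[key] = (0.0, now)  # first request starts with no excess
            return True
        prev_excess, last = self._state[key]
        # Drain for the elapsed time, add this request, and never go below zero.
        excess = max(prev_excess - self.rate * (now - last) + 1.0, 0.0)
        if excess > self.burst:
            return False  # over rate + burst; state is left unchanged
        self._state[key] = (excess, now)
        return True
```

With rate=1 and burst=2, three simultaneous requests pass and the fourth is rejected; one second later there is room for one more.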

Authentication for Consumer Identification#

Monetization requires identifying which consumer made each request. APISIX supports key-auth, JWT authentication, and OpenID Connect for consumer identification. Each authentication method binds requests to a consumer entity, enabling per-consumer metering and rate limiting.

Logging for Usage Metering#

APISIX's logging plugins export request-level data to external systems for metering aggregation. The http-logger sends structured logs to a webhook endpoint, kafka-logger streams to Kafka for high-volume processing, and clickhouse-logger writes directly to ClickHouse for analytical queries. Each log entry includes consumer identity, route, timestamp, status code, and latency, providing the raw data for billing calculations.

A typical monetization pipeline routes APISIX access logs through Kafka into a metering service that aggregates usage per consumer per billing period and feeds the totals into Stripe Billing or a similar platform. Metering accuracy then depends on the delivery guarantees of each hop in the pipeline, and log delivery latency depends on the logger's batch settings (such as batch_max_size and inactive_timeout), so tune both against your billing requirements.
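A minimal sketch of the aggregation step in such a pipeline, assuming JSON log entries shaped like APISIX's default log format (a `consumer.username` field, `start_time` in epoch milliseconds, and `response.status`); check the exact fields your logger emits. It counts requests per consumer per UTC calendar month and, as an assumed counting policy, skips 5xx responses and anonymous traffic.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def billing_period(start_time_ms: int) -> str:
    """Map an epoch-milliseconds timestamp to a UTC calendar month, e.g. '2024-05'."""
    return datetime.fromtimestamp(start_time_ms / 1000, tz=timezone.utc).strftime("%Y-%m")


def aggregate_usage(lines: Iterable[str]) -> Counter:
    """Count billable requests per (consumer, month) from JSON log lines."""
    usage: Counter = Counter()
    for line in lines:
        entry = json.loads(line)
        consumer = (entry.get("consumer") or {}).get("username")
        status = entry.get("response", {}).get("status", 0)
        if consumer is None or status >= 500:
            continue  # anonymous traffic or provider-side failure: not billed
        usage[(consumer, billing_period(entry["start_time"]))] += 1
    return usage
```

In a real deployment the lines would come from a Kafka consumer, and the per-period totals would be handed to the billing system idempotently.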

FAQ#

How do I price my API if I have no usage data yet?#

Start with competitive benchmarking: survey 5-10 comparable APIs and note their pricing structures. Launch with a simple freemium model (generous free tier, one paid tier) to collect usage data. After 90 days, analyze consumption patterns to design informed tiers. Most successfully monetized APIs adjust their pricing model within the first year based on actual usage data.

Should I charge for failed API requests?#

Industry practice varies, but the dominant approach is to not charge for server-side errors (5xx) while counting client-side errors (4xx) against quotas. The rationale is that 4xx errors (bad request, unauthorized, rate limited) result from client behavior, while 5xx errors are provider failures. Document your counting policy clearly in your developer portal. Transparent billing policies are often cited, alongside documentation quality, as an important factor in API provider selection.

What is a reasonable free-tier limit?#

The free tier should allow a developer to build a proof of concept and demonstrate value to their organization without hitting limits during evaluation. For most APIs, this means 1,000-10,000 requests per month. Data-intensive APIs (maps, AI inference) often set lower limits (100-500 per day) due to higher marginal costs. The key metric is trial-to-paid conversion rate: if your free tier converts below 3%, it may be too generous; above 10%, it may be too restrictive.

How do I handle customers who consistently exceed their tier limits?#

Implement a graduated response: send usage alerts at 80% and 95% of the quota, allow a configurable burst buffer (10-20% over limit) with prorated charges, and only hard-block at a defined overage ceiling. Communicate upsell options proactively when consumers approach limits. APISIX's limit-count plugin supports configurable rejection behavior, and the logging pipeline can trigger automated alerts through webhook integrations when consumers cross threshold percentages.
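The graduated response can be expressed as a small decision function. The sketch assumes alert thresholds at 80% and 95% and a hypothetical 15% burst buffer billed as overage. It uses integer percentages so that boundaries such as exactly 115% of the quota are computed without floating-point error.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OveragePolicy:
    quota: int
    alert_percents: Tuple[int, ...] = (80, 95)
    burst_buffer_percent: int = 15  # hypothetical: tolerate 15% over quota as overage


def graduated_response(used: int, policy: OveragePolicy) -> Tuple[str, int]:
    """Return (action, billable overage calls) for a consumer's usage this period."""
    ceiling = policy.quota + policy.quota * policy.burst_buffer_percent // 100
    if used > ceiling:
        return "block", ceiling - policy.quota  # hard block; overage billed up to the ceiling
    if used > policy.quota:
        return "overage", used - policy.quota
    crossed = [p for p in policy.alert_percents if used * 100 >= p * policy.quota]
    if crossed:
        return f"alert_{max(crossed)}", 0
    return "ok", 0
```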

Apache APISIX and Kong are the two most widely adopted open-source API gateways, both built on NGINX and Lua. APISIX differentiates itself with a fully dynamic architecture powered by etcd, higher single-core throughput in its published benchmarks, and a broader protocol support matrix, while Kong offers a mature enterprise ecosystem with extensive third-party integrations and a large plugin marketplace.

Overview#

Both projects serve as high-performance, extensible API gateways for microservices architectures. Kong was open-sourced in 2015 and has built a substantial commercial ecosystem around Kong Gateway Enterprise, Kong Konnect, and the Kong Plugin Hub. Apache APISIX entered the Apache Software Foundation incubator in 2019 and graduated as a top-level project in 2020, with rapid community growth.

Both projects are recognized as production-grade gateways and see active production deployments worldwide.

Architecture Comparison#

The architectural differences between APISIX and Kong are fundamental and affect day-to-day operations, scalability, and deployment complexity.

Apache APISIX Architecture#

APISIX uses NGINX as its data plane with Lua plugins running in the request lifecycle. Configuration is stored in etcd, a distributed key-value store that pushes changes to all gateway nodes in real time via watch mechanisms. This architecture means that route changes, plugin updates, and upstream modifications take effect within milliseconds without requiring restarts or reloads. There is no relational database dependency.

The etcd-based design gives APISIX a stateless data plane: any node can be added or removed without migration steps or database schema changes. This makes horizontal scaling straightforward and reduces operational overhead significantly in Kubernetes environments where pods are ephemeral.

Kong Architecture#

Kong also uses NGINX and Lua for its data plane. In DB-mode, configuration is stored in PostgreSQL (Cassandra was supported in older releases but was deprecated and then removed in Kong Gateway 3.0). DB-mode requires database migrations when upgrading, and each node picks up configuration changes by polling the database at a configurable interval (db_update_frequency) while serving cached entities, which introduces a delay between calls to the Admin API and actual enforcement at the proxy layer.

Kong also offers a DB-less mode where configuration is loaded from a declarative YAML file. This eliminates the database dependency, but the Admin API becomes read-only apart from replacing the entire configuration through the /config endpoint, so individual entities cannot be modified at runtime. Kong's commercial offering, Konnect, provides a managed control plane that addresses many of these operational concerns.

Performance Benchmarks#

Performance characteristics matter at scale, where even small per-request overhead compounds into significant infrastructure costs.

Key architectural differences that affect performance:

  • Route matching: APISIX uses a radix tree-based routing algorithm, while Kong 3.x uses its own expressions-based router. The routing algorithm affects lookup time as the number of routes grows.
  • Configuration propagation: APISIX pushes configuration changes from etcd to all nodes in real time. Kong's DB-mode polls the database on a configurable interval, introducing a delay between configuration changes and enforcement.
  • Memory model: Both use NGINX's event-driven architecture, but their plugin execution models differ in per-request allocation patterns.
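To illustrate why a tree-based router scales with path depth rather than route count, here is a simplified path-segment trie with longest-prefix matching. It is not a true radix tree (which also compresses chains of single-child nodes) and not APISIX's implementation; it only shows the lookup idea.

```python
from typing import Dict, Optional


class PrefixRouter:
    """Simplified path-segment trie: the longest registered prefix wins."""

    def __init__(self) -> None:
        self.children: Dict[str, "PrefixRouter"] = {}
        self.route: Optional[str] = None

    def add(self, prefix: str, route: str) -> None:
        node = self
        for segment in filter(None, prefix.split("/")):
            node = node.children.setdefault(segment, PrefixRouter())
        node.route = route

    def match(self, path: str) -> Optional[str]:
        # Walk one segment at a time; cost depends on path depth, not route count.
        node, best = self, self.route
        for segment in filter(None, path.split("/")):
            node = node.children.get(segment)
            if node is None:
                break
            if node.route is not None:
                best = node.route
        return best
```

A linear scan would compare the request against every registered route; the trie touches at most one node per path segment.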

We recommend benchmarking both gateways with your actual workload, plugin chain, and hardware to get meaningful performance comparisons. Vendor-published benchmarks often test under ideal conditions that may not reflect your production environment.

For many production deployments, both gateways provide sufficient throughput, and the choice often depends on factors beyond raw performance such as ecosystem maturity, plugin availability, and operational familiarity.

Feature Comparison#

| Feature | Apache APISIX | Kong (OSS) |
|---|---|---|
| Plugin count (built-in) | 80+ | 40+ (OSS), 200+ (Enterprise) |
| Protocol support | HTTP/1.1, HTTP/2, HTTP/3, gRPC, WebSocket, TCP/UDP, MQTT, Dubbo | HTTP/1.1, HTTP/2, gRPC, WebSocket, TCP/UDP |
| Dashboard | Apache APISIX Dashboard (OSS) | Kong Manager (OSS edition available; Enterprise adds features) |
| Admin API | Full REST API, fully dynamic | REST API, DB-mode or DB-less |
| Service discovery | Nacos, Consul, Eureka, DNS, Kubernetes | DNS, Consul (others via plugins) |
| Kubernetes ingress | APISIX Ingress Controller (CRD-based) | Kong Ingress Controller (KIC) |
| AI gateway capabilities | ai-proxy plugin, multi-LLM routing | AI Gateway plugins (Enterprise) |
| Multi-language plugin support | Go, Java, Python, Wasm, Lua | Lua, Go, JavaScript, Python (PDK) |
| Configuration storage | etcd (distributed, real-time) | PostgreSQL (requires migrations) or declarative file (DB-less) |
| Canary/traffic splitting | Built-in traffic-split plugin | Canary plugin (Enterprise) |

Both gateways support core functionality like rate limiting, authentication (JWT, OAuth 2.0, API key, LDAP), load balancing, health checks, and circuit breaking. The primary differences lie in the breadth of built-in features available in the open-source edition versus features gated behind enterprise licensing.

Plugin Ecosystem#

APISIX ships with over 80 built-in plugins covering authentication, security, traffic management, observability, and protocol transformation. Notably, plugins for serverless functions (running custom Lua code inline through serverless-pre-function and serverless-post-function), AI proxy routing, and advanced traffic management are available in the open-source edition.

Kong's open-source edition includes approximately 40 built-in plugins, with a substantial number of additional plugins available through Kong Plugin Hub and the enterprise edition. Kong's plugin marketplace includes many third-party and partner-contributed plugins, giving it a broader ecosystem for specific vendor integrations like Datadog, PagerDuty, and Moesif.

For custom plugin development, APISIX supports external plugins via gRPC-based plugin runners in Go, Java, and Python, as well as Wasm-based plugins that run in a sandboxed environment. Kong offers a Plugin Development Kit (PDK) supporting Go, JavaScript, and Python alongside native Lua plugins. Both projects accept community-contributed plugins, and their ecosystems continue to grow.

Kubernetes Integration#

Both gateways offer mature Kubernetes ingress controllers, though they differ in design philosophy.

The APISIX Ingress Controller supports both custom resource definitions (CRDs) specific to APISIX and standard Kubernetes Ingress resources. It communicates with the APISIX data plane through the Admin API and supports Gateway API, the emerging Kubernetes standard for traffic management. Configuration changes propagate instantly through etcd.

The Kong Ingress Controller (KIC) also supports CRDs and standard Kubernetes Ingress resources, with Kong-specific annotations for extended functionality. KIC translates Kubernetes resources into Kong configuration, applying them through the Admin API. KIC has a longer track record in production Kubernetes environments and benefits from extensive documentation and community resources.

Both controllers are actively maintained and see regular releases aligned with Kubernetes version updates.

Community and Ecosystem#

| Metric | Apache APISIX | Kong |
|---|---|---|
| License | Apache 2.0 | Apache 2.0 (OSS) |
| Governance | Apache Software Foundation | Kong Inc. |
| First release | 2019 | 2015 |

APISIX benefits from Apache Software Foundation governance, which ensures vendor-neutral development and community-driven roadmap decisions. Kong benefits from the backing of Kong Inc., which provides dedicated engineering resources, enterprise support, and a commercial ecosystem that many large organizations value.

Both projects maintain active community forums, Slack channels, and regular release cadences. Kong's longer market presence gives it an advantage in terms of available tutorials, third-party integrations, and consultant familiarity.

When to Choose Apache APISIX#

APISIX is the stronger choice when your requirements include:

  • Dynamic configuration at scale: Environments where routes and plugins change frequently benefit from etcd-based instant propagation without restarts.
  • Maximum open-source functionality: Teams that need advanced features like traffic splitting, AI proxy, and multi-protocol support without enterprise licensing.
  • High-performance requirements: Workloads where per-request latency and single-core throughput directly impact infrastructure costs.
  • Kubernetes-native deployments: Organizations adopting Gateway API and wanting tight integration with cloud-native service discovery (Nacos, Consul, Eureka).
  • Vendor-neutral governance: Teams that prefer Apache Software Foundation stewardship over single-vendor control.

When to Choose Kong#

Kong is the stronger choice when your requirements include:

  • Mature enterprise ecosystem: Organizations that need commercial support, SLA guarantees, and a proven enterprise deployment track record.
  • Extensive third-party integrations: Environments with specific vendor integration needs covered by Kong's plugin marketplace.
  • Existing Kong investment: Teams already running Kong in production where migration cost outweighs technical advantages.
  • Managed control plane: Organizations that prefer a SaaS-managed control plane (Kong Konnect) to reduce operational burden.
  • Broad hiring market: Teams that can more easily find engineers with Kong experience due to its longer market presence.

FAQ#

Can APISIX and Kong run side by side during a migration?#

Yes. Both gateways can operate in parallel by splitting traffic at the load balancer level. A common migration strategy routes new services through APISIX while existing services continue running through Kong. Gradual traffic shifting with health checks ensures zero-downtime migration. The timeline depends on the number of routes, custom plugins, and testing requirements.
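One way to shift traffic gradually is to hash a stable client identifier into 100 buckets and send the buckets below the rollout percentage to the new gateway. Many load balancers provide weighted routing natively; this sketch, with a hypothetical `client_id` input, only shows the idea.

```python
import hashlib


def routes_to_new_gateway(client_id: str, percent_on_new: int) -> bool:
    """Deterministically send roughly `percent_on_new`% of clients to the new gateway."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # stable bucket in 0..99
    return bucket < percent_on_new
```

Because a client's bucket never changes, raising the percentage only moves clients from Kong to APISIX, never back, which keeps sessions and caches stable during the migration.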

Is APISIX harder to operate because it requires etcd?#

etcd adds a dependency compared to Kong's DB-less mode, but etcd is a well-understood, battle-tested component: it is the backing store for Kubernetes itself. Operating etcd requires standard distributed systems practices: run an odd number of members (3 or 5), monitor disk latency, and take regular snapshots. Teams that run their own Kubernetes control planes typically already have this knowledge; on managed Kubernetes services the cluster's etcd is not exposed, so APISIX needs its own etcd deployment and the team takes on that operational work. The operational cost of etcd is generally lower than managing the PostgreSQL migrations required by Kong's DB-mode.
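The odd-number guidance follows from the Raft quorum arithmetic that etcd uses: a cluster of n voting members needs a majority of floor(n/2) + 1 to make progress. The sketch below computes quorum size and tolerated failures.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd (Raft) cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while a majority remains available."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

A 4-member cluster tolerates the same single failure as a 3-member one while requiring a larger quorum (3 instead of 2), so the extra member adds cost without adding resilience.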

Which gateway has better AI and LLM support?#

Both gateways are investing in AI gateway capabilities, but they approach it differently. APISIX provides ai-proxy and related AI plugins in its open-source edition, supporting multi-model routing, token-based rate limiting, and prompt transformation for major LLM providers. Kong offers AI Gateway plugins primarily through its enterprise edition and Konnect platform. For teams building AI-powered applications on an open-source budget, APISIX currently provides more built-in AI functionality without licensing costs.

How do the two gateways compare on gRPC and streaming support?#

APISIX provides native gRPC proxying, gRPC-Web transcoding, and HTTP-to-gRPC transformation out of the box, along with support for HTTP/3 (QUIC), Dubbo, and MQTT protocols. Kong supports gRPC proxying and gRPC-Web through plugins, with HTTP/2 support on both client and upstream connections. For teams heavily invested in gRPC or multi-protocol architectures, APISIX's broader built-in protocol support reduces the need for custom plugins or sidecars.

A Kubernetes API gateway is the component that manages external traffic entering a Kubernetes cluster and routes it to the appropriate services. It translates Kubernetes-native resource definitions (Ingress resources or Gateway API resources) into routing rules, handling TLS termination, path-based routing, authentication, and traffic policies at the cluster edge.

What is a Kubernetes API Gateway#

Kubernetes does not include a built-in data plane for external traffic management. The platform defines APIs (Ingress, Gateway API) that describe how traffic should be routed, but the actual implementation is delegated to third-party controllers. These controllers run as pods within the cluster, watch for resource changes, and configure their underlying proxy accordingly.

This design reflects Kubernetes' philosophy of extensibility. With Kubernetes now the dominant container orchestration platform, the choice of API gateway is one of the most consequential infrastructure decisions a platform team faces.

The Kubernetes gateway landscape has evolved significantly. The original Ingress resource, introduced in Kubernetes 1.1 (2015), provided minimal routing capabilities. The newer Gateway API, which reached GA for core features in 2023, offers a far richer model with support for traffic splitting, header-based routing, and role-oriented configuration. Adoption of Gateway API resources in new Kubernetes deployments has grown rapidly since its GA release.

Kubernetes Ingress vs Gateway API#

Ingress Resource#

The Ingress resource is Kubernetes' original API for defining external HTTP routing rules. An Ingress object specifies host-based and path-based routing rules that map incoming requests to backend Services.
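The Ingress API defines the Exact and Prefix path types (plus the controller-specific ImplementationSpecific). The sketch below follows the documented Prefix rule: matching is element-wise on `/`-separated segments, so `/foo/bar` matches `/foo/bar/baz` but not `/foo/barbaz`, and a trailing slash is ignored. It is an illustration, not any controller's code, and it omits details such as choosing the longest match among several rules.

```python
from typing import List


def _elements(path: str) -> List[str]:
    """Split a URL path into non-empty '/'-separated elements."""
    return [segment for segment in path.split("/") if segment]


def prefix_matches(rule_path: str, request_path: str) -> bool:
    """pathType: Prefix, compared element by element."""
    rule, request = _elements(rule_path), _elements(request_path)
    return request[: len(rule)] == rule


def exact_matches(rule_path: str, request_path: str) -> bool:
    """pathType: Exact, a case-sensitive match of the whole path."""
    return rule_path == request_path
```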

Ingress is simple but limited. It supports only HTTP and HTTPS traffic, has no native concept of traffic splitting, and lacks a standard way to express advanced routing (header matching, query parameter routing, request mirroring). To work around these limitations, every ingress controller defines its own annotations, creating vendor lock-in and configuration inconsistency.

Despite its limitations, Ingress remains widely deployed. Most Kubernetes clusters still have at least one Ingress resource defined, though many organizations are migrating to Gateway API for new workloads.

Gateway API#

The Gateway API is a collection of Kubernetes custom resources that provide a more expressive and role-oriented model for traffic management. Its core resources are:

  • GatewayClass: Defines a class of gateway implementations (analogous to StorageClass for volumes).
  • Gateway: Declares a gateway instance with listeners for specific protocols and ports.
  • HTTPRoute: Defines HTTP routing rules with support for path matching, header matching, query parameter matching, request mirroring, traffic splitting, and request/response header modification.
  • GRPCRoute, TCPRoute, TLSRoute, UDPRoute: Protocol-specific route types for non-HTTP traffic.

Gateway API's role-oriented design separates infrastructure concerns (managed by platform teams via GatewayClass and Gateway) from application routing (managed by service teams via HTTPRoute). This separation mirrors real organizational structures where platform engineers control the gateway infrastructure and application teams define their own routes.

Gateway API configurations are also easier to get right than annotation-based Ingress configurations, because the structured resource model replaces controller-specific annotation strings with typed fields that the Kubernetes API server validates.

Comparison Table#

| Capability | Ingress | Gateway API |
|---|---|---|
| HTTP host/path routing | Yes | Yes |
| Header-based routing | Via annotations (non-standard) | Native |
| Traffic splitting | Via annotations (non-standard) | Native (HTTPRoute weights) |
| Request mirroring | Via annotations (non-standard) | Native |
| gRPC routing | Via annotations (non-standard) | Native (GRPCRoute) |
| TCP/UDP routing | Not supported | Native (TCPRoute, UDPRoute) |
| TLS passthrough | Via annotations (non-standard) | Native (TLSRoute) |
| Role-based ownership | No separation | GatewayClass/Gateway vs Route |
| Cross-namespace routing | Not supported | Native (ReferenceGrant) |
| Request header modification | Via annotations (non-standard) | Native |
| Status reporting | Basic | Detailed per-route conditions |
| API maturity | Stable (v1, limited scope) | Core features GA; some route types (such as TCPRoute and UDPRoute) still experimental |

What is an Ingress Controller#

An ingress controller is a Kubernetes controller that watches Ingress (and optionally Gateway API) resources and configures a reverse proxy to implement the defined routing rules. The controller runs as a Deployment or DaemonSet within the cluster and typically exposes itself via a LoadBalancer or NodePort Service.

Every ingress controller uses a different underlying proxy technology. APISIX Ingress Controller uses Apache APISIX. NGINX Ingress Controller uses NGINX. Traefik combines the controller and the proxy in a single binary, while Kong Ingress Controller configures a separate Kong Gateway data plane. The choice of controller determines the available features, performance characteristics, and operational model.

The ingress controller market has consolidated around several primary options: NGINX Ingress Controller (legacy standard), Apache APISIX Ingress Controller (feature-rich, high performance), Traefik (developer-friendly, auto-discovery), and Kong Ingress Controller (API management focus).

Choosing an Ingress Controller#

Apache APISIX Ingress Controller#

APISIX Ingress Controller pairs a Kubernetes-native control plane with the high-performance Apache APISIX data plane. It supports both Ingress resources and Gateway API, allowing gradual migration. Key differentiators include a rich plugin ecosystem (80+ plugins), dynamic configuration without restarts, and sub-millisecond routing latency.

APISIX is built on NGINX and LuaJIT, and the project reports throughput exceeding 20,000 requests per second per core in its published benchmarks. Its plugin architecture means that authentication, rate limiting, request transformation, and observability can be configured through Kubernetes custom resources without modifying application code.

NGINX Ingress Controller#

The NGINX Ingress Controller is the most widely deployed option. It is stable and well-documented but relies heavily on annotations for advanced configuration, which creates verbose and hard-to-maintain manifests as complexity grows.

Traefik#

Traefik provides automatic service discovery and integrates with multiple orchestrators beyond Kubernetes. Its middleware system offers a plugin-like model for cross-cutting concerns. Traefik is popular for smaller deployments and developer environments. Its Go-based architecture makes it lightweight, though it may deliver lower per-core throughput than NGINX-based controllers.

Kong Ingress Controller#

Kong pairs its API gateway with a Kubernetes controller and offers a path to Kong's commercial API management platform. It provides a plugin ecosystem comparable to APISIX's. Kong Ingress Controller typically runs Kong in DB-less mode, with the controller pushing declarative configuration to the proxy; a PostgreSQL-backed mode is also available, but it adds a database to operate alongside the cluster.

How Apache APISIX Works as a Kubernetes API Gateway#

The APISIX Ingress Controller deploys Apache APISIX as the data plane and a Kubernetes controller as the control plane within the cluster.

Architecture#

The control plane watches Kubernetes resources (Ingress, Gateway API, and APISIX custom resources) and translates them into APISIX routing configurations via the Admin API. The data plane (APISIX instances) handles actual traffic processing. This separation allows the data plane to scale independently based on traffic volume.
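To make the translation step concrete, here is a hypothetical sketch that turns one simplified Ingress rule (pathType Prefix) into an APISIX route body with `hosts`, `uris`, and a round-robin `upstream`. The function name and the choice to target the Service's cluster DNS name are assumptions for illustration; the real controller emits richer configuration (for example, it can resolve Service endpoints to pod addresses) and manages object IDs and updates.

```python
from typing import Dict


def ingress_rule_to_apisix_route(host: str, path: str, service: str,
                                 namespace: str, port: int) -> Dict:
    """Map one simplified Prefix Ingress rule to an APISIX route body (hypothetical)."""
    base = path.rstrip("/")
    # Match the prefix itself plus everything below it.
    uris = [base or "/", base + "/*"]
    node = f"{service}.{namespace}.svc.cluster.local:{port}"
    return {
        "hosts": [host],
        "uris": uris,
        "upstream": {"type": "roundrobin", "nodes": {node: 1}},
    }
```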

A typical production deployment runs 2-3 APISIX data plane replicas behind a cloud load balancer, with a single controller replica (plus a standby) managing configuration. The data plane stores active configuration in shared memory, enabling sub-millisecond routing decisions without external lookups per request.

Gateway API Support#

APISIX Ingress Controller implements the Gateway API specification, supporting GatewayClass, Gateway, and HTTPRoute resources. Platform teams define GatewayClass and Gateway resources that configure the APISIX data plane. Application teams create HTTPRoute resources that define routing rules for their services.

This role-based model aligns with enterprise organizational structures and helps reduce misconfigurations compared to annotation-based Ingress resources.
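To make the split concrete, here is a minimal sketch of the application-team side: a Python function that builds a Gateway API `v1` HTTPRoute and prints it as JSON, which `kubectl apply -f -` accepts. The Gateway name `apisix-gateway`, the `infra` and `team-orders` namespaces, and the `orders` Service are hypothetical placeholders.

```python
import json


def build_httproute(name, namespace, gateway, gateway_namespace, path_prefix, service, port):
    """Build a Gateway API HTTPRoute owned by an application team.

    parentRefs attaches the route to a Gateway the platform team manages;
    backendRefs sends matching requests to the team's Service.
    """
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "parentRefs": [{"name": gateway, "namespace": gateway_namespace}],
            "rules": [
                {
                    "matches": [{"path": {"type": "PathPrefix", "value": path_prefix}}],
                    "backendRefs": [{"name": service, "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names: Gateway "apisix-gateway" in "infra", Service "orders" in "team-orders".
    route = build_httproute("orders", "team-orders", "apisix-gateway", "infra", "/orders", "orders", 8080)
    print(json.dumps(route, indent=2))  # pipe into: kubectl apply -f -
```

The route only attaches if the platform team's Gateway listener allows routes from `team-orders` (through the listener's `allowedRoutes` setting), which is where the role boundary is enforced.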

Custom Resources#

Beyond standard Kubernetes APIs, APISIX Ingress Controller provides custom resources (ApisixRoute, ApisixUpstream, ApisixPluginConfig) that expose the full power of APISIX's plugin ecosystem. These CRDs allow Kubernetes-native configuration of features like JWT authentication, rate limiting, request transformation, and traffic mirroring without resorting to annotations.

Plugin Configuration#

APISIX's 80+ plugins can be configured through Kubernetes custom resources. For example, enabling JWT authentication on a route requires adding a plugin reference to the ApisixRoute resource. The controller translates this into APISIX plugin configuration automatically. Plugin configurations can be shared across routes using ApisixPluginConfig resources, reducing duplication.
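A minimal sketch of that pattern, assuming the `apisix.apache.org/v2` CRDs: one ApisixPluginConfig bundles a `limit-count` setting for reuse, and an ApisixRoute enables JWT authentication and references the bundle through `plugin_config_name`. The field names reflect our reading of the v2 CRDs and may differ between controller versions; hosts, services, and limits are hypothetical.

```python
import json

API_VERSION = "apisix.apache.org/v2"


def plugin_config(name, plugins):
    """ApisixPluginConfig bundling plugin settings that several routes can share."""
    return {
        "apiVersion": API_VERSION,
        "kind": "ApisixPluginConfig",
        "metadata": {"name": name},
        "spec": {
            "plugins": [
                {"name": plugin, "enable": True, "config": config}
                for plugin, config in plugins.items()
            ]
        },
    }


def jwt_route(name, host, path, service, port, shared_config):
    """ApisixRoute that requires JWT auth and reuses a shared plugin config."""
    return {
        "apiVersion": API_VERSION,
        "kind": "ApisixRoute",
        "metadata": {"name": name},
        "spec": {
            "http": [
                {
                    "name": f"{name}-rule",
                    "match": {"hosts": [host], "paths": [path]},
                    "backends": [{"serviceName": service, "servicePort": port}],
                    # Turns on JWT authentication for this rule.
                    "authentication": {"enable": True, "type": "jwtAuth"},
                    # Pulls in every plugin from the named ApisixPluginConfig.
                    "plugin_config_name": shared_config,
                }
            ]
        },
    }


if __name__ == "__main__":
    limits = plugin_config(
        "standard-limits",
        {"limit-count": {"count": 100, "time_window": 60, "rejected_code": 429}},
    )
    route = jwt_route("orders", "api.example.com", "/orders/*", "orders", 8080, "standard-limits")
    # A v1 List lets both objects be applied with one `kubectl apply -f -`.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [limits, route]}, indent=2))
```

JWT verification also needs consumer credentials (for example an ApisixConsumer resource), which are not shown here.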

Deployment Patterns#

Single Cluster Gateway#

The simplest pattern deploys APISIX as the sole ingress point for a single Kubernetes cluster. All external traffic enters through APISIX, which handles TLS termination, routing, authentication, and rate limiting before forwarding requests to cluster services. This pattern suits organizations with a single production cluster handling moderate traffic volumes.

Multi-Cluster with Shared Gateway#

For organizations running multiple Kubernetes clusters (multi-region, staging/production, or domain-separated), a shared APISIX deployment can route traffic across clusters. APISIX's upstream configuration supports endpoints outside the local cluster, enabling cross-cluster routing. Many organizations now operate multiple production Kubernetes clusters, making cross-cluster traffic management a common requirement.

Gateway Per Namespace#

Large organizations with multiple teams sharing a cluster may deploy separate APISIX instances per namespace or per team. Each team manages its own gateway configuration through Gateway API resources scoped to their namespace. ReferenceGrant resources control cross-namespace access. This pattern provides strong isolation between teams while sharing cluster infrastructure.

Sidecar Gateway#

For latency-sensitive workloads, APISIX can be deployed as a sidecar alongside the application pod. This eliminates the network hop to a centralized gateway but increases resource consumption and operational complexity. This pattern is uncommon and typically reserved for specialized use cases where every millisecond of latency matters.

FAQ#

Should I use Ingress or Gateway API for new Kubernetes deployments?#

Use Gateway API for new deployments. Gateway API provides a richer feature set, role-based ownership, and native support for traffic splitting, header matching, and multi-protocol routing. Ingress will continue to work but receives no new features. The Kubernetes SIG-Network has stated that Gateway API is the future of Kubernetes traffic management. APISIX Ingress Controller supports both, so you can migrate incrementally.

How does APISIX Ingress Controller compare to the NGINX Ingress Controller?#

APISIX offers dynamic configuration without reloads, a richer plugin ecosystem (80+ plugins vs annotation-based configuration), native support for Gateway API, and higher throughput per core. NGINX Ingress Controller has broader community adoption and more third-party documentation. If your requirements include advanced authentication, rate limiting, or request transformation, APISIX provides these as native plugins rather than custom annotations.

Can I run multiple ingress controllers in the same Kubernetes cluster?#

Yes. Kubernetes supports multiple ingress controllers differentiated by IngressClass (for Ingress resources) or GatewayClass (for Gateway API resources). A common pattern runs APISIX for external-facing APIs requiring authentication and rate limiting, and a lightweight controller like Traefik for internal developer tools. Each Ingress or HTTPRoute resource specifies which controller should handle it.

What is the resource overhead of running APISIX in Kubernetes?#

A production APISIX data plane replica typically requests 500m CPU and 256Mi memory, handling 10,000-20,000 requests per second depending on plugin configuration. The controller replica requests 200m CPU and 128Mi memory. For most clusters, two data plane replicas and one controller replica provide sufficient capacity and redundancy. These resource requirements are comparable to other Kubernetes ingress controllers and negligible relative to the application workloads they protect.
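The sizing guidance can be turned into a back-of-the-envelope calculation. This sketch assumes a per-replica capacity you have measured yourself (the example plugs in 10,000 requests per second, the low end of the range above), adds planning headroom, and keeps one spare replica so losing a pod does not drop below peak capacity. The headroom and N+1 policy are assumptions, not APISIX recommendations.

```python
import math


def replicas_needed(peak_rps, per_replica_rps, headroom=0.3, min_replicas=2):
    """Estimate APISIX data plane replicas for a given peak load.

    peak_rps:        highest expected requests per second
    per_replica_rps: throughput one replica sustained in your own load test
    headroom:        spare capacity to plan for (0.3 = 30% above peak)
    min_replicas:    floor for redundancy even at low traffic
    """
    target = peak_rps * (1 + headroom)
    serving = math.ceil(target / per_replica_rps)  # replicas needed to carry the target
    return max(min_replicas, serving + 1)  # +1 so losing one pod keeps full capacity


if __name__ == "__main__":
    n = replicas_needed(25_000, 10_000)
    print(n, "replicas,", n * 500, "millicores of CPU requests at 500m each")
```

With a 25,000 rps peak, the 30% headroom gives a 32,500 rps target; that needs four serving replicas plus one spare, so five replicas or 2.5 CPU cores of requests.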

An AI gateway is a specialized API gateway that manages traffic between applications and large language models (LLMs), enforcing token-based rate limiting, model routing, cost controls, and content safety policies. As AI agents adopt the Model Context Protocol (MCP) to interact with external tools and data sources, AI gateways become essential infrastructure for securing, observing, and scaling these interactions in production environments.

What is an AI Gateway#

An AI gateway sits between your applications and AI model providers (OpenAI, Anthropic, Google, open-source models), routing requests, enforcing policies, and providing observability across all AI interactions. Unlike traditional API gateways that focus on REST and gRPC traffic patterns, AI gateways understand LLM-specific concerns: token consumption, prompt structure, model-specific rate limits, and response streaming.

The market for AI infrastructure is expanding rapidly, with enterprise adoption of generative AI APIs and models accelerating across industries. This growth creates urgent demand for infrastructure that manages AI traffic with the same rigor applied to traditional API traffic. For more on AI gateway capabilities, see the APISIX AI Gateway overview.

The Rise of AI Agents and LLM Traffic#

AI agents represent a shift from simple prompt-response interactions to autonomous, multi-step workflows where LLMs invoke tools, query databases, browse the web, and orchestrate complex tasks. Unlike a single chatbot API call, an agent workflow may generate dozens of LLM invocations, tool calls, and data retrievals to complete a single user request.

Many industry observers expect much of the economic value from generative AI to flow through agentic systems that operate autonomously on behalf of users and organizations. Developer adoption of AI agent frameworks has accelerated rapidly, as reflected in growing open-source activity and ecosystem investment.

This growth in agentic AI creates a traffic management challenge. A single agent interaction might produce 10-50 API calls across multiple model providers and tool servers. Without gateway-level management, organizations face unpredictable costs, security blind spots, and no centralized observability over AI operations.

What is MCP (Model Context Protocol)#

The Model Context Protocol (MCP) is an open standard introduced by Anthropic that defines how AI assistants connect to external tools, data sources, and services. MCP provides a standardized interface that replaces the fragmented, vendor-specific tool integration patterns that emerged as AI agents proliferated.

Before MCP, every AI application needed custom integration code for each tool and data source. An agent that needed to query a database, search documents, and call an API required three separate integration implementations, each with its own authentication, error handling, and data formatting logic. MCP standardizes this interaction into a single protocol that any AI assistant can use with any MCP-compatible server.

The protocol draws inspiration from the Language Server Protocol (LSP), which standardized how code editors communicate with language-specific tooling. Just as LSP eliminated the need for every editor to implement every language's features independently, MCP aims to eliminate the need for every AI application to implement every tool integration independently. Since its release, MCP adoption has grown significantly: many community-built MCP servers are available, and major AI platforms have added support for the protocol.

MCP Architecture#

MCP follows a client-server architecture with clear separation of concerns across four components.

Host#

The host is the AI application that initiates interactions. It could be a desktop AI assistant, an IDE with AI capabilities, a chatbot platform, or any application that leverages LLMs. The host creates and manages MCP client instances and controls which servers the AI model can access, enforcing security boundaries.

Client#

The MCP client is a protocol handler embedded within the host application. Each client maintains a one-to-one connection with a single MCP server. The client handles protocol negotiation, capability discovery, and message routing between the host and the server.

Server#

MCP servers expose tools, resources, and prompts to AI clients through a standardized interface. A server might wrap a database, a file system, a web API, a code repository, or any other data source or capability. Servers declare their capabilities during an initialization handshake, allowing clients to discover available tools dynamically.

Transport#

MCP supports multiple transport mechanisms. The stdio transport communicates through standard input/output streams, suitable for local server processes. The Streamable HTTP transport (which supersedes the earlier SSE-based transport) uses HTTP for remote server communication, enabling servers to run as network services accessible across infrastructure boundaries. In production environments, the HTTP-based transport is widely preferred for its flexibility in distributed deployments.
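Because the Streamable HTTP transport carries JSON-RPC 2.0 messages in ordinary HTTP POSTs, a gateway in front of an MCP server sees requests it already knows how to authenticate and rate-limit. The sketch below builds the opening handshake (`initialize`, the `notifications/initialized` notification, then `tools/list`) and wraps a message in an HTTP request without sending it. The endpoint URL is hypothetical, and the protocol revision string should match what your server supports.

```python
import json
import urllib.request

# MCP protocol revision that introduced Streamable HTTP; use one your server supports.
PROTOCOL_VERSION = "2025-03-26"


def jsonrpc(method, params=None, msg_id=None):
    """Build a JSON-RPC 2.0 message. Requests carry an id; notifications do not."""
    message = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        message["params"] = params
    if msg_id is not None:
        message["id"] = msg_id
    return message


def handshake(client_name, client_version):
    """Messages a client sends, in order, to start a session and discover tools.

    Each request is sent only after the previous response arrives; the
    `initialized` notification follows the server's `initialize` result.
    """
    return [
        jsonrpc(
            "initialize",
            {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": client_name, "version": client_version},
            },
            msg_id=1,
        ),
        jsonrpc("notifications/initialized"),
        jsonrpc("tools/list", msg_id=2),
    ]


def to_http(endpoint, message, session_id=None):
    """Wrap one message as a Streamable HTTP POST (built, not sent)."""
    headers = {
        "Content-Type": "application/json",
        # The server may answer with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        # Session ID the server returned on its initialize response.
        headers["Mcp-Session-Id"] = session_id
    return urllib.request.Request(
        endpoint, data=json.dumps(message).encode(), headers=headers, method="POST"
    )
```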

Why AI Traffic Needs Gateway Management#

AI traffic introduces challenges that traditional API management was not designed to handle.

Security#

AI agents with tool access can potentially reach sensitive systems. Without centralized policy enforcement, an agent might access production databases, execute privileged operations, or leak sensitive data through prompts sent to third-party model providers. Data leakage is widely cited as a primary security concern among organizations deploying AI agents.

Rate Limiting#

LLM providers impose rate limits measured in tokens per minute and requests per minute. These limits differ by model, tier, and provider. An AI gateway tracks token consumption across all applications and enforces limits before requests are rejected by upstream providers, preventing cascading failures.

Cost Control#

LLM API costs scale with token consumption, and agentic workflows can generate substantial token volumes. A single complex agent task might consume 100,000 tokens across multiple model calls. Without gateway-level cost tracking and budget enforcement, organizations frequently discover unexpected AI spending.

Observability#

Debugging agentic AI workflows requires end-to-end visibility across model calls, tool invocations, and data retrievals. Traditional logging captures individual HTTP requests but misses the logical flow of an agent's reasoning chain. AI gateways correlate related requests into coherent traces, making it possible to understand why an agent made specific decisions.

Multi-Provider Routing#

Organizations increasingly use multiple model providers to optimize for cost, latency, and capability. An AI gateway routes requests to the appropriate provider based on model availability, cost thresholds, latency requirements, and task complexity, functioning as an intelligent load balancer for AI traffic.

Key AI Gateway Features#

Modern AI gateways provide capabilities specifically designed for LLM and agent traffic patterns.

LLM load balancing distributes requests across multiple model endpoints, providers, or self-hosted instances. This includes weighted routing, failover, and least-latency selection. Organizations running self-hosted models alongside commercial APIs use load balancing to optimize cost and performance simultaneously.
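Least-latency selection is commonly implemented with an exponentially weighted moving average (EWMA) of observed response times. A minimal sketch, with hypothetical endpoint names and an assumed smoothing factor of 0.3; failover and weights are left out to keep the selection logic visible.

```python
class LeastLatencyBalancer:
    """Route to the endpoint with the lowest EWMA of observed latency.

    alpha is the weight given to each new sample: higher values react faster
    to latency changes, lower values smooth out noise.
    """

    def __init__(self, endpoints, alpha=0.3):
        self.alpha = alpha
        self.ewma = {endpoint: None for endpoint in endpoints}  # None = not yet measured

    def record(self, endpoint, latency_ms):
        prev = self.ewma[endpoint]
        if prev is None:
            self.ewma[endpoint] = latency_ms
        else:
            self.ewma[endpoint] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def choose(self):
        # Send traffic to unmeasured endpoints first so each gets a baseline.
        for endpoint, value in self.ewma.items():
            if value is None:
                return endpoint
        return min(self.ewma, key=self.ewma.get)
```

Because the average decays rather than resets, one slow response only nudges an endpoint's score; a sustained slowdown is what shifts traffic away.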

Token-based rate limiting tracks consumption in tokens rather than simple request counts. Since a single LLM request can consume anywhere from 100 to 100,000 tokens depending on context length, request-based rate limiting is insufficient. Token-aware rate limiting provides accurate cost and capacity management.
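A minimal token-bucket sketch of token-aware limiting: the budget is a tokens-per-minute figure, each request reserves an estimate (prompt tokens plus the requested output cap) before it is forwarded, and the reservation is corrected once the provider reports actual usage. The estimate-then-settle flow is one possible design, assumed here for illustration; the clock is injectable so the behavior can be tested.

```python
import time


class TokenBudget:
    """Token bucket whose unit is LLM tokens, not requests.

    The bucket holds up to `tokens_per_minute` tokens and refills continuously
    at tokens_per_minute / 60 tokens per second.
    """

    def __init__(self, tokens_per_minute, clock=time.monotonic):
        self.capacity = float(tokens_per_minute)
        self.available = float(tokens_per_minute)
        self.rate = tokens_per_minute / 60.0
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.available = min(self.capacity, self.available + (now - self.last) * self.rate)
        self.last = now

    def try_reserve(self, estimated_tokens):
        """Reserve prompt tokens + max output tokens before forwarding the request."""
        self._refill()
        if estimated_tokens > self.available:
            return False  # the gateway would answer 429 instead of calling the provider
        self.available -= estimated_tokens
        return True

    def settle(self, estimated_tokens, actual_tokens):
        """Refund (or charge) the difference once the provider reports real usage.

        If actual usage exceeded the estimate, `available` can dip below zero,
        which delays later requests until the shortfall refills.
        """
        self._refill()
        self.available = min(self.capacity, self.available + estimated_tokens - actual_tokens)
```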

Prompt caching stores responses for repeated prompts so that identical queries are answered without another model call, reducing both latency and cost. Semantic caching extends this by matching prompts based on meaning rather than exact text, so paraphrased questions can also hit the cache.

Model fallback automatically redirects traffic to alternative models when a primary provider experiences outages, rate limit exhaustion, or elevated latency. Fallback chains can be configured with degradation policies (for example, falling back from GPT-4 to GPT-3.5 with a user notification).
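A fallback chain reduces to trying providers in order and recording whether a degraded option answered. In this sketch the provider callables and the `ProviderError` exception are hypothetical stand-ins for real client calls.

```python
class ProviderError(Exception):
    """Stand-in for an outage, exhausted rate limit, or timeout from a provider."""


def call_with_fallback(chain, prompt):
    """Try (name, call) pairs in order and return the first successful answer.

    Returns (provider_name, response, degraded); degraded is True when a
    fallback rather than the primary answered, so the caller can tell the user.
    """
    errors = []
    for position, (name, call) in enumerate(chain):
        try:
            return name, call(prompt), position > 0
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))
```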

Content moderation inspects prompts and responses for policy violations, sensitive data, prompt injection attempts, and harmful content. Gateway-level moderation ensures consistent enforcement regardless of which application or agent generates the traffic.
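As an illustration only, a gateway-side check might redact obvious secrets and flag well-known injection phrasing before a prompt leaves the network. The two patterns and the marker phrases below are assumptions chosen for brevity; real moderation combines much broader detectors, often including a classifier model.

```python
import re

# Illustrative patterns only; production moderation needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}
INJECTION_MARKERS = ("ignore previous instructions", "ignore all previous instructions")


def inspect_prompt(prompt):
    """Return (redacted_prompt, findings) for one outbound prompt."""
    findings = []
    redacted = prompt
    for label, pattern in PATTERNS.items():
        if pattern.search(redacted):
            findings.append(label)
            redacted = pattern.sub(f"[REDACTED:{label}]", redacted)
    # Injection phrasing is flagged, not removed, so a policy can decide to block.
    lowered = prompt.lower()
    if any(marker in lowered for marker in INJECTION_MARKERS):
        findings.append("possible_prompt_injection")
    return redacted, findings
```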

How APISIX Supports AI Workloads#

Apache APISIX provides AI gateway capabilities through its plugin architecture, enabling organizations to manage LLM traffic alongside traditional API traffic within a single gateway infrastructure.

The ai-proxy plugin provides a unified interface for routing requests to multiple LLM providers including OpenAI, Anthropic, Azure OpenAI, and self-hosted models. It handles provider-specific authentication, request format translation, and response normalization, allowing applications to switch between providers without code changes.
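A sketch of configuring such a route through the Admin API from Python. The `ai-proxy` fields (`provider`, `auth.header`, `options.model`) follow the layout in recent APISIX documentation, but the plugin schema has changed between releases, so check the docs for your version. The admin URL uses the APISIX 3.x default Admin API port 9180; the route ID, model name, and keys are placeholders.

```python
import json
import urllib.request


def ai_route(uri, provider, model, api_key):
    """Route body that sends chat requests on `uri` through the ai-proxy plugin."""
    return {
        "uri": uri,
        "methods": ["POST"],
        "plugins": {
            "ai-proxy": {
                "provider": provider,  # e.g. "openai"
                "auth": {"header": {"Authorization": f"Bearer {api_key}"}},
                "options": {"model": model},
            }
        },
    }


def admin_put(admin_url, admin_key, route_id, route):
    """Build (but do not send) the Admin API request that creates or replaces the route."""
    return urllib.request.Request(
        f"{admin_url}/apisix/admin/routes/{route_id}",
        data=json.dumps(route).encode(),
        headers={"X-API-KEY": admin_key, "Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the request to `urllib.request.urlopen` applies it to a running gateway. Switching providers then becomes an Admin API update to `provider` and `options.model` rather than an application change.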

APISIX supports token-based rate limiting through its rate limiting plugins configured with token consumption metrics, enabling organizations to enforce per-consumer and per-route token budgets. Combined with the logging and metrics plugins, this gives visibility into AI spending for each application and team.

For MCP-to-HTTP bridging, APISIX can proxy traffic between MCP clients using Streamable HTTP transport and backend MCP servers, applying the same authentication, rate limiting, and observability policies that govern traditional API traffic. This enables organizations to expose MCP servers through a managed gateway layer rather than allowing direct network access from AI agents to tool servers.

APISIX's dynamic configuration through etcd is particularly valuable for AI workloads where model endpoints, rate limits, and routing rules change frequently as new models are released, pricing changes, and usage patterns evolve. Configuration changes take effect in milliseconds without gateway restarts, enabling rapid response to provider outages or cost threshold breaches.

Future of AI Infrastructure#

The convergence of AI gateways and traditional API gateways is accelerating. As AI capabilities become embedded in every application, the distinction between "AI traffic" and "regular API traffic" will blur. Gateways that manage both traffic types within a unified policy framework will have a significant advantage over point solutions.

MCP adoption is likely to grow as more AI platforms and tool providers implement the protocol, creating demand for infrastructure that can manage MCP traffic at enterprise scale. The protocol's evolution toward more sophisticated transport mechanisms, authentication models, and capability negotiation will require gateway-level support to handle securely.

As worldwide spending on AI infrastructure continues to grow rapidly, a meaningful portion will flow through AI gateway infrastructure that provides the security, observability, and cost management enterprises require before deploying AI agents in production environments.

FAQ#

What is the difference between an AI gateway and a traditional API gateway?#

An AI gateway extends traditional API gateway capabilities with LLM-specific features: token-based rate limiting, prompt inspection, model routing, cost tracking, and response streaming support. A traditional API gateway manages REST and gRPC traffic with request-based rate limiting, authentication, and load balancing. Modern platforms like Apache APISIX blur this distinction by supporting both traditional and AI-specific traffic management within a single gateway, eliminating the need for separate infrastructure.

How does MCP relate to function calling and tool use in LLMs?#

Function calling (also called tool use) is the LLM capability to generate structured outputs that invoke external functions. MCP standardizes the infrastructure layer that connects these function calls to actual tool implementations. Where function calling defines what the model wants to do, MCP defines how the request reaches the tool server and how results return to the model. MCP is complementary to function calling, not a replacement.

Can I use an AI gateway without adopting MCP?#

Yes. AI gateways manage all types of AI traffic, including direct LLM API calls that do not use MCP. Most organizations start with basic LLM proxy and rate limiting features before adopting MCP for tool integration. The gateway provides value regardless of whether your AI applications use MCP, custom tool integrations, or simple prompt-response patterns.

An open-source API gateway sits between clients and backend services, handling routing, authentication, rate limiting, and observability. Apache APISIX, Kong, Envoy, and Traefik are among the most widely adopted options, each with distinct architectural decisions that affect performance, extensibility, and operational complexity.

Why the Choice of API Gateway Matters#

Organizations running microservices at scale route millions of requests per day through their gateway layer. The gateway you choose determines your latency floor, plugin flexibility, and how much operational overhead your platform team absorbs.

Choosing poorly means rearchitecting under pressure. Choosing well means a gateway that scales with your traffic for years without becoming a bottleneck.

Feature Comparison Table#

| Feature | Apache APISIX | Kong | Envoy | Traefik |
| --- | --- | --- | --- | --- |
| Language | Lua (NGINX + LuaJIT) | Lua (NGINX + LuaJIT) | C++ | Go |
| Configuration Store | etcd | PostgreSQL / DB-less (declarative) | xDS API (control plane) | File / KV stores |
| Admin API | RESTful, fully dynamic | RESTful | xDS gRPC | REST + dashboard |
| Hot Reload | Yes, sub-millisecond | Partial (DB polling) | Yes (xDS push) | Yes (provider watch) |
| Plugin Count | 100+ built-in | 60+ bundled (more in Hub) | ~30 HTTP filters | ~30 middlewares |
| Plugin Languages | Lua, Java, Go, Python, Wasm | Lua, Go (PDK) | C++, Wasm | Go (middleware) |
| gRPC Proxying | Native | Supported | Native | Supported |
| HTTP/3 (QUIC) | Supported | Experimental | Supported | Supported |
| Dashboard | Built-in (APISIX Dashboard) | Kong Manager (Enterprise) | None (third-party) | Built-in |
| License | Apache 2.0 | Apache 2.0 (OSS) / Proprietary (Enterprise) | Apache 2.0 | MIT |

Note: Feature details are based on each project's official documentation as of early 2026. Check the respective project sites for the latest status.

Detailed Breakdown#

Apache APISIX#

Apache APISIX is built on NGINX and LuaJIT, using etcd as its configuration store. This architecture eliminates database dependencies on the data path: route changes propagate to every gateway node within milliseconds without restarts or reloads.

The plugin ecosystem includes over 100 built-in options spanning authentication (JWT, key-auth, OpenID Connect), traffic management (rate limiting, circuit breaking), observability (Prometheus, Zipkin, OpenTelemetry), and transformation (request/response rewriting, gRPC transcoding). Developers can write custom plugins in Lua, Go, Java, Python, or WebAssembly, making it one of the most polyglot gateway runtimes available.

APISIX supports the Kubernetes Ingress Controller pattern natively. The APISIX Ingress Controller watches Kubernetes resources and translates them into APISIX routing configuration, enabling declarative GitOps workflows while preserving the full plugin surface.

As an Apache Software Foundation top-level project, APISIX is community-governed and vendor-neutral.

Kong#

Kong is the longest-established open-source API gateway, with a mature commercial ecosystem. It shares the NGINX + LuaJIT foundation with APISIX but, in its traditional deployment mode, stores configuration in PostgreSQL (Cassandra support was removed in Kong Gateway 3.4). This introduces a database dependency that adds operational complexity for HA deployments; Kong can alternatively run in DB-less mode from a declarative configuration file.

Kong's plugin hub offers approximately 60 bundled plugins in the open-source edition, with additional enterprise-only plugins for advanced features like OAuth2 introspection and advanced rate limiting. The Go Plugin Development Kit (PDK) allows extending Kong in Go, though Lua remains the primary plugin language.

Kong has a strong enterprise support ecosystem with commercial offerings (Kong Gateway Enterprise, Kong Konnect) and a large user community.

Envoy#

Envoy is a high-performance C++ proxy originally built at Lyft, now a CNCF graduated project. It excels as a service mesh data plane and is the foundation for Istio, AWS App Mesh, and other mesh implementations.

Envoy's configuration model uses the xDS (discovery service) API, a gRPC-based protocol that pushes configuration updates from a control plane. This design is powerful but means Envoy does not function as a standalone gateway without a control plane component. Organizations adopting Envoy as an edge gateway typically pair it with a control plane like Gloo Edge or similar tools.

The filter chain model supports around 30 built-in HTTP filters. Custom extensions require C++ or WebAssembly, raising the barrier for teams without C++ expertise. Envoy is most commonly deployed as a sidecar proxy within a service mesh, though it is also used as an edge proxy.

Traefik#

Traefik is written in Go and designed for automatic service discovery. It integrates natively with Docker, Kubernetes, Consul, and other orchestrators, automatically detecting new services and generating routes without manual configuration. This auto-discovery model makes Traefik popular for development environments and smaller-scale production deployments.

Traefik includes built-in Let's Encrypt integration for automatic TLS certificate provisioning, a feature that requires additional tooling in other gateways. Its middleware system offers approximately 30 built-in options covering authentication, rate limiting, headers manipulation, and circuit breaking.

Traefik has a large community and is widely used in Docker-native environments.

Performance Considerations#

Performance varies significantly based on configuration, plugin chains, TLS termination, and upstream complexity. When evaluating gateways, run your own benchmarks with your actual workload patterns rather than relying on vendor-published numbers.

Key factors that affect gateway performance:

  • Architecture: C++ and LuaJIT-based gateways (Envoy, APISIX, Kong) generally achieve lower latency than pure Go implementations
  • Configuration store: Gateways that avoid database queries on the data path (APISIX, Envoy) tend to have more consistent latency
  • Plugin overhead: Each active plugin adds processing time. Test with your actual plugin chain enabled
  • Connection handling: The NGINX event-driven model (APISIX, Kong) handles high concurrency efficiently

We recommend benchmarking the specific gateways you are considering with a representative workload on hardware similar to your production environment.

When to Choose Which#

Choose Apache APISIX when you need a large built-in plugin ecosystem, fully dynamic configuration without restarts, multi-language plugin support, and no database dependency. It suits teams building platform-grade API infrastructure. See the getting started guide to evaluate it hands-on.

Choose Kong when you are operating in an enterprise environment with existing Kong deployments, need commercial support, or require specific enterprise-only plugins. Kong's maturity means more third-party integrations and consultants are available.

Choose Envoy when your primary use case is a service mesh data plane, you need advanced load balancing algorithms, or you are already running Istio or a similar mesh. Envoy is less suited as a standalone edge gateway due to its control plane dependency.

Choose Traefik when auto-discovery and zero-configuration routing are priorities, or you need built-in Let's Encrypt integration without additional tooling. Traefik excels in Docker-native and small-to-medium Kubernetes environments.

Migration Considerations#

Migrating between gateways is nontrivial and typically requires careful planning. Key factors include:

  • Plugin compatibility: Not all plugins have equivalents across gateways. Audit your active plugins and identify gaps before migrating.
  • Configuration translation: Each gateway uses a different configuration format. Automated translation tools can help but manual verification is essential.
  • Operational tooling: Monitoring dashboards, CI/CD pipelines, and alerting rules need updating.
  • Canary approach: Running both gateways in parallel behind a load balancer and shifting traffic gradually is the safest migration strategy.

Frequently Asked Questions#

Is Apache APISIX production-ready for enterprise workloads?#

Yes. Apache APISIX is an Apache Software Foundation top-level project used in production by organizations worldwide. The etcd-backed architecture provides high availability without single points of failure when deployed with an etcd cluster.

Can I migrate from Kong to APISIX without downtime?#

A zero-downtime migration is achievable using a canary deployment approach: run both gateways in parallel behind a load balancer, gradually shifting traffic from Kong to APISIX as you validate route-by-route equivalence. APISIX supports most Kong plugin equivalents natively, and the Admin API allows automated route provisioning during migration.

How do open-source API gateways compare to cloud-managed options like AWS API Gateway?#

Cloud-managed gateways trade control for convenience. They handle infrastructure operations but impose vendor lock-in, per-request pricing that grows with traffic volume, and limited plugin customization. Open-source gateways like APISIX provide full control over the data plane, support multi-cloud and hybrid deployments, and eliminate per-request platform fees.

Which gateway has the best Kubernetes support?#

All four gateways support Kubernetes, but the depth varies. APISIX and Kong offer dedicated ingress controllers with CRD-based configuration. Envoy integrates through the Kubernetes Gateway API and service mesh deployments. Traefik auto-discovers Kubernetes services natively. The emerging Kubernetes Gateway API standard is supported by all four projects to varying degrees, and is becoming the recommended approach for new deployments.

An API gateway is a server that sits between clients and backend services, acting as the single entry point for all API traffic. It accepts incoming requests, applies policies such as authentication, rate limiting, and transformation, then routes each request to the appropriate upstream service and returns the response to the caller.

In practice, an API gateway consolidates cross-cutting concerns that would otherwise be duplicated across every microservice: access control, traffic shaping, observability, and protocol translation. Instead of embedding this logic in each service, teams centralize it at the gateway layer, reducing code duplication, simplifying deployments, and giving platform teams a single control plane for governing API behavior at scale.

How Does an API Gateway Work?#

The request lifecycle through an API gateway follows a well-defined pipeline:

  1. Client sends a request. A mobile app, browser, or upstream service issues an HTTP/HTTPS request to the gateway's public endpoint. The client never communicates directly with individual backend services.

  2. Gateway evaluates policies. The gateway inspects the incoming request and runs it through a chain of plugins or middleware. This typically includes validating authentication tokens (JWT, OAuth 2.0, API keys), enforcing rate limits, checking IP allowlists, and applying request transformations such as header injection or body rewriting.

  3. Gateway routes to the upstream. Based on the request path, host header, or other matching criteria, the gateway selects a target upstream service. If multiple instances are registered, the gateway applies a load-balancing algorithm (round-robin, least connections, consistent hashing) to pick a specific node.

  4. Backend processes the request. The upstream service handles the business logic and returns a response to the gateway.

  5. Gateway processes the response. Before forwarding the response to the client, the gateway can apply response transformations, inject CORS headers, compress the payload, or cache the result for subsequent identical requests.

  6. Gateway returns the response. The final response reaches the client with appropriate status codes, headers, and payload. Throughout this entire cycle, the gateway emits metrics, access logs, and traces that feed into observability systems.

This pipeline executes in milliseconds. High-performance gateways like Apache APISIX complete it in under 1ms of added latency, making the overhead negligible even for latency-sensitive workloads.
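The six steps map onto a simple control flow. This toy sketch (plain Python, not how APISIX is implemented) runs a policy chain, routes by path prefix, and calls an in-process function standing in for the upstream; the `apikey` header name and the handler are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)


class Reject(Exception):
    """Raised by a policy plugin to stop the request with a status code."""

    def __init__(self, status, reason):
        super().__init__(reason)
        self.status = status


def require_api_key(valid_keys):
    """Policy plugin: reject requests without a known key in the `apikey` header."""
    def plugin(request):
        if request.headers.get("apikey") not in valid_keys:
            raise Reject(401, "missing or invalid API key")
    return plugin


def handle(request, plugins, routes):
    """Run one request through the gateway pipeline and return (status, body)."""
    # Step 2: evaluate policies in order; any plugin may reject.
    try:
        for plugin in plugins:
            plugin(request)
    except Reject as rejection:
        return rejection.status, {"error": str(rejection)}
    # Step 3: select the upstream; the first matching prefix wins.
    for prefix, upstream in routes:
        if request.path.startswith(prefix):
            # Step 4: the backend runs the business logic.
            status, body = upstream(request)
            # Steps 5-6: response plugins would transform `body` here before returning.
            return status, body
    return 404, {"error": "no matching route"}
```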

Key Features of an API Gateway#

A production-grade API gateway provides a broad surface of capabilities. The following features represent the core functionality that distinguishes a gateway from a simple reverse proxy.

Request Routing#

The gateway matches incoming requests to upstream services based on URI paths, HTTP methods, headers, query parameters, or custom expressions. Advanced gateways support regex-based matching, wildcard routes, and priority-weighted rules. Apache APISIX supports radixtree-based routing that scales efficiently even with thousands of route entries.
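To see why tree-based matching scales, here is a simplified sketch that stores routes in a trie keyed by path segments, a simpler cousin of the radix tree APISIX uses (a radix tree additionally compresses chains of single-child nodes). Lookup cost depends on path depth rather than on how many routes are registered, and matching respects segment boundaries. Route and upstream names are hypothetical.

```python
class RouteTrie:
    """Longest-prefix route matching over path segments.

    Each node is a dict from segment to child node; the special key None
    stores the upstream for a route that ends at that node.
    """

    def __init__(self):
        self.root = {}

    def add(self, prefix, upstream):
        node = self.root
        for segment in filter(None, prefix.split("/")):
            node = node.setdefault(segment, {})
        node[None] = upstream

    def match(self, path):
        """Return the upstream of the longest registered prefix, or None."""
        node = self.root
        best = node.get(None)
        for segment in filter(None, path.split("/")):
            if segment not in node:
                break
            node = node[segment]
            best = node.get(None, best)
        return best
```

Unlike a naive `startswith` check, `/api/ordersX` does not match the `/api/orders` route, because segments are compared whole.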

Load Balancing#

Distributing traffic across service instances prevents hotspots and improves availability. Gateways typically support round-robin, weighted round-robin, least connections, consistent hashing, and EWMA (exponentially weighted moving average) algorithms. Health checks (both active probes and passive failure detection) automatically remove unhealthy nodes from the upstream pool.
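Weighted round-robin is often implemented with the "smooth" variant that NGINX uses, which interleaves picks instead of sending a burst to the heaviest node. A minimal sketch, with an optional set of healthy nodes standing in for health-check results; node names are hypothetical.

```python
class SmoothWeightedRR:
    """Smooth weighted round-robin, the variant NGINX uses for weighted upstreams."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.current = {node: 0 for node in self.weights}

    def pick(self, healthy=None):
        """Pick the next node, optionally restricted to nodes passing health checks."""
        candidates = [n for n in self.weights if healthy is None or n in healthy]
        if not candidates:
            raise RuntimeError("no healthy upstream nodes")
        total = sum(self.weights[n] for n in candidates)
        # Every candidate earns its weight; the leader is picked and pays back the total.
        for n in candidates:
            self.current[n] += self.weights[n]
        chosen = max(candidates, key=lambda n: self.current[n])
        self.current[chosen] -= total
        return chosen
```

With weights 5:1:1, each cycle of seven picks is `a, a, b, a, c, a, a`: node `a` still gets five of seven requests, but never five in a row.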

Authentication and Authorization#

Centralizing identity verification at the gateway eliminates the need for each service to implement its own auth stack. Common mechanisms include JWT validation, OAuth 2.0 token introspection, HMAC signatures, LDAP, and API key authentication. Some gateways also integrate with external identity providers through OpenID Connect.
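To show what JWT validation involves, here is a standard-library-only sketch of HS256 verification: pin the algorithm, recompute the HMAC-SHA256 signature over `header.payload`, compare in constant time, then check `exp`. It is illustrative; in production, use a maintained JWT library and validate whatever algorithms and claims (such as `nbf` or `aud`) your issuer uses. The secret is a placeholder.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims, secret):
    """Issue an HS256 token (used here only to produce test input)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_hs256(token, secret, now=None):
    """Return the claims of a valid, unexpired HS256 token; raise ValueError otherwise."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    # Pin the algorithm instead of trusting the token's own "alg" header.
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```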

Rate Limiting#

Rate limiting protects backend services from traffic spikes, abusive clients, and cascading failures. Gateways enforce limits at multiple granularities: per consumer, per route, per IP, or globally. Apache APISIX provides configurable rate limiting plugins that support both fixed-window and leaky-bucket algorithms, with shared counters across gateway nodes via Redis.
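The leaky-bucket algorithm fits in a few lines. This single-node sketch drains at a fixed rate and rejects requests that would overflow its capacity; sharing state across gateway nodes, as Redis-backed counters do, is out of scope. Parameter names are illustrative.

```python
import time


class LeakyBucket:
    """Leaky-bucket limiter: admits bursts up to `capacity`, drains at `rate` req/s."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.level = 0.0
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Drain whatever has leaked out since the previous request.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False  # would overflow: the gateway rejects (e.g. 429 or 503)
        self.level += 1
        return True
```

Per-consumer limiting keeps one bucket per key, for example in a `collections.defaultdict(lambda: LeakyBucket(rate=10, capacity=20))`. Some implementations delay excess requests within the burst instead of rejecting them.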

Caching#

Response caching at the gateway layer reduces backend load and improves latency for read-heavy endpoints. Gateways cache responses based on configurable TTLs, cache keys (URI, headers, query strings), and invalidation rules. For APIs serving relatively static data (product catalogs, configuration endpoints, reference data), cache hit rates can be high enough that most requests never reach the upstream.
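Here is a minimal sketch of a TTL response cache with a normalized cache key. It assumes that only `GET` responses are cached and that the key is built from the path, the sorted query string, and a configurable list of "vary" headers. `ResponseCache` and the `upstream` callable are hypothetical names, and invalidation beyond TTL expiry is omitted. Normalizing the query string means `?b=2&a=1` and `?a=1&b=2` share one cache entry.

```python
import time
from typing import Callable, Dict, Sequence, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit


class ResponseCache:
    def __init__(self, ttl: float, vary_headers: Sequence[str] = ("accept",),
                 clock: Callable[[], float] = time.monotonic):
        self.ttl, self.vary, self.clock = ttl, tuple(vary_headers), clock
        self.entries: Dict[str, Tuple[float, bytes]] = {}  # key -> (expires at, body)

    def key(self, method: str, url: str, headers: Dict[str, str]) -> str:
        parts = urlsplit(url)
        query = urlencode(sorted(parse_qsl(parts.query)))  # parameter order does not matter
        lowered = {k.lower(): v for k, v in headers.items()}
        varied = "|".join(lowered.get(h, "") for h in self.vary)
        return f"{method}|{parts.path}|{query}|{varied}"

    def fetch(self, method: str, url: str, headers: Dict[str, str],
              upstream: Callable[[], bytes]) -> bytes:
        if method != "GET":
            return upstream()  # only cache safe, idempotent reads in this sketch
        k = self.key(method, url, headers)
        now = self.clock()
        hit = self.entries.get(k)
        if hit and hit[0] > now:
            return hit[1]  # fresh entry: the upstream is not contacted
        body = upstream()
        self.entries[k] = (now + self.ttl, body)
        return body
```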

Request and Response Transformation#

Gateways can rewrite requests before they reach the backend and transform responses before they reach the client. This includes header manipulation, body rewriting, protocol translation (HTTP to gRPC, REST to GraphQL), and payload format conversion. Transformation eliminates the need for adapter services and simplifies API versioning.

Monitoring and Observability#

A gateway sees every request, making it the natural instrumentation point for API metrics. Production gateways export access logs, request/response latencies (P50, P95, P99), error rates, and throughput to systems like Prometheus, Datadog, and OpenTelemetry collectors. Apache APISIX ships with built-in integrations for Prometheus, Grafana, SkyWalking, and Zipkin.
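To show what the P50/P95/P99 figures mean, here is a small sketch that computes them exactly from raw latency samples using the nearest-rank method. Production systems usually estimate quantiles from histograms (Prometheus histograms, for example) rather than keeping every sample. `percentile` and `latency_summary` are hypothetical helper names.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample covering at least p% of all samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

For latencies of 1 to 100 ms, this gives P50 = 50, P95 = 95, and P99 = 99. A handful of slow requests can move P99 sharply while leaving P50 untouched, which is why tail percentiles are tracked separately.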

SSL/TLS Termination#

The gateway handles TLS handshakes, certificate management, and encryption offloading so that backend services can communicate over plain HTTP internally. This simplifies certificate rotation, centralizes security policy, and reduces CPU overhead on upstream services. Modern gateways also support mTLS for service-to-service authentication.

Circuit Breaking#

When a backend service becomes degraded or unresponsive, a circuit breaker at the gateway stops forwarding requests to it, preventing cascading failures across the system. After a configurable cooldown, the gateway sends probe requests to test recovery. This pattern is critical in microservices architectures where a single failing service can take down an entire request chain.
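The closed/open/half-open cycle can be sketched as a small state machine. The sketch assumes the circuit opens after a fixed number of consecutive failures and that the first call after the cooldown acts as the probe. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, and the sketch is single-threaded, so it does not limit concurrent probes.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    pass


class CircuitBreaker:
    """Closed -> open after `threshold` consecutive failures; half-open after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown:
            return "half-open"  # cooldown elapsed: let a probe request through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("upstream marked unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.threshold:
                self.opened_at = self.clock()  # (re)open and restart the cooldown
            raise
        self.failures = 0
        self.opened_at = None  # any success, including a probe, closes the circuit
        return result
```

While the circuit is open, callers fail immediately instead of waiting on timeouts. This is what keeps one degraded service from tying up resources across the whole request chain.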

API Versioning and Canary Releases#

Gateways can route a percentage of traffic to new service versions, enabling canary deployments and blue-green releases without infrastructure changes. Traffic-splitting rules let teams gradually shift load from v1 to v2, monitor error rates, and roll back instantly if metrics degrade.
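One way to implement a percentage-based canary split is to hash a stable client identifier into 100 buckets. The sketch below does this with a hypothetical `choose_version` function. It is not any particular gateway's implementation; a per-request random split is the other common choice. Hashing keeps each client on the same version across requests, and raising the percentage only ever moves clients from v1 to v2. Rolling back means setting the percentage to 0.

```python
import hashlib


def choose_version(client_id: str, canary_percent: int) -> str:
    """Send a stable `canary_percent`% of clients to v2; everyone else stays on v1."""
    digest = hashlib.sha256(client_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 100)
    return "v2" if bucket < canary_percent else "v1"
```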

API Gateway vs Load Balancer vs Reverse Proxy#

These three components overlap in functionality but serve different primary purposes:

| Capability | Reverse Proxy | Load Balancer | API Gateway |
| --- | --- | --- | --- |
| Request forwarding | Yes | Yes | Yes |
| SSL termination | Yes | Sometimes | Yes |
| Load balancing | Basic | Advanced | Advanced |
| Health checks | Limited | Yes | Yes |
| Authentication | No | No | Yes |
| Rate limiting | No | No | Yes |
| Request transformation | No | No | Yes |
| API-aware routing | No | No | Yes |
| Response caching | Yes | No | Yes |
| Observability/metrics | Basic | Basic | Comprehensive |
| Protocol translation | No | No | Yes |
| Plugin/middleware ecosystem | Limited | No | Extensive |

A reverse proxy (e.g., Nginx, HAProxy in proxy mode) forwards client requests to backend servers, provides SSL termination, and can cache static content. It operates at the HTTP level but lacks API-specific intelligence.

A load balancer (e.g., AWS ALB, HAProxy, Envoy in LB mode) distributes traffic across server instances using health checks and balancing algorithms. Layer 4 load balancers work at the TCP level; Layer 7 load balancers can inspect HTTP headers but still lack API-layer logic like authentication or transformation.

An API gateway builds on reverse proxy and load balancing capabilities but adds an API-aware policy layer: authentication, rate limiting, request/response transformation, observability, and developer portal integration. It is purpose-built for managing API traffic.

In practice, many organizations start with a reverse proxy or load balancer and later adopt an API gateway as their API surface grows. Some gateways, including Apache APISIX, are built on top of proven proxies (APISIX uses Nginx and OpenResty) and inherit their performance characteristics while adding the API management layer.

API Gateway Use Cases#

Microservices Architecture#

In a microservices system with dozens or hundreds of services, an API gateway provides the single entry point that abstracts internal service topology from external consumers. Clients interact with one stable endpoint; the gateway handles service discovery, routing, and cross-cutting concerns. Without a gateway, each client must know the location and protocol of every service, creating tight coupling and operational fragility.

Mobile and IoT Backends#

Mobile clients operate under bandwidth, latency, and battery constraints that differ significantly from desktop browsers. An API gateway can aggregate multiple backend calls into a single response (the Backend-for-Frontend pattern), compress payloads, and adapt protocols. For IoT devices that may use MQTT or CoAP, the gateway translates between device protocols and internal HTTP/gRPC services.

Multi-Cloud and Hybrid Deployments#

Organizations running services across AWS, GCP, Azure, and on-premises data centers use an API gateway as the unified traffic layer. The gateway abstracts the underlying infrastructure, enabling consistent routing, security policies, and observability regardless of where a service is deployed. This is especially valuable during cloud migration, where services move between environments incrementally.

API Monetization#

Companies that expose APIs as products (payment processors, data providers, communication platforms) use gateways to enforce usage tiers, track consumption per API key, and generate billing data. Rate limiting by tier, quota enforcement, and detailed usage analytics are all gateway responsibilities in this model.

Zero-Trust Security#

A gateway enforces authentication and authorization at the network edge, ensuring that no unauthenticated request reaches backend services. Combined with mTLS for internal traffic, IP allowlisting, and WAF integration, the gateway becomes a core component of a zero-trust architecture. It can also mask or redact sensitive fields in responses to prevent data leakage.

Legacy System Modernization#

When migrating from monolithic to microservices architectures, an API gateway acts as the facade in the strangler fig pattern. New services are deployed behind the gateway alongside the legacy monolith. The gateway gradually shifts traffic from old endpoints to new ones, allowing incremental migration without disrupting existing clients.

Benefits of Using an API Gateway#

Simplified Client Integration#

Clients interact with a single, well-documented endpoint instead of tracking the addresses and protocols of individual services. This reduces client-side complexity, eliminates service discovery logic in front-end code, and makes API consumption predictable.

Centralized Security#

Authentication, authorization, encryption, and threat detection are enforced at one layer rather than reimplemented in every service. A single policy change at the gateway propagates instantly across all APIs. This consistency eliminates the security gaps that emerge when individual teams implement auth differently.

Operational Visibility#

Because every request passes through the gateway, teams gain comprehensive metrics, access logs, and distributed traces without instrumenting each service individually. Dashboards built on gateway telemetry provide real-time visibility into traffic patterns, error rates, and latency distributions across the entire API surface.

Reduced Backend Load#

Caching, request deduplication, and rate limiting at the gateway layer prevent unnecessary calls from reaching backend services. This directly reduces infrastructure costs and improves system stability during traffic spikes. For read-heavy APIs with cache hit rates around 90%, gateway caching alone cuts upstream load by an order of magnitude.

Faster Time to Market#

Developers focus on business logic rather than reimplementing cross-cutting concerns. Adding authentication to a new service takes a single plugin configuration at the gateway instead of custom code in every service. Teams ship faster because infrastructure concerns are already solved.

Independent Scalability#

The gateway and backend services scale independently. During a traffic surge, teams can horizontally scale the gateway layer without modifying any backend service. Conversely, backend services can be scaled, redeployed, or replaced without any client-facing changes.

How Apache APISIX Works as an API Gateway#

Apache APISIX is a high-performance, cloud-native API gateway built on Nginx and LuaJIT. It is designed for environments where throughput, latency, and extensibility are critical requirements.

Performance at scale. According to the project's published benchmarks, APISIX handles over 18,000 requests per second per CPU core with a median latency of 0.2ms. This performance comes from its non-blocking, event-driven architecture and the efficiency of LuaJIT-compiled plugin execution. The project positions this throughput as higher than that of many Java- and Go-based gateways, though real-world results depend on the enabled plugins, payload sizes, and hardware.

Extensive plugin ecosystem. APISIX ships with over 100 built-in plugins covering authentication (JWT, OAuth, LDAP, OpenID Connect), traffic control (rate limiting, circuit breaking, traffic mirroring), observability (Prometheus, SkyWalking, OpenTelemetry), and transformation (gRPC transcoding, request rewriting, response rewriting). Plugins can also be written in Lua, Java, Go, Python, or WebAssembly.

Dynamic configuration. Unlike traditional gateways that require restarts for configuration changes, APISIX reloads routes, upstreams, and plugin configurations in real time through its Admin API. This enables zero-downtime deployments and makes APISIX well-suited for CI/CD pipelines and GitOps workflows.

Proven adoption. APISIX powers over 147,000 deployments across more than 5,200 companies globally, spanning industries from fintech to telecommunications. Its Apache Software Foundation governance ensures vendor-neutral, community-driven development.

To get started with APISIX, see the getting started guide.

Frequently Asked Questions#

What is the difference between an API gateway and a load balancer?#

A load balancer distributes incoming traffic across multiple server instances using algorithms like round-robin or least connections. It operates at the transport layer (L4) or the application/HTTP layer (L7) but does not understand API semantics. An API gateway performs load balancing as one of many functions, and adds API-specific capabilities: authentication, rate limiting, request transformation, caching, and observability. If you only need to distribute traffic, a load balancer suffices. If you need to manage, secure, and observe API traffic, you need a gateway.

Do I need an API gateway for a monolithic application?#

An API gateway is not strictly required for a monolith, but it can still add value. If your monolith exposes APIs consumed by external clients, mobile apps, or third-party integrators, a gateway provides centralized authentication, rate limiting, and monitoring without modifying the application. It also positions your architecture for incremental migration to microservices using the strangler fig pattern.

How does an API gateway affect latency?#

A well-implemented gateway adds minimal latency, typically 0.2ms to 2ms per request depending on the number of active plugins. High-performance gateways like Apache APISIX are optimized for sub-millisecond overhead. The latency tradeoff is almost always worthwhile: the gateway eliminates redundant auth checks, reduces backend calls through caching, and prevents cascading failures through circuit breaking, all of which improve overall system response times.

Can an API gateway replace a service mesh?#

An API gateway and a service mesh serve different layers. The gateway handles north-south traffic (external clients to internal services), while a service mesh manages east-west traffic (service-to-service communication within the cluster). They are complementary, not competing, technologies. Some organizations use APISIX as both a gateway and an ingress controller, bridging the two layers, but a full service mesh (Istio, Linkerd) addresses concerns like mutual TLS between services and fine-grained internal traffic policies that fall outside a gateway's scope.

Is an API gateway the same as an API management platform?#

No. An API gateway is the runtime component that processes API traffic. An API management platform is a broader category that typically includes a gateway, a developer portal, API documentation tools, lifecycle management, and analytics dashboards. The gateway is the engine; the management platform is the full vehicle. Apache APISIX provides the high-performance gateway layer, and organizations often pair it with additional tooling for the complete API management lifecycle.


gRPC is a high-performance, open-source remote procedure call (RPC) framework originally developed by Google. It uses Protocol Buffers for binary serialization and HTTP/2 for transport, enabling strongly typed service contracts, bidirectional streaming, and significantly smaller payload sizes compared to equivalent JSON over REST.

Why gRPC Exists#

REST has dominated API design for over fifteen years, and it remains an excellent choice for public-facing, resource-oriented APIs. However, as microservices architectures scaled into hundreds or thousands of inter-service calls per request, the limitations of REST became measurable: text-based JSON serialization consumes CPU cycles, HTTP/1.1 head-of-line blocking limits concurrency, and the lack of a formal contract language leads to integration drift.

gRPC grew out of Stubby, Google's internal RPC framework, and Google open-sourced it in 2015. Adoption has grown steadily, and gRPC has become a common choice for latency-sensitive internal APIs in performance-critical systems.

How gRPC Works#

Protocol Buffers (Protobuf)#

Protocol Buffers are gRPC's interface definition language (IDL) and serialization format. A .proto file defines the service contract, including methods, request types, and response types:

```protobuf
syntax = "proto3";

service OrderService {
  rpc GetOrder (OrderRequest) returns (OrderResponse);
  rpc StreamOrders (OrderFilter) returns (stream OrderResponse);
}

message OrderRequest {
  string order_id = 1;
}

// Filter criteria for StreamOrders (fields are illustrative).
message OrderFilter {
  string status = 1;
}

message OrderResponse {
  string order_id = 1;
  string status = 2;
  double total = 3;
}
```

The protoc compiler generates client and server code in many languages from this single definition. Binary serialization produces payloads that are substantially smaller than equivalent JSON representations. This size reduction directly translates to lower network bandwidth consumption and faster serialization/deserialization.
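To see where the size difference comes from, here is a hand-rolled Python sketch that encodes an `OrderResponse` in the Protobuf wire format and compares it with the equivalent JSON. Real code would use the classes generated by `protoc`; the helper functions here are hypothetical and exist only to expose the encoding. Each field is prefixed by a tag, `(field_number << 3) | wire_type`, so the field names never appear on the wire.

```python
import json
import struct


def varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def string_field(number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    return varint(number << 3 | 2) + varint(len(data)) + data  # wire type 2: length-delimited


def double_field(number: int, value: float) -> bytes:
    return varint(number << 3 | 1) + struct.pack("<d", value)  # wire type 1: 64-bit little-endian


def encode_order_response(order_id: str, status: str, total: float) -> bytes:
    return string_field(1, order_id) + string_field(2, status) + double_field(3, total)


proto_bytes = encode_order_response("A-1001", "SHIPPED", 59.9)
json_bytes = json.dumps({"order_id": "A-1001", "status": "SHIPPED", "total": 59.9}).encode()
print(len(proto_bytes), len(json_bytes))  # prints: 26 58
```

Most of the saving in this example comes from replacing repeated field names with one-byte tags. Numeric-heavy messages also benefit from compact varint and fixed-width encodings.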

HTTP/2 Transport#

gRPC runs exclusively on HTTP/2, which provides several performance advantages over HTTP/1.1:

  • Multiplexing. Multiple RPC calls share a single TCP connection without HTTP-level head-of-line blocking (packet loss can still stall all streams at the TCP layer). A service making 50 concurrent calls to another service needs only one connection, not 50.
  • Header compression. HPACK compression significantly reduces header overhead for repeated headers.
  • Binary framing. HTTP/2 frames are binary, eliminating the text parsing overhead of HTTP/1.1.

These transport-level improvements compound with Protobuf serialization to deliver measurably lower latency in service-to-service communication.

Streaming Modes#

gRPC supports four communication patterns:

  1. Unary RPC. Single request, single response. Equivalent to a REST call.
  2. Server streaming. Client sends one request, server returns a stream of responses. Useful for real-time feeds or large result sets.
  3. Client streaming. Client sends a stream of messages, server returns one response. Useful for batched uploads or telemetry ingestion.
  4. Bidirectional streaming. Both sides send streams of messages concurrently. Useful for chat, collaborative editing, or real-time synchronization.

In practice, unary calls represent the majority of gRPC usage, with server streaming being the next most common pattern. The streaming capabilities differentiate gRPC from REST most sharply in real-time and high-throughput scenarios.

gRPC vs REST Comparison#

| Aspect | gRPC | REST |
| --- | --- | --- |
| Serialization | Protocol Buffers (binary) | JSON (text) |
| Transport | HTTP/2 only | HTTP/1.1 or HTTP/2 |
| Contract | .proto file (strict) | OpenAPI/Swagger (optional) |
| Streaming | Native (4 modes) | Limited (SSE, WebSocket) |
| Code generation | Built-in (protoc) | Third-party tools |
| Browser support | Requires proxy (gRPC-Web) | Native |
| Payload size | Significantly smaller | Baseline |
| Latency (typical) | Lower inter-service | Higher inter-service |
| Human readability | Binary (needs tooling) | JSON is human-readable |
| Caching | Not HTTP-cacheable by default | HTTP caching built-in |
| Tooling maturity | Growing | Extensive |

REST remains the dominant choice for public-facing APIs, while gRPC is increasingly preferred for internal microservices communication at larger organizations. The two protocols serve complementary roles rather than competing directly.

When to Use gRPC#

Use gRPC when:

  • Services communicate with high frequency and low latency requirements (trading systems, real-time analytics, game backends).
  • Payload efficiency matters because of bandwidth constraints or high message volumes.
  • Strong typing and contract-first development are priorities. The .proto file becomes the single source of truth.
  • Streaming is a core requirement (live data feeds, event-driven architectures, IoT telemetry).
  • Polyglot environments need consistent client/server code generation across multiple languages.

Stick with REST when:

  • The API is public-facing and must be browser-accessible without additional proxying.
  • Human readability and debuggability with standard HTTP tools (curl, Postman) are important for developer experience.
  • HTTP caching semantics are essential for performance.
  • The team's existing tooling and expertise are REST-centric, and migration cost outweighs the performance gain.

Many organizations adopting gRPC maintain REST for external APIs and use gRPC exclusively for internal communication, creating a dual-protocol architecture that leverages each protocol's strengths.

gRPC and API Gateways#

API gateways play a critical role in gRPC architectures by solving three problems: protocol translation, traffic management, and observability.

gRPC Proxying#

A gateway that natively supports HTTP/2 can proxy gRPC traffic directly, applying authentication, rate limiting, and logging without protocol translation. The gateway terminates the client's gRPC connection, applies policies, and forwards the call to the upstream gRPC service. This is the simplest integration model and preserves full gRPC semantics including streaming.

gRPC-Web Translation#

Browsers cannot make native gRPC calls because browser-based JavaScript does not expose the HTTP/2 framing layer required by gRPC. The gRPC-Web protocol bridges this gap: the browser sends gRPC-Web requests (HTTP/1.1 or HTTP/2 with modified framing), and the gateway translates them into native gRPC for the upstream service. This eliminates the need for a separate REST API layer for browser clients.
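The framing a gateway translates can be sketched in Python. gRPC prefixes each message with 5 bytes: a 1-byte flag (0 for an uncompressed message) and a 4-byte big-endian length. Native gRPC sends the final status in HTTP/2 trailers, which browser APIs cannot read. gRPC-Web therefore carries the trailers in the body as a frame whose flag byte has the high bit set (`0x80`). The function names below are hypothetical, and compression is ignored.

```python
import struct
from typing import Iterator, Tuple

DATA_FLAG = 0x00     # uncompressed message frame
TRAILER_FLAG = 0x80  # gRPC-Web only: MSB set marks the trailers frame


def frame(payload: bytes, flag: int = DATA_FLAG) -> bytes:
    """5-byte prefix: 1 flag byte + 4-byte big-endian length, followed by the payload."""
    return struct.pack(">BI", flag, len(payload)) + payload


def trailers_frame(status: int, message: str = "") -> bytes:
    # Trailers are encoded as HTTP/1-style header lines inside the body.
    block = f"grpc-status:{status}\r\ngrpc-message:{message}\r\n".encode()
    return frame(block, TRAILER_FLAG)


def parse_frames(body: bytes) -> Iterator[Tuple[int, bytes]]:
    offset = 0
    while offset < len(body):
        flag, length = struct.unpack_from(">BI", body, offset)
        offset += 5
        if offset + length > len(body):
            raise ValueError("truncated frame")
        yield flag, body[offset:offset + length]
        offset += length
```

A gateway translating gRPC-Web to native gRPC passes data frames through. It converts the upstream's HTTP/2 trailers into a `0x80` frame at the end of the response body.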

HTTP/JSON to gRPC Transcoding#

Many organizations need to expose gRPC services to clients that can only consume REST/JSON. An API gateway with transcoding capabilities automatically maps HTTP verbs and JSON payloads to gRPC methods and Protobuf messages based on annotations in the .proto file. This enables a single gRPC backend to serve both gRPC and REST clients without maintaining two codebases.
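The mapping step can be sketched as follows. The sketch assumes rules shaped like the `google.api.http` annotation (for example `get: "/v1/orders/{order_id}"`). The `RULES` table and `transcode` function are hypothetical. The whole JSON body is merged into the request message, which simplifies the annotation's body rules. Converting the resulting dict into a Protobuf message and issuing the call are left out.

```python
import json
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical rules, mirroring an annotation such as:
#   rpc GetOrder (OrderRequest) returns (OrderResponse) {
#     option (google.api.http) = { get: "/v1/orders/{order_id}" };
#   }
RULES: List[Tuple[str, str, str]] = [
    ("GET", "/v1/orders/{order_id}", "OrderService/GetOrder"),
]


def _compile(template: str):
    # re.split with a capture group alternates literal text (even) and field names (odd).
    parts = re.split(r"\{(\w+)\}", template)
    regex = "".join(re.escape(p) if i % 2 == 0 else f"(?P<{p}>[^/]+)"
                    for i, p in enumerate(parts))
    return re.compile(f"^{regex}$")


COMPILED = [(verb, _compile(t), method) for verb, t, method in RULES]


def transcode(verb: str, path: str, body: str = "") -> Optional[Tuple[str, Dict[str, object]]]:
    """Map an HTTP/JSON request to (gRPC method, request message fields)."""
    for rule_verb, regex, method in COMPILED:
        m = regex.match(path)
        if rule_verb == verb and m:
            fields: Dict[str, object] = json.loads(body) if body else {}
            fields.update(m.groupdict())  # path parameters populate message fields
            return method, fields
    return None
```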

In practice, gRPC deployments behind an API gateway typically use a mix of pure gRPC proxying, gRPC-Web for browser access, and transcoding to serve REST clients.

How Apache APISIX Handles gRPC#

Apache APISIX provides native gRPC support across all three integration patterns described above.

Native gRPC Proxying#

APISIX proxies gRPC traffic natively over HTTP/2, supporting unary and streaming calls. Routes can be configured with gRPC-specific upstream settings, and the full plugin ecosystem applies to gRPC routes: authentication (JWT, key-auth), rate limiting, circuit breaking, and observability all work transparently on gRPC traffic.

gRPC-Web Support#

The grpc-web plugin enables browser clients to communicate with gRPC backends through APISIX. The plugin handles the protocol translation between gRPC-Web and native gRPC, allowing frontend teams to consume gRPC services directly without building a REST translation layer. This reduces the API surface area and eliminates a class of contract synchronization bugs.

HTTP/JSON to gRPC Transcoding#

The grpc-transcode plugin maps REST endpoints to gRPC methods using the Protobuf descriptor. After uploading the .proto file to APISIX as a proto resource, you configure each route with the target gRPC service and method. The plugin then translates the incoming JSON request into a Protobuf message and the Protobuf response back to JSON. This is particularly valuable for organizations migrating from REST to gRPC incrementally, as existing REST clients continue working while backends are rewritten.

APISIX's gRPC performance is notable: internal benchmarks show gRPC proxying at approximately 15,000 RPS per CPU core with 0.3 milliseconds of added latency, comparable to its HTTP/1.1 proxying performance. The getting started guide includes gRPC configuration examples.

gRPC Best Practices#

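  1. Version your .proto files carefully. Protobuf supports backward-compatible field additions. However, removing a field and later reusing its number, or changing a field's type, breaks existing clients. Renaming a field is wire-compatible but breaks generated code and JSON mappings. Mark the numbers (and names) of deleted fields as reserved.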

  2. Set deadlines on every RPC. Without a deadline, a hung upstream can hold client resources indefinitely. Missing or overly generous RPC deadlines are a common cause of cascading failures in distributed systems.

  3. Load balance per request, not per connection. Because HTTP/2 multiplexes many RPCs over one long-lived connection, TCP-level (L4) load balancing pins all of a client's RPCs to a single backend. Use L7 load balancing or client-side balancing to distribute individual RPCs across backend instances.

  4. Implement health checking. gRPC defines a standard health checking protocol (grpc.health.v1.Health). Use it for readiness probes and load balancer health checks.

  5. Monitor per-method metrics. Track latency, error rate, and throughput per gRPC method, not just per service. A slow GetOrder method is invisible if aggregated with a fast ListOrders method.

FAQ#

Can gRPC completely replace REST?#

Not in most architectures. gRPC excels at internal service-to-service communication where performance, type safety, and streaming matter. REST remains superior for public APIs due to native browser support, human-readable payloads, HTTP caching, and broader tooling familiarity. The most common pattern is gRPC internally with REST or GraphQL at the edge, using an API gateway for protocol translation.

How do I debug gRPC calls if the payloads are binary?#

Tools like grpcurl (a curl equivalent for gRPC), Postman (which supports gRPC requests), and BloomRPC provide human-readable interaction with gRPC services. For production debugging, structured logging at the gateway layer that decodes Protobuf messages into JSON is the most effective approach. APISIX's logging plugins can capture gRPC request and response metadata for observability.

What is the performance difference between gRPC and REST in practice?#

In controlled benchmarks, gRPC typically delivers significantly higher throughput and lower latency than REST/JSON for equivalent workloads. The gains come from binary serialization (smaller payloads, faster encoding), HTTP/2 multiplexing (fewer connections, no head-of-line blocking), and code-generated clients (no reflection or manual parsing). The exact improvement depends on payload size, call frequency, and network conditions. Organizations migrating from REST to gRPC commonly report meaningful reductions in inter-service latency in production.

Does gRPC work with WebAssembly or edge computing?#

Yes. Protobuf serialization libraries exist for languages targeting WebAssembly, and gRPC-Web enables browser-based Wasm applications to call gRPC backends. For edge computing, gRPC's compact payloads and efficient serialization are advantageous on bandwidth-constrained links. Several CDN providers, including Cloudflare and Fastly, now support gRPC proxying at the edge as of 2025.


Mutual TLS (mTLS) is a security protocol where both the client and server authenticate each other using X.509 certificates during the TLS handshake. Unlike standard TLS, which only verifies the server's identity, mTLS ensures that both parties prove they are who they claim to be before any application data is exchanged.

Why Mutual TLS Matters#

Standard TLS, in the form of HTTPS, now protects the overwhelming majority of web traffic. However, standard TLS only solves half the authentication problem: clients verify that the server holds a valid certificate, but servers have no cryptographic assurance about the client's identity. They rely on application-layer mechanisms like API keys, tokens, or passwords instead.

This gap becomes critical in zero-trust architectures, service-to-service communication, and regulated environments where network-level identity verification is required. mTLS closes this gap by making identity verification bilateral and cryptographic.

mTLS vs Standard TLS#

| Aspect | Standard TLS | Mutual TLS (mTLS) |
| --- | --- | --- |
| Server authenticated | Yes | Yes |
| Client authenticated | No (application layer) | Yes (certificate) |
| Client certificate required | No | Yes |
| Certificate management complexity | Low | High |
| Typical use case | Public websites, APIs | Internal services, zero-trust, IoT |
| Identity assurance level | Server only | Both endpoints |
| Performance overhead | Baseline | ~5-10% additional handshake time |
| Common in browsers | Yes | Rare (except enterprise) |

mTLS has become the predominant service-to-service authentication mechanism in zero-trust network access (ZTNA) implementations, reflecting growing recognition that network perimeter-based security is insufficient for distributed architectures.

How the mTLS Handshake Works#

The mTLS handshake extends the standard TLS 1.3 handshake with additional steps for client certificate exchange. Here is the full sequence:

Step 1: Client Hello. The client initiates the connection by sending its supported TLS versions and cipher suites, a key share for the key exchange, and a random value. This step is identical to standard TLS.

Step 2: Server Hello and Server Certificate. The server responds with its chosen cipher suite, its own random value and key share, and its X.509 certificate. The server also sends a CertificateRequest message, signaling that the client must present a certificate. In standard TLS, this CertificateRequest is absent.

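Step 3: Client Verifies Server Certificate. The client validates the server's certificate against its trust store. It checks the certificate chain, expiration, revocation status (via CRL or OCSP), and that the certificate's subject alternative name (SAN) matches the expected server hostname. It also verifies the server's signature over the handshake transcript, which proves the server holds the certificate's private key.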

Step 4: Client Certificate Submission. The client sends its own X.509 certificate to the server along with a CertificateVerify message containing a digital signature over the handshake transcript, proving possession of the private key corresponding to the certificate.

Step 5: Server Verifies Client Certificate. The server validates the client certificate against its configured Certificate Authority (CA) trust store, checks the certificate chain, verifies the CertificateVerify signature, and optionally checks revocation status. If verification fails, the server terminates the connection immediately.

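Step 6: Secure Channel Established. Both parties exchange Finished messages that confirm the integrity of the handshake, then derive application traffic keys from the shared secret negotiated during the handshake. All subsequent communication is encrypted and authenticated in both directions.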

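In TLS 1.3, the client sends its Certificate and CertificateVerify in the same flight as its Finished message, so mTLS adds no extra round trip. The added cost, approximately 1-2 milliseconds compared to standard TLS, comes from the client's signature and the server's chain validation. It depends on certificate chain depth and the revocation checking method.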
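The handshake requirements above map onto a few settings in Python's standard `ssl` module. The sketch below builds a server context that demands a client certificate and a client context that verifies the server and presents its own certificate. The helper names and file paths are hypothetical, and pinning TLS 1.3 as the minimum is an assumption chosen to match the handshake described above.

```python
import ssl
from typing import Optional


def server_context(cert: Optional[str] = None, key: Optional[str] = None,
                   client_ca: Optional[str] = None) -> ssl.SSLContext:
    """Server side of mTLS: send CertificateRequest and reject clients without a valid cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.verify_mode = ssl.CERT_REQUIRED  # the client certificate is mandatory
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # the server's own identity
    if client_ca:
        ctx.load_verify_locations(cafile=client_ca)  # CA that signs acceptable client certs
    return ctx


def client_context(cert: Optional[str] = None, key: Optional[str] = None,
                   server_ca: Optional[str] = None) -> ssl.SSLContext:
    """Client side of mTLS: verify the server and present a certificate when asked."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # hostname checks and CERT_REQUIRED by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    if server_ca:
        ctx.load_verify_locations(cafile=server_ca)
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # sent in response to CertificateRequest
    return ctx
```

In use, the server wraps accepted sockets with `server_context(...).wrap_socket(sock, server_side=True)`. The client uses `client_context(...).wrap_socket(sock, server_hostname="orders.internal")`, where the hostname is hypothetical. After the handshake, `getpeercert()` on the server side returns the verified client certificate.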

Use Cases for Mutual TLS#

Zero-Trust Architecture#

Zero-trust security models operate on the principle of "never trust, always verify." Every service must authenticate cryptographically before communicating, regardless of network location. mTLS provides the transport-layer foundation for this model. The industry trend is strongly toward zero-trust for new network access deployments, with mTLS as the predominant service identity mechanism.

Microservices Communication#

In microservices architectures, dozens or hundreds of services communicate over internal networks. Without mTLS, a compromised service can impersonate any other service on the network. mTLS ensures that Service A can only communicate with Service B if both hold certificates signed by a trusted CA. Service meshes like Istio and Linkerd automate mTLS certificate issuance and rotation for every service pod, making deployment tractable at scale.

IoT Device Authentication#

IoT devices operate in physically untrusted environments where API keys or passwords can be extracted from device firmware. mTLS binds device identity to a hardware-backed certificate, making impersonation significantly harder. Certificate-based authentication is widely adopted across IoT devices, with mTLS adoption growing rapidly in industrial and healthcare IoT deployments.

API Security and Partner Integration#

APIs exposed to partners or regulated industries often require stronger authentication than API keys provide. mTLS ensures that only clients holding a certificate issued by the API provider's CA can establish a connection, providing defense-in-depth before any application-layer authentication occurs. Financial services APIs governed by Open Banking regulations in the EU, UK, and Australia mandate mTLS for third-party provider connections.

Challenges of Implementing mTLS#

Certificate Lifecycle Management#

Every client and server in an mTLS deployment needs a valid certificate. For an organization running 500 microservices with 3 replicas each, that means managing 1,500 certificates with their own issuance, renewal, and revocation cycles. Without automation, this becomes operationally unsustainable. Tools like cert-manager (for Kubernetes), HashiCorp Vault, and SPIFFE/SPIRE address this by automating certificate lifecycle operations.

Certificate-related outages are common in organizations managing large certificate inventories, and remediation can be costly. Automated rotation is not optional for production mTLS deployments.

Certificate Rotation#

Short-lived certificates (hours or days) reduce the blast radius of a compromised key but increase rotation frequency. Long-lived certificates (months or years) reduce operational churn but increase exposure time if compromised. The industry trend moves toward short-lived certificates. SPIRE, the SPIFFE reference implementation, issues X.509 workload certificates with a default lifetime of 1 hour and rotates them automatically through the SPIRE agent.

Performance Considerations#

mTLS adds computational overhead from asymmetric cryptography during the handshake and certificate validation. For services handling thousands of new connections per second, this overhead can be measurable. Connection pooling and persistent (keep-alive) connections amortize the handshake cost across many requests. TLS session resumption (via session tickets or pre-shared keys) lets a reconnecting client skip the full handshake, including certificate exchange and validation, which makes reconnections much cheaper.

Debugging and Observability#

When mTLS connections fail, diagnosing the cause is harder than debugging standard TLS failures. Common failure modes include expired certificates, CA trust store mismatches, certificate revocation, and clock skew between endpoints. Structured logging of TLS handshake events, certificate serial numbers, and validation errors is essential for operational mTLS deployments.

How to Configure mTLS in Apache APISIX#

Apache APISIX supports mTLS at both the edge (between clients and APISIX) and internally (between APISIX and upstream services). The configuration uses APISIX's SSL resource and route-level settings.

Client-to-Gateway mTLS#

To require client certificates for incoming connections, set the CA certificate trusted for client authentication on the SSL resource (its `client.ca` field). APISIX then rejects any client that does not present a certificate signed by that CA. See the mTLS documentation for the full SSL resource schema and configuration examples.

Gateway-to-Upstream mTLS#

When upstream services require mTLS, configure the upstream resource with the client certificate and key that APISIX should present (the `tls.client_cert` and `tls.client_key` fields). APISIX then authenticates itself to backend services, maintaining the zero-trust chain from edge to origin. The upstream TLS configuration section covers the required fields.
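On the upstream leg, the gateway acts as the TLS client. A Go sketch of that side, assuming PEM-encoded inputs: the gateway presents its own key pair through `Certificates` and verifies the upstream's certificate against `RootCAs` and `ServerName`. `newUpstreamClientConfig` is hypothetical; APISIX itself is configured through the upstream resource, not Go code.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// newUpstreamClientConfig builds the client side of gateway-to-upstream mTLS.
// Inputs are PEM bytes so their source (files, a secret store) is up to the caller.
func newUpstreamClientConfig(certPEM, keyPEM, upstreamCAPEM []byte, serverName string) (*tls.Config, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("load client key pair: %w", err)
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(upstreamCAPEM) {
		return nil, errors.New("no CA certificates found for upstream verification")
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert}, // identity the gateway presents
		RootCAs:      roots,                   // CAs trusted for the upstream's certificate
		ServerName:   serverName,              // hostname checked against the upstream certificate
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	_, err := newUpstreamClientConfig([]byte("bad"), []byte("bad"), nil, "orders.internal")
	fmt.Println("expected failure with placeholder inputs:", err)
}
```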

Per-Route mTLS Policies#

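APISIX lets you scope mTLS policies narrowly, enabling gradual rollout. Internal admin APIs can require mTLS immediately while public-facing routes continue using standard TLS with application-layer authentication. Client-certificate requirements follow the SSL resource matched by the request's SNI, so routes served on different hostnames can enforce different client policies, while upstream mTLS is set per route through each route's upstream configuration.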

The certificate management guide covers integration with cert-manager and external CA providers for automated certificate rotation within APISIX deployments.

mTLS Best Practices#

  1. Automate certificate lifecycle. Never rely on manual certificate issuance or renewal for production mTLS. Use cert-manager, Vault, or SPIRE.

  2. Use short-lived certificates. Target lifetimes of 24 hours or less for workload certificates. Rotate automatically before expiration.

  3. Separate CAs by trust domain. Do not use the same CA for internal service certificates and external partner certificates. Maintain distinct trust hierarchies.

  4. Monitor certificate expiration. Set alerting thresholds at 7 days, 3 days, and 1 day before expiration. Track certificate inventory centrally.

  5. Enable OCSP stapling. Reduce certificate validation latency by stapling OCSP responses at the server rather than requiring clients to contact the CA's OCSP responder.

FAQ#

What happens if a client certificate expires during an active mTLS connection?#

Existing connections keep working until they close, because TLS authenticates the peer during the handshake, not continuously. Any new connection attempt with the expired certificate, however, fails. This is why short-lived certificates should be paired with connection draining during rotation: draining forces clients to reconnect, and therefore re-authenticate with their current certificate, so stale credentials are phased out promptly without disrupting in-flight requests.

Is mTLS the same as two-way SSL?#

Yes. "Two-way SSL," "mutual SSL," and "mutual TLS" all describe the same mechanism: both endpoints present and verify certificates. "Mutual TLS" is the preferred term because TLS superseded SSL over two decades ago; every SSL version has since been formally deprecated, and modern deployments use TLS 1.2 or TLS 1.3.

Does mTLS replace the need for API keys or OAuth tokens?#

No. mTLS authenticates the transport-layer identity (which machine or service is connecting), while API keys and OAuth tokens authenticate the application-layer identity (which user, application, or tenant is making the request). In a defense-in-depth strategy, mTLS and application-layer authentication serve complementary roles. mTLS ensures only authorized services can reach the endpoint; tokens and keys determine what those services are allowed to do.

How does mTLS perform at scale in Kubernetes?#

In Kubernetes environments with service meshes, mTLS scales well because certificate issuance and rotation are fully automated by the mesh control plane. Istio, for example, issues and rotates certificates for every pod automatically using the built-in CA in istiod. The performance impact falls mainly on new connections (the handshake), and connection pooling amortizes it. Once connections are pooled, steady-state overhead is typically small; at large pod counts, the more noticeable cost is often the control plane resources spent issuing and distributing certificates.

Related#