9 posts tagged with "api-gateway"

· 8 min read

API gateway authentication is the practice of verifying client identity at a centralized entry point before requests reach backend services. By enforcing authentication at the gateway layer, organizations eliminate redundant auth logic across services, reduce attack surface, and gain a single enforcement point for access policies.

What is API Gateway Authentication#

In a distributed architecture, every service that exposes an endpoint must answer a fundamental question: who is making this request? Without a gateway, each service independently implements its own authentication stack. This leads to inconsistent enforcement, duplicated code, and a broader attack surface.

An API gateway centralizes this concern. It intercepts every inbound request, validates credentials against a configured identity provider or local store, and either forwards the authenticated request downstream or rejects it immediately. Broken authentication consistently ranks among the top API vulnerability categories, making centralized enforcement critical.

Centralizing authentication at the gateway layer provides three key advantages. First, it significantly reduces per-service authentication code by consolidating auth logic into a single component. Second, it creates a single audit log for every authentication event. Third, it enables credential rotation and policy changes without redeploying individual services.

Authentication Methods#

Key Auth#

Key authentication is the simplest method. The client includes a static API key in a header or query parameter. The gateway validates the key against a stored registry and maps it to a consumer identity.

Key Auth works well for server-to-server communication where transport security (TLS) is guaranteed and the client population is small. API keys remain common for machine-to-machine authentication, though their share is declining as organizations move toward token-based methods.

Apache APISIX supports Key Auth natively through its key-auth plugin. Configuration requires only defining a consumer and attaching the plugin to a route.
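The lookup the gateway performs can be sketched in a few lines of Python. This is an illustrative model only, not APISIX's actual implementation: the header name and key registry here are hypothetical stand-ins for a consumer store.

```python
# Illustrative sketch of gateway-side key auth: map a static API key
# (taken from a request header) to a registered consumer identity,
# rejecting requests with a missing or unknown key.

API_KEY_HEADER = "apikey"          # hypothetical header name

# key -> consumer name, standing in for the gateway's consumer registry
KEY_REGISTRY = {
    "k-1a2b3c": "mobile-app",
    "k-9z8y7x": "partner-billing",
}

def authenticate(headers):
    """Return (status_code, consumer) for an inbound request."""
    key = headers.get(API_KEY_HEADER)
    if key is None:
        return 401, None           # no credential presented
    consumer = KEY_REGISTRY.get(key)
    if consumer is None:
        return 401, None           # unknown key
    return 200, consumer           # forward downstream as this consumer
```

Mapping the key to a named consumer, rather than just accepting it, is what enables per-consumer rate limiting and audit logging later in the pipeline.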

JWT (JSON Web Tokens)#

JWT authentication uses digitally signed tokens that carry claims about the client. The gateway validates the token signature, checks expiration, and optionally verifies audience and issuer claims. Because JWTs are self-contained, the gateway does not need to call an external service on every request.

JWTs dominate modern API authentication. The compact format and stateless verification make JWTs particularly well-suited for high-throughput gateways where microsecond-level latency matters.

APISIX implements JWT validation through its jwt-auth plugin, supporting both HS256 and RS256 algorithms with configurable claim validation.
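The core of stateless JWT validation fits in a short sketch. The following pure-Python HS256 signer/verifier is for illustration only (a production gateway would use a vetted JWT library and also validate the header's `alg` field); it shows why no external call is needed: the signature check and `exp` check use only the token and a shared secret.

```python
import base64, hashlib, hmac, json, time

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def sign_hs256(claims: dict, secret: bytes) -> str:
    """Mint an HS256 JWT (illustrative; real issuance happens at the IdP)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"

def verify_hs256(token: str, secret: bytes, now=None) -> dict:
    """Check signature and expiry entirely locally; return the claims."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

Note the constant-time comparison (`hmac.compare_digest`) for the signature check, which avoids leaking information through timing differences.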

OAuth 2.0#

OAuth 2.0 is an authorization framework that enables third-party applications to obtain limited access to an API on behalf of a resource owner. The gateway validates bearer tokens issued by an authorization server, typically by introspecting the token or verifying a JWT access token locally.

OAuth 2.0 is widely adopted across enterprises for API integrations. The framework's delegation model makes it essential for any API exposed to external developers or partner ecosystems.

OpenID Connect (OIDC)#

OpenID Connect extends OAuth 2.0 with a standardized identity layer. It adds an ID token (a JWT) that carries user identity claims alongside the OAuth 2.0 access token. The gateway can validate the ID token to confirm user identity and use the access token for authorization decisions.

OIDC is the de facto standard for single sign-on in API ecosystems. Major identity providers including Okta, Auth0, Azure AD, and Google Identity all implement OIDC. APISIX provides native OIDC support through its openid-connect plugin, which handles the full authorization code flow, token introspection, and token refresh.

mTLS (Mutual TLS)#

Mutual TLS requires both the client and server to present certificates during the TLS handshake. The gateway validates the client certificate against a trusted certificate authority, establishing strong machine identity without application-layer tokens.

mTLS adoption has surged alongside zero-trust architecture initiatives. In Kubernetes environments, mTLS between services has become increasingly common. At the gateway level, mTLS is particularly valuable for B2B integrations and internal service-to-service communication where certificate management infrastructure already exists.
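On the server side, requiring a client certificate is a TLS-context setting rather than application code. A minimal Python sketch of such a context is shown below; the certificate file names in the comments are placeholders, and a real deployment would load actual server and CA material.

```python
import ssl

def make_mtls_server_context() -> ssl.SSLContext:
    """Server-side TLS context that requires (not just accepts) a client cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED   # reject clients presenting no cert
    # In a real deployment (paths are placeholders):
    # ctx.load_cert_chain("server.crt", "server.key")      # gateway identity
    # ctx.load_verify_locations("trusted_client_ca.pem")   # trusted client CA
    return ctx
```

The `CERT_REQUIRED` setting is the difference between standard TLS and mTLS: the handshake fails unless the client presents a certificate chaining to a trusted CA.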

HMAC Authentication#

HMAC authentication requires the client to compute a hash-based message authentication code over the request content using a shared secret. The gateway independently computes the same HMAC and compares the results. This method provides request integrity verification in addition to authentication.

HMAC is common in financial APIs and webhook verification scenarios where request tampering must be detected. AWS Signature Version 4, used across all AWS API calls, is an HMAC-based scheme processing billions of requests daily.
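A simplified signing scheme illustrates the mechanics. This sketch canonicalizes only the method, path, and a body hash; real schemes such as AWS SigV4 additionally sign selected headers and a timestamp and derive per-date signing keys.

```python
import hashlib, hmac

def sign_request(secret: bytes, method: str, path: str, body: bytes) -> str:
    """Compute an HMAC-SHA256 signature over a canonical request string."""
    canonical = (method.upper().encode() + b"\n" +
                 path.encode() + b"\n" +
                 hashlib.sha256(body).digest())   # hash, not raw body
    return hmac.new(secret, canonical, hashlib.sha256).hexdigest()

def verify_request(secret, method, path, body, presented_sig) -> bool:
    """Gateway side: recompute the HMAC and compare in constant time."""
    expected = sign_request(secret, method, path, body)
    return hmac.compare_digest(expected, presented_sig)
```

Because the body hash is part of the signed content, any tampering with the payload in transit invalidates the signature, giving the integrity guarantee the text describes.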

Comparison Table#

| Method | Complexity | Statefulness | Best For | Token Expiry |
| --- | --- | --- | --- | --- |
| Key Auth | Low | Stateless (lookup) | Internal services, simple integrations | Manual rotation |
| JWT | Medium | Stateless | High-throughput APIs, mobile clients | Built-in (`exp` claim) |
| OAuth 2.0 | High | Stateful (auth server) | Third-party access, delegated auth | Access token TTL |
| OIDC | High | Stateful (identity provider) | SSO, user-facing APIs | ID + access token TTL |
| mTLS | High | Stateless (cert validation) | Zero-trust, B2B, service mesh | Certificate validity period |
| HMAC | Medium | Stateless | Financial APIs, webhook verification | Per-key rotation policy |

Best Practices#

Layer your authentication. Use mTLS at the transport layer for service identity and JWT or OAuth 2.0 at the application layer for user identity. Defense in depth reduces the impact of any single credential compromise.

Enforce short-lived tokens. Set JWT and OAuth 2.0 access token lifetimes to 15 minutes or less for user-facing flows. Use refresh tokens to obtain new access tokens without re-authentication. Short token lifetimes limit the window of exploitation if a token is leaked.

Centralize consumer management. Define consumers at the gateway level with consistent identity attributes. Map every API key, JWT subject, and OAuth 2.0 client ID to a named consumer entity. This enables unified rate limiting, logging, and access control across authentication methods.

Validate all claims. Do not trust a JWT solely because its signature is valid. Verify the issuer (iss), audience (aud), expiration (exp), and not-before (nbf) claims. Reject tokens with unexpected or missing claims.
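A claim-validation routine along these lines (a sketch, with the expected issuer and audience supplied by the caller) makes the checklist concrete:

```python
import time

REQUIRED_CLAIMS = ("iss", "aud", "exp")

def validate_claims(claims: dict, issuer: str, audience: str, now=None):
    """Raise ValueError unless iss, aud, exp, and nbf all check out."""
    now = time.time() if now is None else now
    for name in REQUIRED_CLAIMS:
        if name not in claims:
            raise ValueError(f"missing claim: {name}")
    if claims["iss"] != issuer:
        raise ValueError("unexpected issuer")
    aud = claims["aud"]                     # may be a string or a list
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise ValueError("unexpected audience")
    if claims["exp"] <= now:
        raise ValueError("token expired")
    if claims.get("nbf", 0) > now:          # nbf is optional
        raise ValueError("token not yet valid")
```

Note that missing claims are rejected rather than skipped; a token without an audience should never be accepted by default.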

Log authentication events comprehensively. Record every authentication success and failure with client identity, timestamp, source IP, and the route accessed. These logs are essential for incident response and compliance audits. NIST SP 800-92 recommends retaining authentication logs for a minimum of 90 days.

How Apache APISIX Handles Authentication#

Apache APISIX provides a plugin-based authentication architecture that supports all six methods described above. Each authentication plugin runs in the gateway's request processing pipeline before the request reaches any upstream service.

APISIX's consumer abstraction ties authentication credentials to named entities. A single consumer can have multiple authentication methods attached, enabling gradual migration between methods. For example, an organization migrating from Key Auth to JWT can configure both plugins on the same consumer during the transition period.

Key plugins include:

  • key-auth: Static API key validation with header or query parameter extraction.
  • jwt-auth: JWT signature verification with configurable algorithms and claim validation.
  • openid-connect: Full OIDC flow support including authorization code, token introspection, and PKCE.

APISIX also supports chaining authentication plugins with authorization plugins such as consumer-restriction and OPA (Open Policy Agent), enabling fine-grained access control decisions after identity is established.

Performance benchmarks show APISIX processing authenticated requests with sub-millisecond overhead for Key Auth and JWT validation, and under 5ms for OIDC token introspection with a local identity provider. These numbers hold at sustained loads exceeding 10,000 requests per second on modest hardware.

FAQ#

Should I use JWT or OAuth 2.0 for my API?#

JWT and OAuth 2.0 are not mutually exclusive. OAuth 2.0 is an authorization framework that often uses JWTs as its access token format. If your API serves first-party clients only, standalone JWT authentication may suffice. If third-party developers need delegated access, implement the full OAuth 2.0 framework with JWT access tokens.

Is API key authentication secure enough for production?#

API key authentication is secure for server-to-server communication over TLS when keys are rotated regularly and scoped to specific consumers. It is not recommended for client-side applications (browsers, mobile apps) because keys cannot be kept secret on end-user devices. For any client-facing API, prefer OAuth 2.0 or OIDC.

How does mTLS differ from standard TLS at the gateway?#

Standard TLS authenticates only the server to the client. The client verifies the server's certificate, but the server accepts any client connection. mTLS adds a second handshake step where the client also presents a certificate that the server validates against a trusted CA. This provides strong machine identity for both parties and is a foundational component of zero-trust network architectures.

Can I combine multiple authentication methods on a single route?#

Yes. Apache APISIX supports configuring multiple authentication plugins on a single route. The gateway attempts each configured method in order and accepts the request if any method succeeds. This is useful during migration periods or when a route serves clients with different authentication capabilities.

· 9 min read

Microservices architectures need an API gateway to provide a single entry point that abstracts the complexity of a distributed service fleet from API consumers. The gateway handles cross-cutting concerns like authentication, routing, rate limiting, and observability centrally, preventing each microservice from reimplementing these capabilities independently and ensuring consistent behavior across the entire API surface.

Why Microservices Need a Gateway#

A microservices architecture decomposes a monolithic application into independently deployable services, each owning a specific business domain. While this approach improves development velocity and scaling flexibility, it introduces operational challenges that compound as the number of services grows.

Without a gateway, every client must know the network location of every service it needs. A single mobile application screen might require data from five different services, forcing the client to manage multiple connections, handle partial failures, and aggregate responses. As organizations scale their microservices fleets, client-side orchestration becomes impractical.

The API gateway pattern, documented by Chris Richardson in his microservices pattern catalog and widely adopted since, solves this by interposing a single component between clients and the service fleet. The gateway accepts all client requests, routes them to the appropriate services, and returns consolidated responses. The pattern has become a standard component in production microservices architectures.

The gateway also addresses a second fundamental problem: cross-cutting concern duplication. Authentication, logging, rate limiting, CORS handling, and request validation must be applied consistently across all services. Without a gateway, each service team implements these independently, leading to drift, inconsistency, and duplicated effort. Centralizing cross-cutting concerns at the gateway significantly reduces per-service boilerplate code.

Core Gateway Patterns#

Request Routing#

The most fundamental gateway pattern routes incoming requests to the correct upstream service based on URL paths, headers, methods, or other request attributes. A gateway might route /api/users/* to the user service, /api/orders/* to the order service, and /api/products/* to the catalog service.

Dynamic routing takes this further by reading route configurations from a control plane or configuration store, allowing routes to be updated without restarting the gateway. Apache APISIX supports dynamic route configuration through its Admin API and etcd-backed configuration store, enabling zero-downtime route changes.
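Prefix-based routing reduces to a longest-prefix match over a route table. The sketch below models the idea (the table and upstream names are illustrative, not APISIX's route format):

```python
# Route table: path prefix -> upstream service name (illustrative values).
ROUTES = [
    ("/api/users/", "user-service"),
    ("/api/orders/", "order-service"),
    ("/api/products/", "catalog-service"),
]

def route(path):
    """Return the upstream for a path, preferring the longest matching prefix."""
    matches = [(prefix, upstream) for prefix, upstream in ROUTES
               if path.startswith(prefix)]
    if not matches:
        return None                       # no route -> gateway returns 404
    return max(matches, key=lambda m: len(m[0]))[1]
```

Longest-prefix matching matters once routes nest (e.g. `/api/orders/exports/` routed separately from `/api/orders/`), which is why production gateways use radix trees rather than linear scans.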

API Composition (Aggregation)#

API composition combines responses from multiple microservices into a single response for the client. Instead of requiring a mobile application to make five separate API calls to render a dashboard, the gateway fetches data from all five services in parallel and returns a unified response.

This pattern reduces client-side complexity and network round trips. Consolidating multiple API calls into a single gateway request significantly decreases page load time, especially on mobile networks where each round trip adds noticeable latency.
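The fan-out itself is straightforward to model. In this sketch the three fetch functions are stubs standing in for real backend calls; the point is that they run in parallel and the client sees one response:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub backend calls standing in for real HTTP requests to three services.
def fetch_profile(uid):
    return {"name": f"user-{uid}"}

def fetch_orders(uid):
    return [{"id": 1, "status": "shipped"}]

def fetch_recommendations(uid):
    return ["sku-9", "sku-12"]

def compose_dashboard(uid):
    """Fan out to the backends in parallel and return one aggregated payload."""
    with ThreadPoolExecutor() as pool:
        profile = pool.submit(fetch_profile, uid)
        orders = pool.submit(fetch_orders, uid)
        recs = pool.submit(fetch_recommendations, uid)
        return {
            "profile": profile.result(),
            "orders": orders.result(),
            "recommendations": recs.result(),
        }
```

With parallel fan-out, the composed response takes roughly as long as the slowest backend call instead of the sum of all of them.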

Backend for Frontend (BFF)#

The BFF pattern creates gateway configurations tailored to specific client types. A mobile BFF provides compact responses optimized for bandwidth constraints and small screens. A web BFF returns richer data structures suited to desktop layouts. An internal BFF serves admin tools with elevated access.

Each BFF acts as a specialized gateway layer that transforms and filters upstream service responses for its target client. Netflix, Spotify, and SoundCloud have publicly documented their BFF implementations. The pattern prevents a one-size-fits-all API from forcing compromises on every client type.

Service Mesh Integration#

In architectures that deploy both an API gateway and a service mesh, the gateway handles north-south traffic (client to cluster) while the service mesh manages east-west traffic (service to service). The gateway provides external-facing features like API key authentication, rate limiting, and response transformation. The mesh handles internal concerns like mTLS, circuit breaking, and service-to-service load balancing.

Most organizations using a service mesh also deploy an API gateway with clear boundaries between the two components. This separation avoids the complexity of running a full mesh for external traffic while preserving mesh benefits for internal communication.

Key Features for Microservices#

Service Discovery#

In a microservices environment, services scale dynamically and their network locations change frequently. Static configuration of upstream addresses becomes impractical at scale. Service discovery enables the gateway to automatically detect available service instances and their health status.

Apache APISIX integrates with multiple service discovery systems, documented in its discovery configuration guide. Supported registries include Consul, Eureka, Nacos, and Kubernetes-native service discovery. When a new service instance registers or an existing instance becomes unhealthy, APISIX updates its routing table automatically.

In Kubernetes environments, APISIX can read Service and Endpoint resources directly, eliminating the need for a separate discovery system. Kubernetes-native service discovery typically provides faster routing updates compared to polling-based approaches.

Circuit Breaking#

Circuit breaking prevents cascading failures by stopping requests to an unhealthy upstream service. When error rates exceed a configured threshold, the circuit opens and the gateway returns a fast failure response instead of forwarding requests to the struggling service. After a cooldown period, the circuit enters a half-open state and allows a limited number of test requests through. If those succeed, the circuit closes and normal traffic resumes.

Without circuit breaking, a single unhealthy service can consume all available connections and thread pools in the gateway, causing failures to cascade across the entire system. Organizations using circuit breakers typically experience significantly shorter outage durations.
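The closed/open/half-open state machine can be sketched compactly. This is a minimal model (real breakers also cap the number of half-open probe requests); the thresholds are illustrative, and the clock is injectable so the behavior is testable:

```python
import time

class CircuitBreaker:
    """Minimal closed/open/half-open circuit breaker sketch."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True                  # closed: normal traffic
        if self.clock() - self.opened_at >= self.cooldown:
            return True                  # half-open: let a probe through
        return False                     # open: fail fast, protect upstream

    def record_success(self):
        self.failures = 0
        self.opened_at = None            # probe succeeded: close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # trip (or re-trip) the circuit
```

The fast failure in the open state is the whole point: the gateway spends microseconds rejecting a request instead of holding a connection open against a struggling upstream.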

Canary Deployment#

Canary deployment routes a small percentage of production traffic to a new service version while the majority continues to hit the stable version. The gateway controls the traffic split, enabling teams to validate new releases with real traffic before committing to a full rollout.

APISIX supports traffic splitting through weighted upstream configurations. A typical canary deployment starts with 5% of traffic routed to the new version, gradually increasing to 25%, 50%, and finally 100% as metrics confirm stability. If errors spike, the gateway reverts the traffic split without any service redeployment.
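One common way to implement the split deterministically is to hash a stable client identifier into a bucket, so each client consistently sees the same version during the rollout. This hash-based sketch is one possible approach, not APISIX's internal mechanism (which uses weighted upstream selection):

```python
import hashlib

def pick_version(client_id: str, canary_percent: int) -> str:
    """Stable traffic split: hash the client id into a bucket in [0, 100)."""
    digest = hashlib.sha256(client_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "canary" if bucket < canary_percent else "stable"
```

Stickiness matters: if the same user flip-flopped between versions on every request, session state and client-side caching could break in ways unrelated to the release being tested.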

Distributed Tracing#

In a microservices architecture, a single client request might traverse ten or more services. Distributed tracing tracks the request's path through the entire service chain, recording latency at each hop. The gateway plays a critical role by injecting trace context headers (W3C Trace Context or B3) into every request it forwards.

APISIX supports trace context propagation to observability backends including Zipkin, Jaeger, SkyWalking, and OpenTelemetry. With tracing enabled at the gateway, operations teams gain end-to-end visibility into request flows, enabling faster incident resolution compared to relying solely on logs and metrics.

API Gateway vs Service Mesh#

API gateways and service meshes both manage network traffic in a microservices architecture, but they target different communication patterns and offer different feature sets.

| Aspect | API Gateway | Service Mesh |
| --- | --- | --- |
| Traffic direction | North-south (external to internal) | East-west (internal to internal) |
| Deployment model | Centralized proxy | Distributed sidecar proxies |
| Primary focus | API management, external security | Internal networking, observability |
| Authentication | API keys, JWT, OAuth, OIDC | mTLS (identity-based) |
| Rate limiting | Per-consumer, per-route | Per-service (less granular) |
| Protocol support | HTTP, gRPC, WebSocket, GraphQL | TCP, HTTP, gRPC |
| Request transformation | Yes | Typically no |

The two technologies are complementary, not competitive. Organizations deploying both an API gateway and a service mesh generally report improved overall system reliability compared to using either component alone.

How Apache APISIX Supports Microservices#

Apache APISIX is designed for microservices environments, offering dynamic configuration, multi-protocol support, and native integration with cloud-native infrastructure.

Dynamic configuration without restarts. APISIX stores configuration in etcd and applies changes in real time. Routes, upstream definitions, plugins, and consumers can be added, modified, or removed through the Admin API without restarting any gateway node. This is essential in microservices environments where service endpoints change frequently.

Plugin pipeline architecture. APISIX's plugin system runs a configurable pipeline of plugins on each request. For microservices, this means authentication, rate limiting, request transformation, and logging execute as independent pipeline stages. Plugins can be enabled per-route, per-service, or globally, providing fine-grained control over cross-cutting behavior.

Kubernetes-native operation. The APISIX Ingress Controller deploys APISIX as a Kubernetes-native gateway, supporting both the legacy Ingress resource and the newer Gateway API specification. This allows platform teams to manage gateway configuration using familiar Kubernetes declarative workflows.

Service discovery integration. APISIX's service discovery capabilities connect directly to Consul, Nacos, Eureka, and Kubernetes DNS, ensuring the gateway always routes to healthy, available service instances without manual configuration updates.

Observability. Built-in plugins export metrics to Prometheus, traces to Jaeger, Zipkin, and OpenTelemetry, and logs to HTTP endpoints, syslog, or Kafka. This enables a complete observability pipeline with the gateway as the instrumentation point for all external API traffic.

FAQ#

Do I need an API gateway if I already use a service mesh?#

Yes. A service mesh manages internal service-to-service communication but does not address external API concerns like consumer authentication, API key management, rate limiting per consumer, request transformation, or developer-facing documentation. The API gateway handles the north-south boundary where external clients interact with your microservices. Deploy both for comprehensive traffic management.

How does an API gateway handle partial failures across microservices?#

An API gateway can be configured with circuit breakers, timeouts, and retry policies for each upstream service. When using the API composition pattern, the gateway can return partial results with degraded status rather than failing the entire request when one backend is unavailable. APISIX supports configurable timeouts and retry counts per route, and health checks automatically remove unhealthy nodes from the upstream pool.

Should each microservice team manage their own gateway routes?#

A decentralized model where service teams own their route configurations works well at scale, provided the platform team controls the global policies (authentication requirements, rate limiting defaults, logging standards). APISIX supports this through its Admin API, which can be integrated into CI/CD pipelines. Service teams declare their routes in version-controlled configuration files, and the deployment pipeline applies changes through the Admin API after policy validation.

What is the performance impact of adding an API gateway to a microservices architecture?#

Apache APISIX, built on NGINX and LuaJIT, adds 1-2ms of latency per request with a typical plugin configuration (authentication, rate limiting, logging). For most microservices architectures where end-to-end request latency is 50-500ms, this overhead is under 2% of total latency. The operational benefits of centralized security, observability, and traffic management significantly outweigh the minor latency cost.

· 9 min read

API gateway rate limiting is the practice of controlling how many requests a client can make to your API within a defined time window. Implemented at the gateway layer, rate limiting protects backend services from overload, prevents abuse, ensures fair resource allocation across consumers, and maintains predictable service quality under variable traffic conditions.

What is Rate Limiting#

Rate limiting enforces a maximum request throughput for API consumers. When a client exceeds its allowed quota, the gateway returns an HTTP 429 (Too Many Requests) response instead of forwarding the request to the upstream service. The response typically includes a Retry-After header indicating when the client can resume making requests.

The need for rate limiting has grown alongside API traffic volumes. API traffic now represents the majority of HTTP requests processed globally, and a significant portion consists of automated requests, many of which are abusive or unintentional high-frequency polling.

Without rate limiting, a single misbehaving client can consume disproportionate backend resources, degrading performance for all consumers. Rate limiting is also a contractual tool: it enforces the usage tiers defined in API monetization plans and SLAs.

Why Rate Limit at the Gateway#

Implementing rate limiting at the API gateway rather than in individual services provides several structural advantages.

Single enforcement point. When rate limits are defined at the gateway, every request passes through the same throttling logic regardless of which upstream service handles it. This eliminates the risk of inconsistent enforcement across a microservices fleet and reduces availability incidents caused by traffic spikes.

Reduced backend load. Rejected requests never reach the upstream service. This means the gateway absorbs the cost of excess traffic, keeping backend services operating within their designed capacity.

Consistent client experience. Centralized rate limiting ensures all consumers receive the same HTTP 429 responses with standardized headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset), making it straightforward for client developers to implement backoff logic.

Operational visibility. Gateway-level rate limiting produces unified metrics on throttled requests, enabling operations teams to identify abusive clients, undersized quotas, and traffic anomalies from a single dashboard.

Rate Limiting Algorithms#

Token Bucket#

The token bucket algorithm maintains a bucket of tokens for each rate-limited entity. Tokens are added at a fixed rate up to a maximum capacity. Each request consumes one token. If the bucket is empty, the request is rejected.

Token bucket allows short bursts up to the bucket capacity while enforcing an average rate over time. This makes it well-suited for APIs where occasional traffic spikes are acceptable but sustained overuse is not.

Pros: Permits controlled bursting, simple to implement, low memory footprint.

Cons: Burst size must be tuned carefully; overly generous bursts can still overwhelm backends.
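A minimal token bucket implementation makes the refill-and-spend mechanics concrete; the injectable clock here is purely for testability:

```python
class TokenBucket:
    """Token bucket: tokens refill at `rate` per second up to `capacity`."""

    def __init__(self, rate, capacity, clock):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # max burst size
        self.clock = clock
        self.tokens = capacity            # start full: allow an initial burst
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1              # spend one token for this request
            return True
        return False
```

Capacity controls the burst, rate controls the sustained average; tuning them independently is what the "Pros" and "Cons" above are really about.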

Leaky Bucket#

The leaky bucket algorithm processes requests at a fixed rate, queuing excess requests until the queue is full. It smooths traffic into a uniform output rate regardless of input burstiness.

Leaky bucket is ideal for backends that require strictly uniform request rates, such as third-party APIs with their own rate limits or services with fixed connection pools.

Pros: Produces perfectly smooth output, prevents backend overload from bursts.

Cons: Higher latency for bursty traffic due to queuing, queue size requires tuning.
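For comparison with the token bucket above, here is a sketch of the "meter" variant of leaky bucket (the form NGINX-style limiters use, where excess requests are rejected rather than queued): the water level drains at a fixed rate, each request adds one unit, and requests that would raise the level past the burst allowance are refused.

```python
class LeakyBucketMeter:
    """Leaky bucket as a meter: level drains at `rate`, burst caps the level."""

    def __init__(self, rate, burst, clock):
        self.rate = rate                  # drain rate, requests per second
        self.burst = burst                # extra level tolerated above empty
        self.clock = clock
        self.level = 0.0
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Drain the bucket for the elapsed time, never below empty.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level <= self.burst:
            self.level += 1.0             # admit: this request adds one unit
            return True
        return False
```

The queuing variant described above behaves the same way except that over-limit requests wait in a bounded queue instead of being rejected immediately.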

Sliding Window#

The sliding window algorithm divides time into overlapping windows and counts requests across the current and previous windows using weighted proportions. This eliminates the boundary problem inherent in fixed windows.

For example, if the window is 60 seconds and the current position is 40 seconds into the window, the algorithm weights 33% of the previous window's count and 100% of the current window's count to determine if the limit is exceeded.

Pros: Accurate rate enforcement without boundary spikes, reasonable memory usage.

Cons: Slightly more complex to implement than fixed window.
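The weighted-count calculation described above can be sketched directly, with the clock injected for testability:

```python
class SlidingWindowCounter:
    """Weighted count over the current and previous fixed windows."""

    def __init__(self, limit, window, clock):
        self.limit = limit
        self.window = window              # window length in seconds
        self.clock = clock
        self.curr_start = clock()         # start of the current fixed window
        self.curr = 0                     # requests in the current window
        self.prev = 0                     # requests in the previous window

    def allow(self):
        now = self.clock()
        # Roll the fixed windows forward as time passes.
        while now - self.curr_start >= self.window:
            self.curr_start += self.window
            self.prev, self.curr = self.curr, 0
        # Fraction of the previous window still inside the sliding window.
        prev_weight = 1.0 - (now - self.curr_start) / self.window
        if self.prev * prev_weight + self.curr < self.limit:
            self.curr += 1
            return True
        return False
```

At 40 seconds into a 60-second window, `prev_weight` is 1/3, matching the 33% weighting in the example above.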

Fixed Window#

The fixed window algorithm divides time into non-overlapping intervals and counts requests within each interval. When the count exceeds the limit, subsequent requests are rejected until the next window begins.

Fixed window is the simplest algorithm but has a well-known boundary problem: a client can briefly achieve up to double the intended rate by clustering requests at the end of one window and the beginning of the next. Despite this limitation, fixed window remains widely deployed due to its simplicity and low overhead.

Pros: Minimal memory and computation, easy to understand and debug.

Cons: Boundary burst problem allows temporary rate doubling.
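A fixed window counter is only a few lines, which is much of its appeal; the test below also demonstrates the boundary problem, with a fresh quota becoming available the instant a new window begins:

```python
class FixedWindowCounter:
    """Count requests per non-overlapping window of `window` seconds."""

    def __init__(self, limit, window, clock):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.window_id = None
        self.count = 0

    def allow(self):
        wid = int(self.clock() // self.window)   # which window are we in?
        if wid != self.window_id:
            self.window_id, self.count = wid, 0  # new window: reset counter
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```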

Algorithm Comparison#

| Algorithm | Burst Handling | Output Smoothness | Memory | Complexity | Boundary Accuracy |
| --- | --- | --- | --- | --- | --- |
| Token Bucket | Allows controlled bursts | Moderate | Low | Low | N/A |
| Leaky Bucket | Queues bursts | Very smooth | Medium | Low | N/A |
| Sliding Window | Proportional smoothing | Smooth | Medium | Medium | High |
| Fixed Window | Boundary bursts possible | Low | Very low | Very low | Low |

Rate Limiting Strategies#

Per-Consumer#

Assign rate limits based on authenticated consumer identity. This is the most common strategy for APIs with tiered pricing plans. A free tier consumer might receive 100 requests per minute while a paid enterprise consumer receives 10,000.

Per-consumer rate limiting requires the rate limiting plugin to execute after authentication so the consumer identity is available. APISIX's consumer abstraction makes this straightforward: attach rate limit configurations directly to consumer definitions.

Per-IP#

Throttle requests based on the client's source IP address. This strategy is effective for public APIs that do not require authentication, such as health check endpoints or public data feeds. IP-based rate limiting is a practical first line of defense against volumetric API abuse, especially when combined with reputation scoring.

Per-IP limiting has limitations in environments where many clients share a single IP (corporate NATs, mobile carriers). Use it as a coarse first defense layer, not as the sole rate limiting strategy.

Per-Route#

Apply different rate limits to different API endpoints based on their resource cost. A search endpoint that triggers expensive database queries might have a stricter limit than a simple metadata lookup. This strategy protects the most resource-intensive parts of your backend.

Global#

Enforce an aggregate rate limit across all consumers and routes. Global limits protect the overall system capacity and are typically set well above individual consumer limits. They serve as a safety net when the sum of individual limits exceeds actual infrastructure capacity.

How Apache APISIX Implements Rate Limiting#

Apache APISIX provides three complementary rate limiting plugins, each targeting a different dimension of traffic control.

limit-req (Request Rate Limiting)#

The limit-req plugin implements a leaky bucket algorithm that controls the request rate per second. It accepts configuration for the sustained request rate (rate), the burst allowance (burst), and the rejection status code.

This plugin is ideal when you need to smooth traffic to a uniform rate. It supports keying on remote address, consumer name, service, or any variable available in the APISIX context.

limit-count (Request Count Limiting)#

The limit-count plugin enforces a maximum number of requests within a configurable time window. It supports both fixed window and sliding window algorithms, with the window size configurable from one second to one day.

limit-count is the best choice for implementing API quota plans (e.g., 10,000 requests per day). It returns standard rate limit headers so clients can track their remaining quota. For distributed deployments, limit-count supports shared counters via Redis, ensuring accurate enforcement across multiple gateway nodes. In benchmarks, Redis-backed distributed counting adds less than 1ms of latency per request at the 99th percentile.

limit-conn (Concurrent Connection Limiting)#

The limit-conn plugin restricts the number of concurrent requests being processed simultaneously. Unlike rate-based limits, connection limits protect against slow-client attacks and long-running requests that tie up backend connections.

This plugin is essential for APIs that serve large file downloads, streaming responses, or long-polling connections. It works by counting active connections per key and rejecting new connections when the limit is exceeded.

Combining Plugins#

APISIX allows stacking all three plugins on a single route. A typical production configuration might combine limit-count for daily quotas, limit-req for per-second smoothing, and limit-conn for concurrent connection caps. The plugins execute in order, and a request rejected by any plugin does not consume quota in subsequent plugins.

This layered approach mirrors industry best practice. Production APIs benefit from enforcing at least two independent rate limiting dimensions to provide comprehensive protection.

FAQ#

What HTTP status code should I return for rate-limited requests?#

Return HTTP 429 (Too Many Requests) as defined in RFC 6585. Include a Retry-After header with the number of seconds the client should wait before retrying. Additionally, include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers so clients can proactively manage their request rate. APISIX's limit-count plugin returns these headers automatically.

How do I handle rate limiting in a distributed gateway deployment?#

Use a shared counter store such as Redis. APISIX's limit-count plugin natively supports Redis and Redis Cluster backends for distributed counter synchronization. This ensures that rate limits are enforced accurately regardless of which gateway node processes the request. The trade-off is a small latency increase (typically under 1ms) for the Redis round-trip on each request.
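The pattern behind Redis-backed counting is a shared atomic INCR on a per-window key. The sketch below substitutes an in-memory FakeRedis stand-in so it runs self-contained; a real deployment would issue the same INCR and EXPIRE commands through a Redis client, and atomicity on the server is what keeps the count accurate across nodes:

```python
class FakeRedis:
    """In-memory stand-in for a Redis client (INCR and EXPIRE only)."""

    def __init__(self):
        self.store = {}

    def incr(self, key):
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]

    def expire(self, key, ttl):
        pass  # TTL bookkeeping is omitted in this sketch

def allow_request(client, consumer, window_id, limit):
    """Fixed-window counter shared by all gateway nodes.

    Every node runs the same INCR against the same key, so the count
    is correct no matter which node handled the request.
    """
    key = f"ratelimit:{consumer}:{window_id}"
    count = client.incr(key)
    if count == 1:
        client.expire(key, 60)  # let the key die after the window passes
    return count <= limit

r = FakeRedis()
decisions = [allow_request(r, "acme", window_id=27, limit=3) for _ in range(5)]
# → [True, True, True, False, False]
```

Because INCR is a single server-side operation, no node ever reads a stale count; the cost is exactly the one round-trip per request mentioned above.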

Should I rate limit internal service-to-service traffic?#

Yes, but with different thresholds. Internal rate limiting prevents cascading failures when one service sends an unexpectedly high volume of requests to another. Set internal limits based on measured capacity rather than commercial quotas. Circuit breakers complement internal rate limiting by stopping requests entirely when a downstream service is unhealthy.

How do I communicate rate limits to API consumers?#

Document rate limits in your API reference and include them in onboarding materials. Use standard rate limit response headers on every response (not just 429 responses) so clients can monitor their consumption in real time. Provide a dedicated endpoint or dashboard where consumers can check their current usage against their quota. For paid tiers, send proactive notifications when consumers approach their limits.

· 8 min read

API gateway security is the practice of protecting your API infrastructure at the edge by enforcing authentication, authorization, rate limiting, and traffic filtering before requests reach backend services. A properly secured gateway reduces attack surface, prevents data breaches, and ensures compliance across every API endpoint in your organization.

Why API Gateway Security Matters#

APIs have become the primary attack vector for modern applications. According to the OWASP API Security Top 10 (2023 edition), broken object-level authorization and broken authentication remain the two most critical API vulnerabilities, affecting organizations across every industry. The explosive growth of API-first architectures has created an equally explosive growth in API-targeted attacks.

The cost of getting API security wrong is substantial, as breaches involving API vulnerabilities tend to take longer to identify and contain and carry significant financial impact. The API gateway sits at a unique vantage point: it processes every inbound request, making it the single most effective location to enforce security policies consistently.

Common API Threats#

Understanding the threat landscape is essential for building an effective defense. The following categories represent the most frequent and damaging attack patterns targeting APIs today.

Broken Object-Level Authorization (BOLA)#

BOLA attacks exploit weak authorization checks to access resources belonging to other users. An attacker modifies object identifiers in API requests (for example, changing /users/123/orders to /users/456/orders) to retrieve unauthorized data. BOLA remains one of the most exploited API vulnerability classes, particularly in organizations where API management and authorization enforcement have not kept pace with API proliferation.
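The defense is a server-side ownership check that never trusts the identifier in the path. A minimal sketch, with hypothetical handler and store names:

```python
def fetch_orders(request_user_id, path_user_id, orders_db):
    """Object-level authorization: the path parameter alone is never trusted."""
    if request_user_id != path_user_id:
        # Without this check, changing /users/123/orders to /users/456/orders
        # would leak another user's data: the essence of BOLA.
        raise PermissionError("403: cannot access another user's orders")
    return orders_db.get(path_user_id, [])

orders = {"123": ["order-a"], "456": ["order-b"]}
print(fetch_orders("123", "123", orders))  # → ['order-a'] (own data, allowed)
try:
    fetch_orders("123", "456", orders)     # tampered ID, rejected
except PermissionError as err:
    print(err)
```

The authenticated identity (`request_user_id`, taken from a verified token) is compared against the object owner on every access; authentication alone does not prevent BOLA.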

Injection Attacks#

SQL injection, NoSQL injection, and command injection remain persistent threats. Attackers embed malicious payloads in query parameters, headers, or request bodies. Despite being a well-known vulnerability class, injection attacks continue to appear frequently in web application security assessments.

Broken Authentication#

Weak or improperly implemented authentication mechanisms allow attackers to assume legitimate user identities. Common failures include missing token validation, weak password policies, credential stuffing vulnerabilities, and improper session management. Credential stuffing attacks account for billions of login attempts monthly across the internet.

Excessive Data Exposure#

APIs frequently return more data than the client needs, relying on the frontend to filter sensitive fields. Attackers bypass the frontend and consume raw API responses directly, gaining access to data never intended for display. This over-exposure is especially dangerous in mobile applications where API traffic is easily intercepted.

Rate Limit Bypass#

Without proper rate limiting, attackers can launch brute-force attacks, denial-of-service campaigns, and credential enumeration at scale. Automated bot traffic constitutes a significant portion of all internet traffic, and much of it targets API endpoints specifically.

Security Layers at the Gateway#

A defense-in-depth approach applies multiple security controls at the gateway layer, each addressing a distinct category of risk.

Authentication#

The gateway should verify identity before any request reaches a backend service. Common mechanisms include JWT validation, OAuth 2.0 token introspection, API key verification, and mutual TLS (mTLS) for service-to-service communication. Centralizing authentication at the gateway eliminates the risk of inconsistent enforcement across individual services.

Authorization#

Beyond verifying identity, the gateway must enforce access control. Role-based access control (RBAC), attribute-based access control (ABAC), and scope-based authorization ensure that authenticated users can only access resources and operations they are permitted to use. Fine-grained authorization at the gateway prevents BOLA vulnerabilities at scale.

Rate Limiting and Throttling#

Rate limiting protects backend services from abuse and ensures fair resource allocation. Effective rate limiting operates at multiple granularities: per consumer, per route, per IP address, and globally. A substantial share of traffic on the average website comes from bots, and rate limiting is the first line of defense against automated abuse.

IP Restriction#

IP allowlists and denylists provide coarse-grained access control. While not sufficient as a sole security measure, IP restriction is valuable for restricting administrative endpoints, limiting partner API access to known address ranges, and blocking traffic from regions associated with attack activity.

WAF and CORS#

A Web Application Firewall (WAF) at the gateway layer inspects request payloads for known attack patterns. CORS policies prevent unauthorized cross-origin requests from browser-based clients. Together, they address both server-side injection attacks and client-side cross-origin abuse.

TLS Termination#

TLS termination at the gateway ensures that all client-to-gateway traffic is encrypted. The gateway handles certificate management, cipher suite configuration, and protocol version enforcement, relieving backend services of this operational burden. The vast majority of web traffic now uses HTTPS, and TLS is considered a baseline requirement for any production API.

Request Validation#

Schema-based request validation rejects malformed or oversized payloads before they reach backend services. Validating request structure, data types, and content length at the gateway prevents injection attacks and reduces the attack surface of downstream services.
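APISIX performs this kind of validation declaratively with JSON Schema through its request-validation plugin; the hand-rolled checker below is only a sketch of the idea. Unexpected fields, missing fields, type mismatches, and oversized payloads are all rejected before anything reaches an upstream:

```python
def validate_request(payload, schema, max_fields=50):
    """Return a list of validation errors (empty list means the payload passes)."""
    errors = []
    if len(payload) > max_fields:
        errors.append("too many fields")
    # Reject fields the schema does not declare (prevents mass assignment).
    for field in payload:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    # Enforce presence and type for every declared field.
    for field, expected_type in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"wrong type for {field}")
    return errors

schema = {"username": str, "age": int}
print(validate_request({"username": "ada", "age": 36}, schema))
# → [] (valid)
print(validate_request({"username": "ada", "age": "36", "x": 1}, schema))
# → ['unexpected field: x', 'wrong type for age']
```

Rejecting undeclared fields is the important design choice: an allowlist schema fails closed, whereas a denylist of known-bad inputs fails open.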

Zero-Trust API Architecture#

Zero-trust architecture assumes that no request is inherently trustworthy, regardless of its origin. Every API call must be authenticated, authorized, and validated, whether it arrives from the public internet, an internal service, or a trusted partner.

At the gateway layer, zero-trust principles translate into several concrete practices:

  • Every request carries verifiable identity credentials.
  • Authorization is evaluated per request rather than per session.
  • Network location (internal vs. external) does not confer implicit trust.
  • All traffic is encrypted, including east-west service-to-service communication.

The API gateway enables zero-trust by serving as a policy enforcement point. It validates tokens, checks permissions, and applies security policies uniformly across all traffic, creating a consistent security boundary regardless of the underlying network topology.

Security Best Practices#

The following practices represent a comprehensive approach to API gateway security that organizations should adopt incrementally based on risk profile.

  1. Enforce authentication on every endpoint. No API route should be accessible without verified identity. Use JWTs with short expiration times and validate signatures on every request.

  2. Implement least-privilege authorization. Grant the minimum permissions required for each consumer. Default to deny and require explicit grants for sensitive operations.

  3. Apply rate limiting at multiple levels. Configure per-consumer, per-route, and global rate limits. Use sliding window algorithms to prevent burst abuse while accommodating legitimate traffic spikes.

  4. Validate all request inputs. Enforce request schema validation at the gateway. Reject payloads that exceed expected sizes, contain unexpected fields, or fail type checks.

  5. Use mutual TLS for service-to-service calls. Encrypt and authenticate all internal traffic. Rotate certificates automatically and enforce certificate validation on every connection.

  6. Enable WAF rules for known attack patterns. Deploy rulesets targeting SQL injection, XSS, and command injection. Update rules regularly to address emerging attack vectors.

  7. Log and audit all security events. Capture authentication failures, authorization denials, rate limit triggers, and WAF blocks. Feed security logs into a SIEM for correlation and alerting.

  8. Rotate credentials and secrets regularly. Automate API key rotation, certificate renewal, and token signing key rotation. Never embed secrets in client-side code or version control.

  9. Restrict administrative API access. Protect management APIs with strong authentication, IP restrictions, and separate credentials from data-plane APIs.

  10. Conduct regular security assessments. Perform API-specific penetration testing, not just general web application assessments. The OWASP API Security Testing Guide provides a structured methodology.
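To illustrate practice 1, here is a self-contained HS256 sketch using only the Python standard library: validate the signature first, then enforce the short-lived `exp` claim. Production systems should use a vetted JWT library with JWKS-based key rotation rather than hand-rolled token code:

```python
import base64, hashlib, hmac, json

def b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(claims, secret):
    """Issue a compact HS256 token (header.payload.signature)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = b64url(hmac.new(secret, header + b"." + payload, hashlib.sha256).digest())
    return (header + b"." + payload + b"." + sig).decode()

def verify(token, secret, now):
    """Check the signature on every request, then the expiration claim."""
    header, payload, sig = token.encode().split(b".")
    expected = b64url(hmac.new(secret, header + b"." + payload,
                               hashlib.sha256).digest())
    if not hmac.compare_digest(expected, sig):
        raise PermissionError("401: invalid signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + b"=" * (-len(payload) % 4)))
    if claims["exp"] < now:
        raise PermissionError("401: token expired")
    return claims

secret = b"demo-signing-key"  # hypothetical key for illustration only
token = sign({"sub": "user-123", "exp": 1700000600}, secret)
print(verify(token, secret, now=1700000000)["sub"])  # → user-123
```

Note the use of `hmac.compare_digest` for constant-time comparison, and that the signature is checked before any claim is read: an expired-but-valid token and a forged token must both fail, for different reasons.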

How Apache APISIX Secures APIs#

Apache APISIX provides a comprehensive set of security plugins that implement each layer of the defense-in-depth model described above.

For IP-based access control, the ip-restriction plugin supports allowlists and denylists at the route level, enabling fine-grained control over which addresses can reach specific endpoints.

Cross-origin resource sharing is managed through the CORS plugin, which configures allowed origins, methods, and headers to prevent unauthorized cross-origin requests from browser clients.

CSRF protection is available through the CSRF plugin, which generates and validates CSRF tokens to prevent cross-site request forgery attacks on state-changing API operations.

For mutual TLS, APISIX supports mTLS configuration for both client-to-gateway and gateway-to-upstream connections, ensuring encrypted and mutually authenticated communication at every hop.

APISIX also supports JWT authentication, key authentication, OpenID Connect, rate limiting with multiple algorithms, and request body validation. The plugin architecture enables security policies to be composed per route, allowing teams to apply exactly the controls each endpoint requires without over- or under-securing traffic.

FAQ#

What is the difference between API gateway security and API security?#

API security is the broad discipline of protecting APIs across their entire lifecycle, including design, development, testing, and runtime. API gateway security specifically refers to the security controls enforced at the gateway layer during runtime, such as authentication, rate limiting, and input validation. The gateway is one component of a comprehensive API security strategy, not a replacement for secure coding practices and security testing.

Should I terminate TLS at the API gateway or at the backend service?#

Terminate TLS at the gateway for client-facing connections. This centralizes certificate management and offloads cryptographic processing from backend services. For traffic between the gateway and upstream services, use mTLS to maintain encryption and mutual authentication throughout the request path. This approach balances operational simplicity with end-to-end security.

How many rate limiting layers should an API gateway enforce?#

Apply at least three layers: a global rate limit to protect overall infrastructure capacity, a per-consumer limit to prevent any single client from monopolizing resources, and per-route limits for endpoints with expensive backend operations. Use sliding window or leaky bucket algorithms rather than fixed windows to provide smoother throttling behavior and prevent burst abuse at window boundaries.

· 8 min read

An API gateway and a load balancer serve different primary purposes. A load balancer distributes network traffic across multiple backend servers to maximize throughput and availability. An API gateway operates at the application layer to manage, secure, and transform API traffic with features like authentication, rate limiting, and request routing. In modern architectures, they complement each other and are frequently deployed together.

What is a Load Balancer#

A load balancer sits between clients and a pool of backend servers, distributing incoming requests to ensure no single server becomes overwhelmed. Load balancers operate at either Layer 4 (TCP/UDP) or Layer 7 (HTTP/HTTPS) of the OSI model.

Layer 4 load balancers route traffic based on IP address and port number without inspecting the request content. They are fast, protocol-agnostic, and add minimal latency. Layer 7 load balancers inspect HTTP headers, URLs, and sometimes request bodies to make more intelligent routing decisions.

Load balancers are foundational infrastructure. The vast majority of organizations use some form of load balancing in their production environments. The technology has been a networking staple for over two decades, with the core algorithms (round-robin, least connections, weighted distribution) remaining largely unchanged.

The primary value of a load balancer is availability. By distributing traffic and performing health checks, load balancers ensure that the failure of a single backend instance does not cause a service outage. They also enable horizontal scaling: adding more backend instances to handle increased traffic without changing the client-facing endpoint.

What is an API Gateway#

An API gateway is an application-layer proxy that acts as the single entry point for API consumers. Beyond routing requests to the correct backend service, an API gateway provides a rich set of cross-cutting concerns: authentication, authorization, rate limiting, request and response transformation, caching, logging, and monitoring.

API gateways emerged from the needs of microservices architectures and API-first product strategies. When an organization exposes dozens or hundreds of microservices, a gateway centralizes the operational concerns that would otherwise be duplicated across every service.

An API gateway typically operates exclusively at Layer 7 and understands application-level protocols like HTTP, gRPC, WebSocket, and GraphQL. It makes routing decisions based on URL paths, headers, query parameters, and even request body content.

Feature Comparison#

| Capability | Load Balancer | API Gateway |
| --- | --- | --- |
| Traffic distribution | Yes (core function) | Yes (built-in) |
| Health checks | Yes | Yes |
| SSL/TLS termination | Yes | Yes |
| Layer 4 routing | Yes | Typically no |
| Layer 7 routing | L7 LB only | Yes (core function) |
| Authentication | No | Yes |
| Authorization | No | Yes |
| Rate limiting | Basic (some L7 LBs) | Yes (granular) |
| Request transformation | No | Yes |
| Response transformation | No | Yes |
| API versioning | No | Yes |
| Protocol translation | Limited | Yes (HTTP to gRPC, REST to GraphQL) |
| Caching | Limited | Yes |
| Developer portal | No | Yes (with management layer) |
| Analytics and monitoring | Basic metrics | Detailed API analytics |
| Circuit breaking | Some implementations | Yes |
| Canary/blue-green deploys | Some implementations | Yes |

The table makes the distinction clear: load balancers focus on network-level traffic distribution, while API gateways focus on application-level API management. The overlap exists primarily in Layer 7 load balancers, which have gradually added some application-aware features.

Key Differences Explained#

Scope of Concern#

A load balancer answers the question: which backend server should handle this connection? An API gateway answers a broader set of questions: is this client authenticated? Are they authorized for this endpoint? Have they exceeded their rate limit? Does the request need transformation before forwarding? Should the response be cached?

In practice, most organizations using API gateways configure multiple cross-cutting policies (authentication, rate limiting, logging, and CORS), none of which fall within a traditional load balancer's responsibility.
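That chain of questions maps naturally onto an ordered policy pipeline, sketched here with hypothetical policy functions. The first failing check short-circuits the request, just as a gateway plugin chain does:

```python
def run_policies(request, policies):
    """Run each check in order; the first failure short-circuits the request."""
    for policy in policies:
        ok, status, reason = policy(request)
        if not ok:
            return status, reason
    return 200, "forwarded upstream"

def authenticate(req):
    return ("token" in req, 401, "missing credentials")

def authorize(req):
    return (req.get("scope") == "orders:read", 403, "insufficient scope")

def rate_limit(req):
    return (req.get("requests_this_minute", 0) < 100, 429, "rate limit exceeded")

policies = [authenticate, authorize, rate_limit]
print(run_policies({"token": "t", "scope": "orders:read"}, policies))
# → (200, 'forwarded upstream')
print(run_policies({"scope": "orders:read"}, policies))
# → (401, 'missing credentials')
```

A load balancer's decision, by contrast, collapses to a single function: pick a healthy backend for the connection.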

Protocol Awareness#

Load balancers, especially at Layer 4, are largely protocol-agnostic. They route TCP connections without understanding the application protocol. API gateways are deeply protocol-aware. They parse HTTP methods, match URL patterns, inspect headers, and in many cases understand domain-specific protocols like gRPC and GraphQL.

This protocol awareness enables capabilities that load balancers cannot provide. For example, an API gateway can route GraphQL queries to different backend services based on the query's requested fields, or translate between REST and gRPC protocols transparently.

Configuration Granularity#

Load balancer configuration centers on server pools, health check parameters, and distribution algorithms. API gateway configuration is far more granular: per-route authentication requirements, per-consumer rate limits, request header injection, response body transformation, and conditional plugin execution.

A typical enterprise API gateway configuration manages 50-200 routes with distinct policy combinations, compared to a load balancer managing 10-30 server pools. The operational complexity reflects the difference in scope.

Performance Profile#

Layer 4 load balancers add microsecond-level latency because they operate below the HTTP layer. API gateways add millisecond-level latency because they must parse, inspect, and potentially transform HTTP requests. High-performance gateways like Apache APISIX, built on NGINX and LuaJIT, keep this overhead under 1ms for typical configurations. According to APISIX benchmark data, the gateway processes over 20,000 requests per second per core with authentication and rate limiting enabled.

When to Use Which#

Use a Load Balancer When#

  • You need to distribute TCP or UDP traffic across backend instances.
  • Your primary concern is availability and horizontal scaling.
  • You are load balancing non-HTTP protocols (databases, message queues, custom TCP services).
  • You want minimal latency overhead with no application-layer processing.

Use an API Gateway When#

  • You expose APIs to external consumers who need authentication and rate limiting.
  • You run a microservices architecture and need centralized cross-cutting concerns.
  • You need request or response transformation between clients and services.
  • You require detailed API analytics, logging, and monitoring.
  • You manage multiple API versions or need protocol translation.

Use Both Together#

In most production architectures, load balancers and API gateways coexist at different layers. A common deployment pattern places a Layer 4 or cloud-native load balancer (AWS NLB, Google Cloud Load Balancing) in front of a cluster of API gateway instances. The load balancer distributes traffic across gateway nodes for high availability, while the gateway handles application-level API management.

This separation of concerns allows each component to do what it does best.

How Apache APISIX Combines Both#

Apache APISIX is an API gateway that includes built-in load balancing capabilities, effectively combining both roles into a single component for many use cases.

APISIX supports multiple load balancing algorithms natively, documented in its load balancing guide:

  • Round-robin (weighted): Distributes requests across upstream nodes based on configured weights.
  • Consistent hashing: Routes requests to the same backend based on a configurable key (IP, header, URI), useful for cache-friendly distributions.
  • Least connections: Sends requests to the upstream node with the fewest active connections.
  • EWMA (Exponential Weighted Moving Average): Selects the upstream node with the lowest response latency, adapting to real-time backend performance.
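Consistent hashing is worth a closer look because it underpins the cache-friendly distribution mentioned above. The following is a simplified Python ring with virtual nodes (an illustration of the technique, not APISIX's implementation):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to the same upstream node, with virtual nodes for balance."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node appears `vnodes` times on the ring so that
        # load spreads evenly and removals remap only a fraction of keys.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def pick(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]

ring = ConsistentHashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
# The same hash key (here a client IP) always lands on the same upstream,
# which keeps per-node caches warm.
print(ring.pick("203.0.113.7") == ring.pick("203.0.113.7"))  # → True
```

In APISIX the hash key is configurable (client IP, a header, or the URI), so the operator chooses which dimension of the traffic should be sticky.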

By combining API gateway features with production-grade load balancing, APISIX reduces architectural complexity for many deployments. Organizations that would otherwise deploy a separate load balancer and a separate API gateway can consolidate into a single APISIX layer, reducing operational overhead and network hops.

For large-scale deployments, a dedicated Layer 4 load balancer in front of APISIX nodes still makes sense for TCP-level high availability and DDoS protection. But within the application layer, APISIX handles both traffic distribution and API management without requiring an additional component.

FAQ#

Can an API gateway replace a load balancer entirely?#

For HTTP and gRPC traffic, a modern API gateway like Apache APISIX can replace a Layer 7 load balancer because it includes equivalent load balancing algorithms. However, for non-HTTP protocols (raw TCP, UDP, database connections) or for Layer 4 DDoS protection, a dedicated load balancer remains necessary. The most common production pattern uses both: a Layer 4 load balancer for network-level distribution and an API gateway for application-level management.

Does adding an API gateway increase latency compared to a load balancer alone?#

Yes, but the increase is typically small. A Layer 4 load balancer adds microseconds of latency. An API gateway adds 0.5-2ms depending on the number of active plugins. For most APIs where upstream service response times are 10-500ms, the gateway overhead is negligible. The operational benefits of centralized authentication, rate limiting, and observability far outweigh the minor latency cost.

Should I use a cloud provider's managed API gateway or deploy my own?#

Managed gateways (AWS API Gateway, Google Apigee) reduce operational burden but limit customization and can become expensive at high traffic volumes. AWS API Gateway charges per million requests, which can reach thousands of dollars monthly for high-traffic APIs. Self-managed gateways like Apache APISIX offer full control, unlimited throughput on your infrastructure, and no per-request fees, but require your team to operate the gateway cluster. Evaluate based on your traffic volume, customization needs, and operations capacity.

How does an API gateway differ from a reverse proxy?#

A reverse proxy forwards client requests to backend servers and is the foundation of both load balancers and API gateways. An API gateway is a specialized reverse proxy that adds API-specific features: authentication, rate limiting, request transformation, API versioning, and developer-facing analytics. NGINX, for example, can function as a reverse proxy, load balancer, or (with extensions) an API gateway. Apache APISIX is purpose-built as an API gateway with load balancing built in.

· 9 min read

Apache APISIX and Kong are the two most widely adopted open-source API gateways, both built on NGINX and Lua. APISIX differentiates itself with a fully dynamic architecture powered by etcd, higher single-core throughput, and a broader protocol support matrix, while Kong offers a mature enterprise ecosystem with extensive third-party integrations and a large plugin marketplace.

Overview#

Both projects serve as high-performance, extensible API gateways for microservices architectures. Kong was open-sourced in 2015 and has built a substantial commercial ecosystem around Kong Gateway Enterprise, Kong Konnect, and the Kong Plugin Hub. Apache APISIX entered the Apache Software Foundation incubator in 2019 and graduated as a top-level project in 2020, with rapid community growth.

Both projects are recognized as production-grade gateways and see active production deployments worldwide.

Architecture Comparison#

The architectural differences between APISIX and Kong are fundamental and affect day-to-day operations, scalability, and deployment complexity.

Apache APISIX Architecture#

APISIX uses NGINX as its data plane with Lua plugins running in the request lifecycle. Configuration is stored in etcd, a distributed key-value store that pushes changes to all gateway nodes in real time via watch mechanisms. This architecture means that route changes, plugin updates, and upstream modifications take effect within milliseconds without requiring restarts or reloads. There is no relational database dependency.

The etcd-based design gives APISIX a stateless data plane: any node can be added or removed without migration steps or database schema changes. This makes horizontal scaling straightforward and reduces operational overhead significantly in Kubernetes environments where pods are ephemeral.

Kong Architecture#

Kong also uses NGINX and Lua for its data plane. Configuration is stored in PostgreSQL or Cassandra (though Cassandra support has been deprecated in newer versions). Kong's DB-mode requires database migrations when upgrading, and configuration changes propagate through a polling mechanism with a configurable cache TTL, which introduces a delay between API calls to the Admin API and actual enforcement at the proxy layer.

Kong also offers a DB-less mode where configuration is loaded from a declarative YAML file, which eliminates the database dependency but sacrifices the ability to modify configuration dynamically through the Admin API at runtime. Kong's commercial offering, Konnect, provides a managed control plane that addresses many of these operational concerns.

Performance Benchmarks#

Performance characteristics matter at scale, where even small per-request overhead compounds into significant infrastructure costs.

Key architectural differences that affect performance:

  • Route matching: APISIX uses a radix tree-based routing algorithm. Kong uses a different matching approach. The routing algorithm affects lookup time as the number of routes grows.
  • Configuration propagation: APISIX pushes configuration changes from etcd to all nodes in real time. Kong's DB-mode polls the database on a configurable interval, introducing a delay between configuration changes and enforcement.
  • Memory model: Both use NGINX's event-driven architecture, but their plugin execution models differ in per-request allocation patterns.
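A simplified segment trie illustrates the longest-prefix idea behind radix-tree route matching. APISIX's actual router is a compressed radix tree implemented in C and Lua; this sketch is for intuition only:

```python
class RouteTrie:
    """Longest-prefix route matching over path segments."""

    def __init__(self):
        self.root = {}

    def add(self, path, upstream):
        node = self.root
        for seg in path.strip("/").split("/"):
            node = node.setdefault(seg, {})
        node["$route"] = upstream  # "$route" marks a registered route

    def match(self, path):
        node, best = self.root, self.root.get("$route")
        for seg in path.strip("/").split("/"):
            if seg not in node:
                break
            node = node[seg]
            if "$route" in node:
                best = node["$route"]  # remember the longest prefix seen
        return best

trie = RouteTrie()
trie.add("/api", "generic-backend")
trie.add("/api/orders", "orders-service")
print(trie.match("/api/orders/123"))  # → orders-service
print(trie.match("/api/users"))       # → generic-backend
```

The payoff is that lookup cost grows with the depth of the request path, not with the total number of registered routes, which is why tree-based matching scales to thousands of routes.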

We recommend benchmarking both gateways with your actual workload, plugin chain, and hardware to get meaningful performance comparisons. Vendor-published benchmarks often test under ideal conditions that may not reflect your production environment.

For many production deployments, both gateways provide sufficient throughput, and the choice often depends on factors beyond raw performance such as ecosystem maturity, plugin availability, and operational familiarity.

Feature Comparison#

| Feature | Apache APISIX | Kong (OSS) |
| --- | --- | --- |
| Plugin count (built-in) | 80+ | 40+ (OSS), 200+ (Enterprise) |
| Protocol support | HTTP/1.1, HTTP/2, HTTP/3, gRPC, WebSocket, TCP/UDP, MQTT, Dubbo | HTTP/1.1, HTTP/2, gRPC, WebSocket, TCP/UDP |
| Dashboard | Apache APISIX Dashboard (OSS) | Kong Manager (Enterprise only) |
| Admin API | Full REST API, fully dynamic | REST API, DB-mode or DB-less |
| Service discovery | Nacos, Consul, Eureka, DNS, Kubernetes | DNS, Consul (others via plugins) |
| Kubernetes ingress | APISIX Ingress Controller (CRD-based) | Kong Ingress Controller (KIC) |
| AI gateway capabilities | ai-proxy plugin, multi-LLM routing | AI Gateway plugins (Enterprise) |
| Multi-language plugin support | Go, Java, Python, Wasm, Lua | Go, JavaScript, Python (PDK) |
| Configuration storage | etcd (distributed, real-time) | PostgreSQL (requires migrations) |
| Canary/traffic splitting | Built-in traffic-split plugin | Canary plugin (Enterprise) |

Both gateways support core functionality like rate limiting, authentication (JWT, OAuth 2.0, API key, LDAP), load balancing, health checks, and circuit breaking. The primary differences lie in the breadth of built-in features available in the open-source edition versus features gated behind enterprise licensing.

Plugin Ecosystem#

APISIX ships with over 80 built-in plugins covering authentication, security, traffic management, observability, and protocol transformation. Notably, plugins for serverless functions (running custom Lua, Java, or Go code inline), AI proxy routing, and advanced traffic management are available in the open-source edition.

Kong's open-source edition includes approximately 40 built-in plugins, with a substantial number of additional plugins available through Kong Plugin Hub and the enterprise edition. Kong's plugin marketplace includes many third-party and partner-contributed plugins, giving it a broader ecosystem for specific vendor integrations like Datadog, PagerDuty, and Moesif.

For custom plugin development, APISIX supports external plugins via gRPC-based plugin runners in Go, Java, and Python, as well as Wasm-based plugins that run in a sandboxed environment. Kong offers a Plugin Development Kit (PDK) supporting Go, JavaScript, and Python alongside native Lua plugins. Both projects accept community-contributed plugins, and their ecosystems continue to grow.

Kubernetes Integration#

Both gateways offer mature Kubernetes ingress controllers, though they differ in design philosophy.

The APISIX Ingress Controller supports both custom resource definitions (CRDs) specific to APISIX and standard Kubernetes Ingress resources. It communicates with the APISIX data plane through the Admin API and supports Gateway API, the emerging Kubernetes standard for traffic management. Configuration changes propagate instantly through etcd.

The Kong Ingress Controller (KIC) also supports CRDs and standard Kubernetes Ingress resources, with Kong-specific annotations for extended functionality. KIC translates Kubernetes resources into Kong configuration, applying them through the Admin API. KIC has a longer track record in production Kubernetes environments and benefits from extensive documentation and community resources.

Both controllers are actively maintained and see regular releases aligned with Kubernetes version updates.

Community and Ecosystem#

| Metric | Apache APISIX | Kong |
| --- | --- | --- |
| License | Apache 2.0 | Apache 2.0 (OSS) |
| Governance | Apache Software Foundation | Kong Inc. |
| First release | 2019 | 2015 |

APISIX benefits from Apache Software Foundation governance, which ensures vendor-neutral development and community-driven roadmap decisions. Kong benefits from the backing of Kong Inc., which provides dedicated engineering resources, enterprise support, and a commercial ecosystem that many large organizations value.

Both projects maintain active community forums, Slack channels, and regular release cadences. Kong's longer market presence gives it an advantage in terms of available tutorials, third-party integrations, and consultant familiarity.

When to Choose Apache APISIX#

APISIX is the stronger choice when your requirements include:

  • Dynamic configuration at scale: Environments where routes and plugins change frequently benefit from etcd-based instant propagation without restarts.
  • Maximum open-source functionality: Teams that need advanced features like traffic splitting, AI proxy, and multi-protocol support without enterprise licensing.
  • High-performance requirements: Workloads where per-request latency and single-core throughput directly impact infrastructure costs.
  • Kubernetes-native deployments: Organizations adopting Gateway API and wanting tight integration with cloud-native service discovery (Nacos, Consul, Eureka).
  • Vendor-neutral governance: Teams that prefer Apache Software Foundation stewardship over single-vendor control.

When to Choose Kong#

Kong is the stronger choice when your requirements include:

  • Mature enterprise ecosystem: Organizations that need commercial support, SLA guarantees, and a proven enterprise deployment track record.
  • Extensive third-party integrations: Environments with specific vendor integration needs covered by Kong's plugin marketplace.
  • Existing Kong investment: Teams already running Kong in production where migration cost outweighs technical advantages.
  • Managed control plane: Organizations that prefer a SaaS-managed control plane (Kong Konnect) to reduce operational burden.
  • Broad hiring market: Teams that can more easily find engineers with Kong experience due to its longer market presence.

FAQ#

Can APISIX and Kong run side by side during a migration?#

Yes. Both gateways can operate in parallel by splitting traffic at the load balancer level. A common migration strategy routes new services through APISIX while existing services continue running through Kong. Gradual traffic shifting with health checks ensures zero-downtime migration. The timeline depends on the number of routes, custom plugins, and testing requirements.
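The traffic-shifting step can be sketched as a deterministic hash split at the load balancer, so each client consistently lands on the same gateway as the rollout percentage grows. The function name and percentages below are illustrative, not part of either gateway's API:

```python
import hashlib

def choose_gateway(client_id: str, apisix_percent: int) -> str:
    """Map each client to a stable bucket in [0, 100); clients below the
    rollout percentage go to APISIX, the rest stay on Kong. Raising the
    percentage only ever moves clients toward the new gateway."""
    bucket = int(hashlib.sha256(client_id.encode()).hexdigest(), 16) % 100
    return "apisix" if bucket < apisix_percent else "kong"

# Gradually shift: 10% -> 50% -> 100% of clients onto the new gateway.
for percent in (10, 50, 100):
    routed = sum(choose_gateway(f"client-{i}", percent) == "apisix" for i in range(1000))
    print(percent, routed)
```

Because the split is keyed on client identity rather than random per request, a client never bounces between gateways mid-session, which keeps health-check comparisons between the two stacks meaningful.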

Is APISIX harder to operate because it requires etcd?#

etcd adds a dependency compared to Kong's DB-less mode, but in practice, etcd is a well-understood, battle-tested component already present in most Kubernetes clusters (it is the backing store for Kubernetes itself). Operating etcd requires standard distributed systems practices: run an odd number of nodes (3 or 5), monitor disk latency, and maintain regular snapshots. For teams already running Kubernetes, etcd operational knowledge is typically already available. The operational cost of etcd is generally lower than managing PostgreSQL migrations required by Kong's DB-mode.

Which gateway has better AI and LLM support?#

Both gateways are investing in AI gateway capabilities, but they approach it differently. APISIX provides the ai-proxy plugin in its open-source edition, supporting multi-model routing, token-based rate limiting, and prompt transformation for major LLM providers. Kong offers AI Gateway plugins primarily through its enterprise edition and Konnect platform. For teams building AI-powered applications on an open-source budget, APISIX currently provides more built-in AI functionality without licensing costs.

How do the two gateways compare on gRPC and streaming support?#

APISIX provides native gRPC proxying, gRPC-Web transcoding, and HTTP-to-gRPC transformation out of the box, along with support for HTTP/3 (QUIC), Dubbo, and MQTT protocols. Kong supports gRPC proxying and gRPC-Web through plugins, with HTTP/2 support on both client and upstream connections. For teams heavily invested in gRPC or multi-protocol architectures, APISIX's broader built-in protocol support reduces the need for custom plugins or sidecars.

· 7 min read

An open-source API gateway sits between clients and backend services, handling routing, authentication, rate limiting, and observability. Apache APISIX, Kong, Envoy, and Traefik are among the most widely adopted options, each with distinct architectural decisions that affect performance, extensibility, and operational complexity.

Why the Choice of API Gateway Matters#

Organizations running microservices at scale route millions of requests per day through their gateway layer. The gateway you choose determines your latency floor, plugin flexibility, and how much operational overhead your platform team absorbs.

Choosing poorly means rearchitecting under pressure. Choosing well means a gateway that scales with your traffic for years without becoming a bottleneck.

Feature Comparison Table#

| Feature | Apache APISIX | Kong | Envoy | Traefik |
|---|---|---|---|---|
| Language | Lua (NGINX + LuaJIT) | Lua (NGINX + LuaJIT) | C++ | Go |
| Configuration Store | etcd | PostgreSQL / Cassandra | xDS API (control plane) | File / KV stores |
| Admin API | RESTful, fully dynamic | RESTful | xDS gRPC | REST + dashboard |
| Hot Reload | Yes, sub-millisecond | Partial (DB polling) | Yes (xDS push) | Yes (provider watch) |
| Plugin Count | 100+ built-in | 60+ bundled (more in Hub) | ~30 HTTP filters | ~30 middlewares |
| Plugin Languages | Lua, Java, Go, Python, Wasm | Lua, Go (PDK) | C++, Wasm | Go (middleware) |
| gRPC Proxying | Native | Supported | Native | Supported |
| HTTP/3 (QUIC) | Supported | Experimental | Supported | Supported |
| Dashboard | Built-in (APISIX Dashboard) | Kong Manager (Enterprise) | None (third-party) | Built-in |
| License | Apache 2.0 | Apache 2.0 (OSS) / Proprietary (Enterprise) | Apache 2.0 | MIT |

Note: Feature details are based on each project's official documentation as of early 2026. Check the respective project sites for the latest status.

Detailed Breakdown#

Apache APISIX#

Apache APISIX is built on NGINX and LuaJIT, using etcd as its configuration store. This architecture eliminates database dependencies on the data path: route changes propagate to every gateway node within milliseconds without restarts or reloads.

The plugin ecosystem includes over 100 built-in options spanning authentication (JWT, key-auth, OpenID Connect), traffic management (rate limiting, circuit breaking), observability (Prometheus, Zipkin, OpenTelemetry), and transformation (request/response rewriting, gRPC transcoding). Developers can write custom plugins in Lua, Go, Java, Python, or WebAssembly, making it one of the most polyglot gateway runtimes available.

APISIX supports the Kubernetes Ingress Controller pattern natively. The APISIX Ingress Controller watches Kubernetes resources and translates them into APISIX routing configuration, enabling declarative GitOps workflows while preserving the full plugin surface.

As an Apache Software Foundation top-level project, APISIX is community-governed and vendor-neutral.

Kong#

Kong is the longest-established open-source API gateway, with a mature commercial ecosystem. It shares the NGINX + LuaJIT foundation with APISIX but relies on PostgreSQL or Cassandra as its configuration store. This architectural choice introduces a database dependency for configuration storage, which adds operational complexity for HA deployments.

Kong's plugin hub offers approximately 60 bundled plugins in the open-source edition, with additional enterprise-only plugins for advanced features like OAuth2 introspection and advanced rate limiting. The Go Plugin Development Kit (PDK) allows extending Kong in Go, though Lua remains the primary plugin language.

Kong has a strong enterprise support ecosystem with commercial offerings (Kong Gateway Enterprise, Kong Konnect) and a large user community.

Envoy#

Envoy is a high-performance C++ proxy originally built at Lyft, now a CNCF graduated project. It excels as a service mesh data plane and is the foundation for Istio, AWS App Mesh, and other mesh implementations.

Envoy's configuration model uses the xDS (discovery service) API, a gRPC-based protocol that pushes configuration updates from a control plane. This design is powerful but means Envoy does not function as a standalone gateway without a control plane component. Organizations adopting Envoy as an edge gateway typically pair it with a control plane like Gloo Edge or similar tools.

The filter chain model supports around 30 built-in HTTP filters. Custom extensions require C++ or WebAssembly, raising the barrier for teams without C++ expertise. Envoy is most commonly deployed as a sidecar proxy within a service mesh, though it is also used as an edge proxy.

Traefik#

Traefik is written in Go and designed for automatic service discovery. It integrates natively with Docker, Kubernetes, Consul, and other orchestrators, automatically detecting new services and generating routes without manual configuration. This auto-discovery model makes Traefik popular for development environments and smaller-scale production deployments.

Traefik includes built-in Let's Encrypt integration for automatic TLS certificate provisioning, a feature that requires additional tooling in other gateways. Its middleware system offers approximately 30 built-in options covering authentication, rate limiting, header manipulation, and circuit breaking.

Traefik has a large community and is widely used in Docker-native environments.

Performance Considerations#

Performance varies significantly based on configuration, plugin chains, TLS termination, and upstream complexity. When evaluating gateways, run your own benchmarks with your actual workload patterns rather than relying on vendor-published numbers.

Key factors that affect gateway performance:

  • Architecture: C++ and LuaJIT-based gateways (Envoy, APISIX, Kong) generally achieve lower latency than pure Go implementations
  • Configuration store: Gateways that avoid database queries on the data path (APISIX, Envoy) tend to have more consistent latency
  • Plugin overhead: Each active plugin adds processing time. Test with your actual plugin chain enabled
  • Connection handling: The NGINX event-driven model (APISIX, Kong) handles high concurrency efficiently

We recommend benchmarking the specific gateways you are considering with a representative workload on hardware similar to your production environment.

When to Choose Which#

Choose Apache APISIX when you need a large built-in plugin ecosystem, fully dynamic configuration without restarts, multi-language plugin support, and no database dependency. It suits teams building platform-grade API infrastructure. See the getting started guide to evaluate it hands-on.

Choose Kong when you are operating in an enterprise environment with existing Kong deployments, need commercial support, or require specific enterprise-only plugins. Kong's maturity means more third-party integrations and consultants are available.

Choose Envoy when your primary use case is a service mesh data plane, you need advanced load balancing algorithms, or you are already running Istio or a similar mesh. Envoy is less suited as a standalone edge gateway due to its control plane dependency.

Choose Traefik when auto-discovery and zero-configuration routing are priorities, or you need built-in Let's Encrypt integration without additional tooling. Traefik excels in Docker-native and small-to-medium Kubernetes environments.

Migration Considerations#

Migrating between gateways is nontrivial and typically requires careful planning. Key factors include:

  • Plugin compatibility: Not all plugins have equivalents across gateways. Audit your active plugins and identify gaps before migrating.
  • Configuration translation: Each gateway uses a different configuration format. Automated translation tools can help but manual verification is essential.
  • Operational tooling: Monitoring dashboards, CI/CD pipelines, and alerting rules need updating.
  • Canary approach: Running both gateways in parallel behind a load balancer and shifting traffic gradually is the safest migration strategy.

Frequently Asked Questions#

Is Apache APISIX production-ready for enterprise workloads?#

Yes. Apache APISIX is an Apache Software Foundation top-level project used in production by organizations worldwide. The etcd-backed architecture provides high availability without single points of failure when deployed with an etcd cluster.

Can I migrate from Kong to APISIX without downtime?#

A zero-downtime migration is achievable using a canary deployment approach: run both gateways in parallel behind a load balancer, gradually shifting traffic from Kong to APISIX as you validate route-by-route equivalence. APISIX supports most Kong plugin equivalents natively, and the Admin API allows automated route provisioning during migration.

How do open-source API gateways compare to cloud-managed options like AWS API Gateway?#

Cloud-managed gateways trade control for convenience. They handle infrastructure operations but impose vendor lock-in, per-request pricing that grows with traffic volume, and limited plugin customization. Open-source gateways like APISIX provide full control over the data plane, support multi-cloud and hybrid deployments, and eliminate per-request platform fees.

Which gateway has the best Kubernetes support?#

All four gateways support Kubernetes, but the depth varies. APISIX and Kong offer dedicated ingress controllers with CRD-based configuration. Envoy integrates through the Kubernetes Gateway API and service mesh deployments. Traefik auto-discovers Kubernetes services natively. The emerging Kubernetes Gateway API standard is supported by all four projects to varying degrees, and is becoming the recommended approach for new deployments.

· 13 min read

An API gateway is a server that sits between clients and backend services, acting as the single entry point for all API traffic. It accepts incoming requests, applies policies such as authentication, rate limiting, and transformation, then routes each request to the appropriate upstream service and returns the response to the caller.

In practice, an API gateway consolidates cross-cutting concerns that would otherwise be duplicated across every microservice: access control, traffic shaping, observability, and protocol translation. Instead of embedding this logic in each service, teams centralize it at the gateway layer, reducing code duplication, simplifying deployments, and giving platform teams a single control plane for governing API behavior at scale.

How Does an API Gateway Work?#

The request lifecycle through an API gateway follows a well-defined pipeline:

  1. Client sends a request. A mobile app, browser, or upstream service issues an HTTP/HTTPS request to the gateway's public endpoint. The client never communicates directly with individual backend services.

  2. Gateway evaluates policies. The gateway inspects the incoming request and runs it through a chain of plugins or middleware. This typically includes validating authentication tokens (JWT, OAuth 2.0, API keys), enforcing rate limits, checking IP allowlists, and applying request transformations such as header injection or body rewriting.

  3. Gateway routes to the upstream. Based on the request path, host header, or other matching criteria, the gateway selects a target upstream service. If multiple instances are registered, the gateway applies a load-balancing algorithm (round-robin, least connections, consistent hashing) to pick a specific node.

  4. Backend processes the request. The upstream service handles the business logic and returns a response to the gateway.

  5. Gateway processes the response. Before forwarding the response to the client, the gateway can apply response transformations, inject CORS headers, compress the payload, or cache the result for subsequent identical requests.

  6. Gateway returns the response. The final response reaches the client with appropriate status codes, headers, and payload. Throughout this entire cycle, the gateway emits metrics, access logs, and traces that feed into observability systems.

This pipeline executes in milliseconds. High-performance gateways like Apache APISIX complete it in under 1ms of added latency, making the overhead negligible even for latency-sensitive workloads.
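The six steps above can be condensed into a toy pipeline. Everything here (the plugin functions, the consumer registry, the upstream table) is illustrative, not any real gateway's API:

```python
from typing import Callable, Optional

Request = dict
Response = dict

API_KEYS = {"k-123": "mobile-app"}  # hypothetical consumer registry

def serve_orders(req: Request) -> Response:
    # Step 4: the upstream service handles the business logic.
    return {"status": 200, "body": f"orders for {req['consumer']}"}

UPSTREAMS: dict = {"/orders": serve_orders}

def authenticate(req: Request) -> Optional[Response]:
    # Step 2: reject requests with a missing or unknown credential.
    consumer = API_KEYS.get(req.get("apikey", ""))
    if consumer is None:
        return {"status": 401, "body": "invalid API key"}
    req["consumer"] = consumer
    return None  # continue down the chain

def route(req: Request) -> Optional[Response]:
    # Step 3: pick an upstream by path (real gateways also match host, method, headers).
    if req["path"] not in UPSTREAMS:
        return {"status": 404, "body": "no route"}
    return None

def handle(req: Request) -> Response:
    for plugin in (authenticate, route):
        early = plugin(req)
        if early is not None:  # a plugin short-circuited the request
            return early
    resp = UPSTREAMS[req["path"]](req)
    # Step 5: response-phase processing before returning to the client.
    resp.setdefault("headers", {})["X-Gateway"] = "demo"
    return resp
```

The short-circuit pattern is the key point: a rejected request never consumes backend resources, which is why centralized enforcement also reduces load.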

Key Features of an API Gateway#

A production-grade API gateway provides a broad surface of capabilities. The following features represent the core functionality that distinguishes a gateway from a simple reverse proxy.

Request Routing#

The gateway matches incoming requests to upstream services based on URI paths, HTTP methods, headers, query parameters, or custom expressions. Advanced gateways support regex-based matching, wildcard routes, and priority-weighted rules. Apache APISIX supports radixtree-based routing that scales efficiently even with thousands of route entries.
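A longest-prefix matcher captures the core idea behind priority-weighted route selection. The route table and service names below are made up; a radix tree performs the same lookup without the linear scan:

```python
# Hypothetical route table: the most specific matching prefix wins.
ROUTES = [
    {"prefix": "/api/v1/orders/", "methods": {"GET", "POST"}, "upstream": "orders-svc"},
    {"prefix": "/api/v1/",        "methods": {"GET"},         "upstream": "fallback-svc"},
]

def match(path: str, method: str):
    best = None
    for r in ROUTES:
        if path.startswith(r["prefix"]) and method in r["methods"]:
            if best is None or len(r["prefix"]) > len(best["prefix"]):
                best = r  # keep the longest (most specific) matching prefix
    return best["upstream"] if best else None
```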

Load Balancing#

Distributing traffic across service instances prevents hotspots and improves availability. Gateways typically support round-robin, weighted round-robin, least connections, consistent hashing, and EWMA (exponentially weighted moving average) algorithms. Health checks --- both active probes and passive failure detection --- automatically remove unhealthy nodes from the upstream pool.
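Two of the listed algorithms, sketched side by side. Node addresses are placeholders, and a production balancer would also track health and weights:

```python
import hashlib
import itertools

NODES = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]  # placeholder upstreams

# Round-robin: cycle through nodes so each gets an equal share of requests.
_cycle = itertools.cycle(NODES)
def round_robin() -> str:
    return next(_cycle)

# Consistent hashing: the same key (session ID, user ID) always maps to the
# same node; virtual nodes smooth the distribution around the hash ring.
def consistent_hash(key: str, nodes=NODES, vnodes: int = 100) -> str:
    ring = sorted(
        (int(hashlib.md5(f"{n}#{i}".encode()).hexdigest(), 16), n)
        for n in nodes for i in range(vnodes)
    )
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    for point, node in ring:
        if h <= point:
            return node
    return ring[0][1]  # wrap around the ring
```

Round-robin maximizes evenness; consistent hashing maximizes stickiness, which matters for caches and session affinity because removing a node only remaps that node's keys.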

Authentication and Authorization#

Centralizing identity verification at the gateway eliminates the need for each service to implement its own auth stack. Common mechanisms include JWT validation, OAuth 2.0 token introspection, HMAC signatures, LDAP, and API key authentication. Some gateways also integrate with external identity providers through OpenID Connect.

Rate Limiting#

Rate limiting protects backend services from traffic spikes, abusive clients, and cascading failures. Gateways enforce limits at multiple granularities: per consumer, per route, per IP, or globally. Apache APISIX provides configurable rate limiting plugins that support both fixed-window and leaky-bucket algorithms, with shared counters across gateway nodes via Redis.
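A fixed-window counter is the simplest of these algorithms. This sketch keeps counters in process memory; a multi-node deployment would keep them in Redis as described above, and the limit and window values are illustrative:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds per consumer."""
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.counters = defaultdict(lambda: [0.0, 0])  # consumer -> [window_start, count]

    def allow(self, consumer: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        start, count = self.counters[consumer]
        if now - start >= self.window:        # window expired: start a new one
            self.counters[consumer] = [now, 1]
            return True
        if count < self.limit:
            self.counters[consumer][1] += 1
            return True
        return False                          # over limit: gateway responds 429
```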

Caching#

Response caching at the gateway layer reduces backend load and improves latency for read-heavy endpoints. Gateways cache responses based on configurable TTLs, cache keys (URI, headers, query strings), and invalidation rules. For APIs serving relatively static data --- product catalogs, configuration endpoints, reference data --- caching can reduce upstream requests by 80% or more.
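The mechanics reduce to a keyed, TTL-bounded lookup in front of the upstream call. The cache key and TTL below are illustrative:

```python
import time

class ResponseCache:
    """Cache responses keyed by e.g. (method, path), each entry valid for `ttl` seconds."""
    def __init__(self, ttl: float):
        self.ttl, self.store = ttl, {}

    def get(self, key, now: float = None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry and now - entry[0] < self.ttl:
            return entry[1]   # cache hit: the upstream is never contacted
        return None           # miss or expired: caller fetches and re-puts

    def put(self, key, response, now: float = None):
        now = time.monotonic() if now is None else now
        self.store[key] = (now, response)
```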

Request and Response Transformation#

Gateways can rewrite requests before they reach the backend and transform responses before they reach the client. This includes header manipulation, body rewriting, protocol translation (HTTP to gRPC, REST to GraphQL), and payload format conversion. Transformation eliminates the need for adapter services and simplifies API versioning.

Monitoring and Observability#

A gateway sees every request, making it the natural instrumentation point for API metrics. Production gateways export access logs, request/response latencies (P50, P95, P99), error rates, and throughput to systems like Prometheus, Datadog, and OpenTelemetry collectors. Apache APISIX ships with built-in integrations for Prometheus, Grafana, SkyWalking, and Zipkin.

SSL/TLS Termination#

The gateway handles TLS handshakes, certificate management, and encryption offloading so that backend services can communicate over plain HTTP internally. This simplifies certificate rotation, centralizes security policy, and reduces CPU overhead on upstream services. Modern gateways also support mTLS for service-to-service authentication.

Circuit Breaking#

When a backend service becomes degraded or unresponsive, a circuit breaker at the gateway stops forwarding requests to it, preventing cascading failures across the system. After a configurable cooldown, the gateway sends probe requests to test recovery. This pattern is critical in microservices architectures where a single failing service can take down an entire request chain.
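The state machine behind this pattern is small. This sketch opens the circuit after a fixed number of consecutive failures and lets a probe request through once the cooldown elapses; the threshold and cooldown values are illustrative:

```python
class CircuitBreaker:
    """Open after `threshold` consecutive failures; probe again after `cooldown` seconds."""
    def __init__(self, threshold: int, cooldown: float):
        self.threshold, self.cooldown = threshold, cooldown
        self.failures, self.opened_at = 0, None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True                             # closed: forward normally
        if now - self.opened_at >= self.cooldown:
            return True                             # half-open: allow a probe request
        return False                                # open: fail fast, spare the backend

    def record(self, success: bool, now: float):
        if success:
            self.failures, self.opened_at = 0, None  # recovery: close the circuit
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now
```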

API Versioning and Canary Releases#

Gateways can route a percentage of traffic to new service versions, enabling canary deployments and blue-green releases without infrastructure changes. Traffic-splitting rules let teams gradually shift load from v1 to v2, monitor error rates, and roll back instantly if metrics degrade.
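A weighted split over version tags is all the routing rule needs. The weights and version names below are illustrative, and `rnd` exists only so the split can be tested deterministically:

```python
import random

def pick_version(weights: dict, rnd: float = None) -> str:
    """weights like {'v1': 90, 'v2': 10}: send ~10% of traffic to the canary.
    `rnd` in [0, 1) can be injected instead of random.random() for testing."""
    r = (random.random() if rnd is None else rnd) * sum(weights.values())
    for version, weight in weights.items():
        r -= weight
        if r < 0:
            return version
    return version  # floating-point edge case: fall back to the last version
```

Shifting load from v1 to v2 is then just updating the weight map, which is why gateways can do canary rollouts with no infrastructure change.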

API Gateway vs Load Balancer vs Reverse Proxy#

These three components overlap in functionality but serve different primary purposes:

| Capability | Reverse Proxy | Load Balancer | API Gateway |
|---|---|---|---|
| Request forwarding | Yes | Yes | Yes |
| SSL termination | Yes | Sometimes | Yes |
| Load balancing | Basic | Advanced | Advanced |
| Health checks | Limited | Yes | Yes |
| Authentication | No | No | Yes |
| Rate limiting | No | No | Yes |
| Request transformation | No | No | Yes |
| API-aware routing | No | No | Yes |
| Response caching | Yes | No | Yes |
| Observability/metrics | Basic | Basic | Comprehensive |
| Protocol translation | No | No | Yes |
| Plugin/middleware ecosystem | Limited | No | Extensive |

A reverse proxy (e.g., Nginx, HAProxy in proxy mode) forwards client requests to backend servers, provides SSL termination, and can cache static content. It operates at the HTTP level but lacks API-specific intelligence.

A load balancer (e.g., AWS ALB, HAProxy, Envoy in LB mode) distributes traffic across server instances using health checks and balancing algorithms. Layer 4 load balancers work at the TCP level; Layer 7 load balancers can inspect HTTP headers but still lack API-layer logic like authentication or transformation.

An API gateway builds on reverse proxy and load balancing capabilities but adds an API-aware policy layer: authentication, rate limiting, request/response transformation, observability, and developer portal integration. It is purpose-built for managing API traffic.

In practice, many organizations start with a reverse proxy or load balancer and later adopt an API gateway as their API surface grows. Some gateways, including Apache APISIX, are built on top of proven proxies (APISIX uses Nginx and OpenResty) and inherit their performance characteristics while adding the API management layer.

API Gateway Use Cases#

Microservices Architecture#

In a microservices system with dozens or hundreds of services, an API gateway provides the single entry point that abstracts internal service topology from external consumers. Clients interact with one stable endpoint; the gateway handles service discovery, routing, and cross-cutting concerns. Without a gateway, each client must know the location and protocol of every service, creating tight coupling and operational fragility.

Mobile and IoT Backends#

Mobile clients operate under bandwidth, latency, and battery constraints that differ significantly from desktop browsers. An API gateway can aggregate multiple backend calls into a single response (the Backend-for-Frontend pattern), compress payloads, and adapt protocols. For IoT devices that may use MQTT or CoAP, the gateway translates between device protocols and internal HTTP/gRPC services.

Multi-Cloud and Hybrid Deployments#

Organizations running services across AWS, GCP, Azure, and on-premises data centers use an API gateway as the unified traffic layer. The gateway abstracts the underlying infrastructure, enabling consistent routing, security policies, and observability regardless of where a service is deployed. This is especially valuable during cloud migration, where services move between environments incrementally.

API Monetization#

Companies that expose APIs as products --- payment processors, data providers, communication platforms --- use gateways to enforce usage tiers, track consumption per API key, and generate billing data. Rate limiting by tier, quota enforcement, and detailed usage analytics are all gateway responsibilities in this model.

Zero-Trust Security#

A gateway enforces authentication and authorization at the network edge, ensuring that no unauthenticated request reaches backend services. Combined with mTLS for internal traffic, IP allowlisting, and WAF integration, the gateway becomes a core component of a zero-trust architecture. It can also mask or redact sensitive fields in responses to prevent data leakage.

Legacy System Modernization#

When migrating from monolithic to microservices architectures, an API gateway acts as the facade in the strangler fig pattern. New services are deployed behind the gateway alongside the legacy monolith. The gateway gradually shifts traffic from old endpoints to new ones, allowing incremental migration without disrupting existing clients.

Benefits of Using an API Gateway#

Simplified Client Integration#

Clients interact with a single, well-documented endpoint instead of tracking the addresses and protocols of individual services. This reduces client-side complexity, eliminates service discovery logic in front-end code, and makes API consumption predictable.

Centralized Security#

Authentication, authorization, encryption, and threat detection are enforced at one layer rather than reimplemented in every service. A single policy change at the gateway propagates instantly across all APIs. This consistency eliminates the security gaps that emerge when individual teams implement auth differently.

Operational Visibility#

Because every request passes through the gateway, teams gain comprehensive metrics, access logs, and distributed traces without instrumenting each service individually. Dashboards built on gateway telemetry provide real-time visibility into traffic patterns, error rates, and latency distributions across the entire API surface.

Reduced Backend Load#

Caching, request deduplication, and rate limiting at the gateway layer prevent unnecessary calls from reaching backend services. This directly reduces infrastructure costs and improves system stability during traffic spikes. For read-heavy APIs, gateway caching alone can cut upstream load by an order of magnitude.

Faster Time to Market#

Developers focus on business logic rather than reimplementing cross-cutting concerns. Adding authentication to a new service takes a single plugin configuration at the gateway instead of weeks of development. Teams ship faster because infrastructure concerns are already solved.

Independent Scalability#

The gateway and backend services scale independently. During a traffic surge, teams can horizontally scale the gateway layer without modifying any backend service. Conversely, backend services can be scaled, redeployed, or replaced without any client-facing changes.

How Apache APISIX Works as an API Gateway#

Apache APISIX is a high-performance, cloud-native API gateway built on Nginx and LuaJIT. It is designed for environments where throughput, latency, and extensibility are critical requirements.

Performance at scale. APISIX handles over 18,000 requests per second per CPU core with a median latency of 0.2ms. This performance comes from its non-blocking, event-driven architecture and the efficiency of LuaJIT-compiled plugin execution. For comparison, this throughput exceeds most Java- and Go-based gateways by a significant margin.

Extensive plugin ecosystem. APISIX ships with over 100 built-in plugins covering authentication (JWT, OAuth, LDAP, OpenID Connect), traffic control (rate limiting, circuit breaking, traffic mirroring), observability (Prometheus, SkyWalking, OpenTelemetry), and transformation (gRPC transcoding, request rewriting, response rewriting). Plugins can also be written in Lua, Java, Go, Python, or WebAssembly.

Dynamic configuration. Unlike traditional gateways that require restarts for configuration changes, APISIX reloads routes, upstreams, and plugin configurations in real time through its Admin API. This enables zero-downtime deployments and makes APISIX well-suited for CI/CD pipelines and GitOps workflows.

Proven adoption. APISIX powers over 147,000 deployments across more than 5,200 companies globally, spanning industries from fintech to telecommunications. Its Apache Software Foundation governance ensures vendor-neutral, community-driven development.

To get started with APISIX, see the getting started guide.

Frequently Asked Questions#

What is the difference between an API gateway and a load balancer?#

A load balancer distributes incoming traffic across multiple server instances using algorithms like round-robin or least connections. It operates at the transport layer (L4) or HTTP layer (L7) but does not understand API semantics. An API gateway performs load balancing as one of many functions, and adds API-specific capabilities: authentication, rate limiting, request transformation, caching, and observability. If you only need to distribute traffic, a load balancer suffices. If you need to manage, secure, and observe API traffic, you need a gateway.

Do I need an API gateway for a monolithic application?#

An API gateway is not strictly required for a monolith, but it can still add value. If your monolith exposes APIs consumed by external clients, mobile apps, or third-party integrators, a gateway provides centralized authentication, rate limiting, and monitoring without modifying the application. It also positions your architecture for incremental migration to microservices using the strangler fig pattern.

How does an API gateway affect latency?#

A well-implemented gateway adds minimal latency --- typically 0.2ms to 2ms per request depending on the number of active plugins. High-performance gateways like Apache APISIX are optimized for sub-millisecond overhead. The latency tradeoff is almost always worthwhile: the gateway eliminates redundant auth checks, reduces backend calls through caching, and prevents cascading failures through circuit breaking, all of which improve overall system response times.

Can an API gateway replace a service mesh?#

An API gateway and a service mesh serve different layers. The gateway handles north-south traffic (external clients to internal services), while a service mesh manages east-west traffic (service-to-service communication within the cluster). They are complementary, not competing, technologies. Some organizations use APISIX as both a gateway and an ingress controller, bridging the two layers, but a full service mesh (Istio, Linkerd) addresses concerns like mutual TLS between services and fine-grained internal traffic policies that fall outside a gateway's scope.

Is an API gateway the same as an API management platform?#

No. An API gateway is the runtime component that processes API traffic. An API management platform is a broader category that typically includes a gateway, a developer portal, API documentation tools, lifecycle management, and analytics dashboards. The gateway is the engine; the management platform is the full vehicle. Apache APISIX provides the high-performance gateway layer, and organizations often pair it with additional tooling for the complete API management lifecycle.

· 9 min read

gRPC is a high-performance, open-source remote procedure call (RPC) framework originally developed by Google. It uses Protocol Buffers for binary serialization and HTTP/2 for transport, enabling strongly typed service contracts, bidirectional streaming, and significantly smaller payload sizes compared to equivalent JSON over REST.

Why gRPC Exists#

REST has dominated API design for over fifteen years, and it remains an excellent choice for public-facing, resource-oriented APIs. However, as microservices architectures scaled into hundreds or thousands of inter-service calls per request, the limitations of REST became measurable: text-based JSON serialization consumes CPU cycles, HTTP/1.1 head-of-line blocking limits concurrency, and the lack of a formal contract language leads to integration drift.

Google developed gRPC internally (as Stubby) and open-sourced it in 2015. Adoption has grown steadily, and gRPC has become a common choice for latency-sensitive internal APIs in performance-critical systems.

How gRPC Works#

Protocol Buffers (Protobuf)#

Protocol Buffers are gRPC's interface definition language (IDL) and serialization format. A .proto file defines the service contract, including methods, request types, and response types:

syntax = "proto3";

service OrderService {
  rpc GetOrder (OrderRequest) returns (OrderResponse);
  rpc StreamOrders (OrderFilter) returns (stream OrderResponse);
}

message OrderRequest {
  string order_id = 1;
}

message OrderResponse {
  string order_id = 1;
  string status = 2;
  double total = 3;
}

The protoc compiler generates client and server code in many languages from this single definition. Binary serialization produces payloads that are substantially smaller than equivalent JSON representations. This size reduction directly translates to lower network bandwidth consumption and faster serialization/deserialization.
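The size gap is visible even without the protobuf library. The binary layout below is a hand-rolled fixed-width encoding, not the real Protobuf wire format (which uses tagged varints), but it illustrates why binary serialization shrinks payloads relative to JSON text:

```python
import json
import struct

# One order record, as a JSON body vs. a fixed binary layout:
# 4-byte order id, 1-byte status code, 8-byte float total.
order = {"order_id": 4221, "status": 2, "total": 99.5}

json_bytes = json.dumps(order).encode("utf-8")
binary_bytes = struct.pack("<IBd", order["order_id"], order["status"], order["total"])

# The field names and punctuation that dominate the JSON body simply
# do not exist in the binary form; both sides recover them from the schema.
print(len(json_bytes), len(binary_bytes))
```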

HTTP/2 Transport#

gRPC runs exclusively on HTTP/2, which provides several performance advantages over HTTP/1.1:

  • Multiplexing. Multiple RPC calls share a single TCP connection without head-of-line blocking. A service making 50 concurrent calls to another service needs only one connection, not 50.
  • Header compression. HPACK compression significantly reduces header overhead for repeated headers.
  • Binary framing. HTTP/2 frames are binary, eliminating the text parsing overhead of HTTP/1.1.

These transport-level improvements compound with Protobuf serialization to deliver measurably lower latency in service-to-service communication.

Streaming Modes#

gRPC supports four communication patterns:

  1. Unary RPC. Single request, single response. Equivalent to a REST call.
  2. Server streaming. Client sends one request, server returns a stream of responses. Useful for real-time feeds or large result sets.
  3. Client streaming. Client sends a stream of messages, server returns one response. Useful for batched uploads or telemetry ingestion.
  4. Bidirectional streaming. Both sides send streams of messages concurrently. Useful for chat, collaborative editing, or real-time synchronization.

In practice, unary calls represent the majority of gRPC usage, with server streaming being the next most common pattern. The streaming capabilities differentiate gRPC from REST most sharply in real-time and high-throughput scenarios.
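In `.proto` terms, the four patterns differ only in where the stream keyword appears. A hypothetical extension of the OrderService above illustrates all four (the Telemetry, UploadAck, and ChatMessage types are invented for this sketch, not part of the original contract):

```proto
service OrderService {
  rpc GetOrder (OrderRequest) returns (OrderResponse);            // 1. unary
  rpc StreamOrders (OrderFilter) returns (stream OrderResponse);  // 2. server streaming
  rpc UploadTelemetry (stream Telemetry) returns (UploadAck);     // 3. client streaming
  rpc Chat (stream ChatMessage) returns (stream ChatMessage);     // 4. bidirectional
}
```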

gRPC vs REST Comparison#

| Aspect | gRPC | REST |
| --- | --- | --- |
| Serialization | Protocol Buffers (binary) | JSON (text) |
| Transport | HTTP/2 only | HTTP/1.1 or HTTP/2 |
| Contract | .proto file (strict) | OpenAPI/Swagger (optional) |
| Streaming | Native (4 modes) | Limited (SSE, WebSocket) |
| Code Generation | Built-in (protoc) | Third-party tools |
| Browser Support | Requires proxy (gRPC-Web) | Native |
| Payload Size | Significantly smaller | Baseline |
| Latency (typical) | Lower inter-service | Higher inter-service |
| Human Readability | Binary (needs tooling) | JSON is human-readable |
| Caching | Not HTTP-cacheable by default | HTTP caching built-in |
| Tooling Maturity | Growing | Extensive |

REST remains the dominant choice for public-facing APIs, while gRPC is increasingly preferred for internal microservices communication at larger organizations. The two protocols serve complementary roles rather than competing directly.

When to Use gRPC#

Use gRPC when:

  • Services communicate with high frequency and low latency requirements (trading systems, real-time analytics, game backends).
  • Payload efficiency matters because of bandwidth constraints or high message volumes.
  • Strong typing and contract-first development are priorities. The .proto file becomes the single source of truth.
  • Streaming is a core requirement (live data feeds, event-driven architectures, IoT telemetry).
  • Polyglot environments need consistent client/server code generation across multiple languages.

Stick with REST when:

  • The API is public-facing and must be browser-accessible without additional proxying.
  • Human readability and debuggability with standard HTTP tools (curl, Postman) are important for developer experience.
  • HTTP caching semantics are essential for performance.
  • The team's existing tooling and expertise are REST-centric, and migration cost outweighs the performance gain.

Many organizations adopting gRPC maintain REST for external APIs and use gRPC exclusively for internal communication, creating a dual-protocol architecture that leverages each protocol's strengths.

gRPC and API Gateways#

API gateways play a critical role in gRPC architectures by solving three problems: protocol translation, traffic management, and observability.

gRPC Proxying#

A gateway that natively supports HTTP/2 can proxy gRPC traffic directly, applying authentication, rate limiting, and logging without protocol translation. The gateway terminates the client's gRPC connection, applies policies, and forwards the call to the upstream gRPC service. This is the simplest integration model and preserves full gRPC semantics including streaming.

gRPC-Web Translation#

Browsers cannot make native gRPC calls because browser-based JavaScript does not expose the HTTP/2 framing layer required by gRPC. The gRPC-Web protocol bridges this gap: the browser sends gRPC-Web requests (HTTP/1.1 or HTTP/2 with modified framing), and the gateway translates them into native gRPC for the upstream service. This eliminates the need for a separate REST API layer for browser clients.

HTTP/JSON to gRPC Transcoding#

Many organizations need to expose gRPC services to clients that can only consume REST/JSON. An API gateway with transcoding capabilities automatically maps HTTP verbs and JSON payloads to gRPC methods and Protobuf messages based on annotations in the .proto file. This enables a single gRPC backend to serve both gRPC and REST clients without maintaining two codebases.

In practice, gRPC deployments behind an API gateway typically use a mix of pure gRPC proxying, gRPC-Web for browser access, and transcoding to serve REST clients.

How Apache APISIX Handles gRPC#

Apache APISIX provides native gRPC support across all three integration patterns described above.

Native gRPC Proxying#

APISIX proxies gRPC traffic natively over HTTP/2, supporting unary and streaming calls. Routes can be configured with gRPC-specific upstream settings, and the full plugin ecosystem applies to gRPC routes: authentication (JWT, key-auth), rate limiting, circuit breaking, and observability all work transparently on gRPC traffic.
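As a sketch, the route object one might PUT to the APISIX Admin API for such a setup could look like the following (the `order-svc:50051` upstream and the URI, which matches the OrderService contract from earlier, are placeholders; field names follow recent APISIX versions, so check the docs for yours):

```json
{
  "uri": "/OrderService/*",
  "upstream": {
    "scheme": "grpc",
    "type": "roundrobin",
    "nodes": { "order-svc:50051": 1 }
  }
}
```

Setting the upstream scheme to grpc is what tells APISIX to speak HTTP/2 gRPC to the backend; plugins are attached to the route exactly as they would be for an HTTP route.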

gRPC-Web Support#

The grpc-web plugin enables browser clients to communicate with gRPC backends through APISIX. The plugin handles the protocol translation between gRPC-Web and native gRPC, allowing frontend teams to consume gRPC services directly without building a REST translation layer. This reduces the API surface area and eliminates a class of contract synchronization bugs.

HTTP/JSON to gRPC Transcoding#

The grpc-transcode plugin maps REST endpoints to gRPC methods using the Protobuf descriptor. After uploading the .proto file to APISIX, the plugin automatically exposes each gRPC method as an HTTP endpoint, translating JSON request bodies to Protobuf messages and Protobuf responses back to JSON. This is particularly valuable for organizations migrating from REST to gRPC incrementally, as existing REST clients continue working while backends are rewritten.
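A minimal sketch of such a route, assuming the .proto descriptor was previously uploaded as proto id 1 and reusing the OrderService example (the REST path and upstream address are placeholders; plugin fields follow the APISIX grpc-transcode documentation):

```json
{
  "uri": "/orders/get",
  "plugins": {
    "grpc-transcode": {
      "proto_id": "1",
      "service": "OrderService",
      "method": "GetOrder"
    }
  },
  "upstream": {
    "scheme": "grpc",
    "type": "roundrobin",
    "nodes": { "order-svc:50051": 1 }
  }
}
```

With this in place, a REST client can POST a JSON body like `{"order_id": "A1"}` to /orders/get and receive a JSON-encoded OrderResponse, while the backend only ever sees native gRPC.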

APISIX's gRPC performance is notable: internal benchmarks show gRPC proxying at approximately 15,000 RPS per CPU core with 0.3 milliseconds of added latency, comparable to its HTTP/1.1 proxying performance. The getting started guide includes gRPC configuration examples.

gRPC Best Practices#

  1. Version your .proto files carefully. Protobuf supports backward-compatible field additions, but removing or renaming fields breaks clients. Use reserved field numbers for deleted fields.
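For example, when retiring a field, reserve both its number and its name so neither can be accidentally reused by a later edit (a hypothetical evolution of the OrderResponse message above):

```proto
message OrderResponse {
  reserved 2;            // old field number can never be reassigned
  reserved "status";     // old field name can never be reused
  string order_id = 1;
  double total = 3;
  string state = 4;      // replacement field under a fresh number
}
```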

  2. Set deadlines on every RPC. Without a deadline, a hung upstream can hold client resources indefinitely. Missing or overly generous RPC deadlines are a common cause of cascading failures in distributed systems.

  3. Use load balancing at the connection level. Because HTTP/2 multiplexes many RPCs over one connection, TCP-level load balancing (L4) is insufficient. Use L7 load balancing or client-side balancing to distribute RPCs across backend instances.

  4. Implement health checking. gRPC defines a standard health checking protocol (grpc.health.v1.Health). Use it for readiness probes and load balancer health checks.

  5. Monitor per-method metrics. Track latency, error rate, and throughput per gRPC method, not just per service. A slow GetOrder method is invisible if aggregated with a fast ListOrders method.

FAQ#

Can gRPC completely replace REST?#

Not in most architectures. gRPC excels at internal service-to-service communication where performance, type safety, and streaming matter. REST remains superior for public APIs due to native browser support, human-readable payloads, HTTP caching, and broader tooling familiarity. The most common pattern is gRPC internally with REST or GraphQL at the edge, using an API gateway for protocol translation.

How do I debug gRPC calls if the payloads are binary?#

Tools like grpcurl (a curl equivalent for gRPC), Postman (which has since added gRPC support), and the now-archived BloomRPC provide human-readable interaction with gRPC services. For production debugging, structured logging at the gateway layer that decodes Protobuf messages into JSON is the most effective approach. APISIX's logging plugins can capture gRPC request and response metadata for observability.
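For instance, with grpcurl against a server that has reflection enabled (the address is a placeholder; the service and payload follow the OrderService example from earlier):

```
# List services exposed via server reflection
grpcurl -plaintext localhost:50051 list

# Call a unary method with a JSON-encoded request body
grpcurl -plaintext -d '{"order_id": "A1"}' localhost:50051 OrderService/GetOrder
```

grpcurl prints the decoded Protobuf response as JSON, which makes ad-hoc inspection of binary payloads roughly as convenient as curl against a REST endpoint.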

What is the performance difference between gRPC and REST in practice?#

In controlled benchmarks, gRPC typically delivers significantly higher throughput and lower latency than REST/JSON for equivalent workloads. The gains come from binary serialization (smaller payloads, faster encoding), HTTP/2 multiplexing (fewer connections, no head-of-line blocking), and code-generated clients (no reflection or manual parsing). The exact improvement depends on payload size, call frequency, and network conditions. Organizations migrating from REST to gRPC commonly report meaningful reductions in inter-service latency in production.

Does gRPC work with WebAssembly or edge computing?#

Yes. Protobuf serialization libraries exist for languages targeting WebAssembly, and gRPC-Web enables browser-based Wasm applications to call gRPC backends. For edge computing, gRPC's compact payloads and efficient serialization are advantageous on bandwidth-constrained links. Several CDN providers, including Cloudflare and Fastly, now support gRPC proxying at the edge as of 2025.