
Stripe Staff Software Engineer — Rate Limiter at Billions RPS

Start the interview now · ₹99 · 20 min · 1 credit · scorecard at the end
Field: Engineering
Company: Stripe
Role: Staff Software Engineer
Duration: 20 min
Difficulty: Hard
Completions: New
Updated: 2026-05-11

What this round is about

  • Topic focus. You are designing a distributed rate limiter for the Stripe API at billions of requests per day, with the published Stripe rate-limiter post as the starting point you are expected to have internalised and to think beyond.
  • Conversation dynamic. Aditi, a Staff Engineer on API Infrastructure, drives almost nothing. You lead the design. She probes when you skip a step or quote a number without grounding it.
  • What gets tested. Requirements disambiguation, algorithm tradeoff articulation, distributed-systems failure-mode coverage, hot-key handling, operational rollout, and recalibration when the interviewer surfaces a new constraint.
  • Round format. Twenty minutes, four blocks, candidate-led, no slides. The interviewer will deliberately push back on one correct decision to test conviction.

What strong answers look like

  • Quantitative grounding. Every number is anchored, for example: token bucket holds a 16-byte tokens-and-timestamp pair per key, so one billion active keys costs roughly 16 GB of Redis memory per replica.
  • Named tradeoff with alternative rejected. 'I am picking token bucket over sliding window log because sliding window log is O(requests) per key in memory, which at billions of keys is prohibitive.'
  • Failure mode coverage volunteered, not extracted. Hot keys, Redis cell failure, network partition, replica lag, time skew, GC pauses named without being asked.
  • Operational specifics. Shadow mode for two weeks, canary one then five then twenty-five then one hundred percent, per-endpoint 429 rate parity and p99 latency parity as gating signals, single kill switch flips to fail-open globally in under thirty seconds.
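
The rollout plan above can be sketched as a small gating loop. This is an illustrative sketch, not Stripe's actual tooling: the stage fractions, metric names, and thresholds are all assumptions chosen to match the bullet.

```python
# Hypothetical sketch of a staged canary rollout gated on 429-rate parity and
# p99 latency parity. A gate breach returns None, signalling the caller to flip
# the global kill switch to fail-open. All names and tolerances are illustrative.

CANARY_STAGES = [0.01, 0.05, 0.25, 1.00]  # fraction of traffic on the new limiter

def gates_pass(metrics: dict) -> bool:
    """Gating signals: the new path must match the old path within tolerance."""
    return (abs(metrics["new_429_rate"] - metrics["old_429_rate"]) <= 0.001
            and metrics["new_p99_ms"] <= metrics["old_p99_ms"] + 1.0)

def next_stage(current: float, metrics: dict):
    """Advance one canary stage if gates pass; otherwise trigger the kill switch."""
    if not gates_pass(metrics):
        return None  # caller flips the global kill switch: fail open everywhere
    idx = CANARY_STAGES.index(current)
    return CANARY_STAGES[min(idx + 1, len(CANARY_STAGES) - 1)]
```

The point the interviewer listens for is that each promotion is gated on an explicit parity signal, and that the abort path is a single, fast, global action.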

What weak answers look like (and how to avoid them)

  • Architecture before requirements. Drawing boxes before pinning throughput target, latency budget, fail-open versus fail-closed default. Avoid by stating non-functional requirements first, in writing if on a canvas.
  • Algorithm picked without alternative rejected. Picking 'sliding window log' without acknowledging the memory cost reads as not knowing why. Name what you rejected and why.
  • Single global Redis with no SPOF acknowledgement. Proposing one Redis cluster without describing what happens during a cell failure. Avoid by stating per-region cells with consistent-hash sharding and explicit fail-mode behaviour.
  • Switching answers immediately under pushback. When the interviewer challenges a correct decision, defend with data instead of capitulating. Capitulation reads as low conviction.

Pre-interview checklist (2 minutes before you start)

  • Recall the four Stripe limiters. Request rate, concurrent request, fleet usage load shedder, worker utilization load shedder. Know which fails open and which fails closed.
  • Have your back-of-envelope numbers ready. One billion API calls per day is roughly 12k average rps, peak likely 50-100k rps per region. Memory per token bucket is around 16 bytes.
  • Identify the algorithm you will pick. Token bucket, and the three you will reject (leaky bucket, sliding window log, sliding window counter) with a one-sentence reason each.
  • Think of a real hot-key incident you have seen. A specific moment where one customer or key dominated traffic, what broke, what you changed.
  • Pull up the Retry-After contract. 429 with Retry-After header, X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset, exponential backoff with jitter on the client.
  • Re-read the fail-open versus fail-closed business consequence. Request-rate limiter fails open, concurrent limiter fails closed, and you must be able to defend the business reasoning.
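
The back-of-envelope numbers in the checklist can be sanity-checked with plain arithmetic. The 5x peak-to-average factor below is an assumption for illustration, not a Stripe-published figure.

```python
# Quick arithmetic behind the checklist numbers; no services involved.
SECONDS_PER_DAY = 86_400
calls_per_day = 1_000_000_000

avg_rps = calls_per_day / SECONDS_PER_DAY        # roughly 11.6k average rps
peak_rps = avg_rps * 5                           # an assumed 5x peak factor, ~58k rps

BYTES_PER_BUCKET = 16                            # tokens + last-refill timestamp
active_keys = 1_000_000_000
redis_gb = active_keys * BYTES_PER_BUCKET / 1e9  # ~16 GB of Redis per replica
```

Being able to reproduce these three lines from scratch is what "quantitative grounding" means in the rubric.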

How the AI behaves

  • Probes every claim. Asks for the underlying numbers, the rejected alternative, the failure mode. Will not accept the headline architecture.
  • No mid-interview praise. Will not say 'great answer' or 'exactly'. Will acknowledge specifically what you said and push deeper.
  • Interrupts on abstraction. Pushes for concrete implementation when the answer stays at box-and-arrow level. Asks for Lua-script logic, atomic primitives, observability signals.
  • Deliberate pushback once. Will challenge a correct decision once to test conviction. Defend with data. Switching answers reads as low conviction.

Common traps in this type of round

  • Algorithm without tradeoff. Picking an algorithm without naming what you rejected and why.
  • SPOF unacknowledged. A single global Redis without naming what happens on cell failure.
  • Throughput without latency. Quoting rps numbers without naming the latency budget the limiter must hit, typically under 5ms p99.
  • Buzzword without justification. Name-dropping consistent hashing, CRDTs, or quorum without explaining when and why they apply.
  • Generic rollout. 'Just use a feature flag' without canary percentages, gating signals, kill-switch contract.
  • No final summary. Ending without a one-paragraph summary that names the chosen design and the limitations not yet covered.

Interview framework

You will be scored on these 7 dimensions. The full rubric with definitions is below.

  • Requirements Disambiguation (15%). How precisely you pin non-functional requirements (throughput, latency budget, fail-mode default, multi-region scope) before proposing architecture.
  • Algorithm Choice Tradeoff (20%). How clearly you name the chosen algorithm AND the alternatives rejected, with a workload-anchored or memory-anchored reason per rejection.
  • Distributed Failure Mode Coverage (20%). Whether you enumerate hot keys, Redis cell failure, partition, replica lag, time skew, and GC pauses with mitigations, unprompted.
  • Hot Key Handling (15%). Whether you propose fan-out across sub-buckets when probed on a single API key generating disproportionate traffic, with the precision tradeoff named.
  • Operational Rollout Specificity (15%). How specific your rollout plan is: shadow duration, canary percentages, gating signals, kill-switch activation time.
  • Recalibration Under Pushback (10%). Whether you defend a correct decision with data when challenged, instead of switching answers immediately.
  • API Consumer Contract Reasoning (5%). Whether you reason about the developer experience of the API consumer (429 with Retry-After, X-RateLimit headers, exponential backoff with jitter).

What we evaluate

Your final scorecard breaks down across these dimensions. The full rubric and tier criteria are revealed inside the interview itself.

  • Requirements Disambiguation Rigor · 15%
  • Algorithm Tradeoff Articulation · 18%
  • Distributed Failure Mode Coverage · 18%
  • Hot Key and Load Skew Handling · 14%
  • Operational Rollout Specificity · 15%
  • Recalibration Under Pushback · 12%
  • API Consumer Contract Design · 8%

Common questions

What does the Stripe Staff system design rate limiter round actually test?
It tests whether you can design and defend a distributed rate limiter at billions of API calls per day with a sub-5ms p99 latency budget. The interviewer probes algorithm choice (token bucket versus sliding window counter), Redis failure semantics (fail-open versus fail-closed), hot-key handling, multi-region consistency, and operational rollout. The bar is whether they would trust you to be on-call for the system you just designed at 3am during a payments outage.
How should I structure my Staff-level answer in this round?
Lock down requirements before drawing any boxes. Pin the throughput target (a billion-plus API calls per day, peaking around 50-100k rps per region), the latency budget (under 5ms p99 added to every API call), the fail-open versus fail-closed default, and the multi-region scope. Then propose one algorithm and one storage layer with a named tradeoff against the alternatives. Only then descend into hot keys, partition behaviour, and rollout. Close with a one-paragraph summary naming the limitations you have not yet addressed.
What are common failure modes when designing a distributed rate limiter at Stripe scale?
The recurring rejection patterns are: jumping to architecture before pinning non-functional requirements, picking sliding window log without acknowledging the memory cost, proposing a single global Redis without naming the SPOF risk, ignoring hot-key fan-out, quoting throughput numbers without a latency budget, switching answers immediately under pushback, and staying at box-and-arrow level without operational rollout, dashboards, runbooks, or kill switches.
How do I handle hot keys when one API key generates 1M rps from a botnet?
Acknowledge the hot key explicitly, do not assume uniform load. The Stripe-style response is to fan the single bucket out into N sub-buckets, route requests across them with a secondary hash, and aggregate counts on read. This sacrifices precision (each sub-bucket only sees its slice) for throughput, which is the right tradeoff because the over-permit on a misbehaving key is bounded and small. Pair it with a circuit breaker for keys above a threshold.
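
The fan-out idea above can be sketched in a few lines. This is an illustrative sketch, not Stripe's implementation: the key format, hash choice, and N=16 are assumptions.

```python
import hashlib

# Hypothetical sub-bucket fan-out for a hot key: a secondary hash spreads the
# key's traffic across N sub-buckets so no single Redis shard absorbs it all.
# Counts are aggregated (or the budget split) across sub-buckets on read.

N_SUB_BUCKETS = 16

def sub_bucket(api_key: str, request_id: str) -> str:
    """Route a request to one of N sub-buckets via a secondary hash."""
    h = int(hashlib.md5(f"{api_key}:{request_id}".encode()).hexdigest(), 16)
    return f"rl:{api_key}:{h % N_SUB_BUCKETS}"

def per_bucket_limit(global_limit: int) -> int:
    """Each sub-bucket gets an equal slice; precision traded for throughput."""
    return max(1, global_limit // N_SUB_BUCKETS)
```

The precision loss is exactly the point to name aloud: each sub-bucket only sees 1/N of the traffic, so the worst-case over-permit is bounded by the skew across sub-buckets.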
Why does Stripe use token bucket instead of sliding window log or leaky bucket?
Token bucket naturally accommodates bursts, and real API workloads are bursty. A Stripe checkout fires several quick API calls in a 500ms window. Leaky bucket throttles those bursts and degrades UX. Sliding window log is the most precise but its memory cost is O(requests) per key, which is prohibitive at billions of keys. Token bucket holds one tokens-and-timestamp pair per key in Redis and the read-modify-write fits cleanly in a Lua script for atomicity.
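
A minimal single-process version of that tokens-and-timestamp pair looks like the sketch below. In production the read-modify-write would live in a Redis Lua script for atomicity; this local class just shows the refill arithmetic.

```python
import time

# Minimal token bucket: one (tokens, last_refill) pair per key, refilled
# lazily on each check. A local sketch of the logic, not a Redis client.

class TokenBucket:
    def __init__(self, capacity, refill_per_sec, now=None):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity                       # start full: bursts allowed
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Refill based on elapsed time, then try to spend one token."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Note how the burst of quick calls in a checkout flow is admitted because the bucket starts full, while a sustained flood drains it and gets throttled at the refill rate.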
Should the rate limiter fail open or fail closed if Redis goes down?
Stripe published this in its rate-limiter blog post: fail-open by default for the request-rate limiter, fail-closed for the concurrent limiter. The reasoning is that the request-rate limiter exists to prevent abuse, and a brief Redis outage that lets through a few extra requests is far better than killing all API traffic. The concurrent limiter protects downstream worker pools from saturation, so failing closed there is the safer default. Name the business consequence of being wrong before picking.
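
The policy reduces to one boolean per limiter. The sketch below is illustrative; the limiter callables stand in for real Redis-backed checks.

```python
# Sketch of the fail-mode policy above: the request-rate limiter fails open
# on a Redis error, the concurrency limiter fails closed. Placeholder names.

class RedisDown(Exception):
    pass

def check(limiter, fail_open: bool) -> bool:
    """Return True if the request may proceed."""
    try:
        return limiter()
    except RedisDown:
        # Fail-open: admit traffic; abuse protection degrades briefly.
        # Fail-closed: reject; protects saturated worker pools downstream.
        return fail_open

def redis_is_down():
    raise RedisDown()
```

The interview point is that the boolean is a business decision, not a technical one, and you should be able to state the cost of getting it wrong in each direction.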
How is multi-region rate limiting handled at internet scale?
Each region runs its own Redis cell and enforces rate limits locally. Cross-region reconciliation is asynchronous and eventual, not synchronous. Synchronous cross-region quorum would add 100ms or more to every API call and break the latency SLA on /v1/charges. The accepted tradeoff is that a customer can briefly exceed a global limit by a small factor while their traffic is split across regions. State this AP-over-CP stance explicitly with the business reasoning.
How is the AI interviewer different from a real Stripe interviewer?
The AI mimics a Stripe Staff Engineer on API Infrastructure. It will not praise your answers or confirm if you are right. It will listen for the specific reasoning beats (requirements first, algorithm tradeoff, distributed failure modes, operational rollout, recalibration under pushback) and probe the ones you skip. It will deliberately push back on a correct decision once to test your conviction. Switching answers immediately reads as low conviction.
How is scoring done for this mock interview?
Scoring evaluates seven dimensions: requirements disambiguation, algorithm and data-structure tradeoff, distributed failure-mode coverage, hot-key handling, operational rollout specificity, recalibration under pushback, and customer-facing API contract design (Retry-After, X-RateLimit headers, exponential backoff with jitter). You are graded on the specificity of your evidence and the quantitative grounding of your back-of-envelope math, not on diagram aesthetics or buzzword count.
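
The client-side half of that API contract can be sketched in a few lines. This is an illustrative policy, not a Stripe-published one: the base delay, cap, and full-jitter choice are assumptions.

```python
import random

# Hypothetical client retry policy for the 429 contract: honour Retry-After
# when the server sends it, otherwise exponential backoff with full jitter.

def backoff_delay(attempt: int, retry_after=None,
                  base: float = 0.1, cap: float = 10.0) -> float:
    """Seconds to wait before retry number `attempt` (0-indexed)."""
    if retry_after is not None:
        return float(retry_after)         # server-directed delay always wins
    exp = min(cap, base * (2 ** attempt))
    return random.uniform(0, exp)         # full jitter de-synchronises clients
```

The jitter is the detail worth saying out loud: without it, throttled clients retry in lockstep and re-create the spike the limiter just shed.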
What should I do in the first 2 minutes of this round?
Pin functional requirements (what is being rate-limited: API key, IP, customer ID, endpoint) and at least three non-functional requirements (throughput target per region, latency budget the limiter must hit, fail-open versus fail-closed default). State the multi-region scope. Ask the interviewer one clarifying question on consistency expectations. Only then sketch a candidate architecture. Do not draw any boxes before this is on the page.
What does a strong answer sound like at the Staff level?
A strong answer names the specific tradeoff being made and the alternative being rejected, with a concrete reason. For example: 'I am picking token bucket over sliding window log because sliding window log is O(requests) per key in memory, which at one billion active keys and a 60-second window costs us terabytes of Redis memory, while token bucket is a 16-byte pair per key.' The pattern is: choice, alternative rejected, quantitative reason, business consequence.