Stripe Staff Software Engineer — Rate Limiter at Billions RPS
Take this on a laptop or desktop — not your phone. The live interview needs a full screen and keyboard (including a sketch whiteboard on coding rounds). You can buy now, but start it from a computer.
- Field: Engineering
- Company: Stripe
- Role: Staff Software Engineer
- Duration: 20 min
- Difficulty: Hard
- Completions: New
- Updated: 2026-05-11
How to prepare
What this round tests, what strong and weak answers sound like, and the traps to sidestep.
What this round is about
- Topic focus. You are designing a distributed rate limiter for the Stripe API at billions of requests per day, with the published Stripe rate-limiter post as the starting point you are expected to have internalised and to think beyond.
- Conversation dynamic. Aditi, a Staff Engineer on API Infrastructure, drives almost nothing. You lead the design. She probes when you skip a step or quote a number without grounding it.
- What gets tested. Requirements disambiguation, algorithm tradeoff articulation, distributed-systems failure-mode coverage, hot-key handling, operational rollout, and recalibration when the interviewer surfaces a new constraint.
- Round format. Twenty minutes, four blocks, candidate-led, no slides. The interviewer will deliberately push back on one correct decision to test conviction.
What strong answers look like
- Quantitative grounding. Every number is anchored, for example: token bucket holds a 16-byte tokens-and-timestamp pair per key, so one billion active keys cost roughly 16 GB of raw bucket state per replica, before key names and Redis per-entry overhead.
- Named tradeoff with alternative rejected. 'I am picking token bucket over sliding window log because sliding window log is O(requests) per key in memory, which at billions of keys is prohibitive.'
- Failure mode coverage volunteered, not extracted. Hot keys, Redis cell failure, network partition, replica lag, time skew, GC pauses named without being asked.
- Operational specifics. Shadow mode for two weeks, canary one then five then twenty-five then one hundred percent, per-endpoint 429 rate parity and p99 latency parity as gating signals, single kill switch flips to fail-open globally in under thirty seconds.
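The 16-byte figure above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 8-byte-plus-8-byte layout is an assumption, and the result counts raw bucket state without key names or Redis per-entry overhead.

```typescript
// Back-of-envelope sizing. The byte count is the figure quoted above; the
// split into two 8-byte fields is an assumed layout, not a Redis guarantee.
const BYTES_PER_BUCKET = 16; // tokens (8 bytes) + last-refill timestamp (8 bytes)

// Raw bucket state in decimal gigabytes for a given number of active keys.
function bucketMemoryGB(activeKeys: number, bytesPerBucket: number = BYTES_PER_BUCKET): number {
  return (activeKeys * bytesPerBucket) / 1e9;
}

// Average requests per second for a daily request count.
function averageRps(requestsPerDay: number): number {
  return requestsPerDay / 86400; // seconds in a day
}

console.log(bucketMemoryGB(1e9)); // 16
console.log(Math.round(averageRps(1e9))); // 11574
```

Stating the inputs alongside the result is what makes the number defensible when the interviewer asks where it came from.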
What weak answers look like (and how to avoid them)
- Architecture before requirements. Drawing boxes before pinning throughput target, latency budget, fail-open versus fail-closed default. Avoid by stating non-functional requirements first, in writing if on a canvas.
- Algorithm picked without alternative rejected. Picking 'sliding window log' without acknowledging the memory cost reads as not knowing why. Name what you rejected and why.
- Single global Redis with no SPOF acknowledgement. Proposing one Redis cluster without describing what happens during a cell failure. Avoid by stating per-region cells with consistent-hash sharding and explicit fail-mode behaviour.
- Switching answers immediately under pushback. When the interviewer challenges a correct decision, defend with data instead of capitulating. Capitulation reads as low conviction.
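For the per-region cells with consistent-hash sharding mentioned above, a minimal ring might look like the following. This is a sketch under assumptions: the cell names, the virtual-node count of 64, and the FNV-1a hash are illustrative choices, not anything Stripe has published.

```typescript
// FNV-1a 32-bit hash, used here only because it is short and deterministic.
// The multiply by the FNV prime (16777619) is written as shifts and adds to
// stay inside 32-bit integer arithmetic.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
    hash = hash >>> 0;
  }
  return hash;
}

class HashRing {
  private points: { hash: number; node: string }[] = [];

  constructor(nodes: string[], private readonly vnodes: number = 64) {
    for (const node of nodes) this.add(node);
  }

  // Each node is placed on the ring at `vnodes` pseudo-random positions.
  add(node: string): void {
    for (let v = 0; v < this.vnodes; v++) {
      this.points.push({ hash: fnv1a(node + "#" + v), node });
    }
    this.points.sort((a, b) => a.hash - b.hash);
  }

  remove(node: string): void {
    this.points = this.points.filter((p) => p.node !== node);
  }

  // Owner is the first ring point at or after the key's hash, wrapping around.
  lookup(key: string): string {
    if (this.points.length === 0) throw new Error("ring is empty");
    const h = fnv1a(key);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.points[mid].hash < h) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo % this.points.length].node;
  }
}

const ring = new HashRing(["cell-a", "cell-b", "cell-c"]);
console.log(ring.lookup("key_123") === ring.lookup("key_123")); // true
```

The property worth naming in the interview is that removing a cell only remaps the keys that cell owned; every other key keeps its owner, so a cell failure does not reset limiter state across the whole fleet.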
Pre-interview checklist (2 minutes before you start)
- Recall the four Stripe limiters. Request rate, concurrent request, fleet usage load shedder, worker utilization load shedder. Know which fails open and which fails closed.
- Have your back-of-envelope numbers ready. One billion API calls per day is roughly 12k average rps, peak likely 50-100k rps per region. Memory per token bucket is around 16 bytes.
- Identify the algorithm you will pick. Token bucket, and the three you will reject (leaky bucket, sliding window log, sliding window counter) with a one-sentence reason each.
- Think of a real hot-key incident you have seen. A specific moment where one customer or key dominated traffic, what broke, what you changed.
- Pull up the Retry-After contract. 429 with Retry-After header, X-RateLimit-Limit / X-RateLimit-Remaining / X-RateLimit-Reset, exponential backoff with jitter on the client.
- Re-read the fail-open versus fail-closed business consequence. Request-rate limiter fails open, concurrent limiter fails closed, and you must be able to defend the business reasoning.
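The client side of the Retry-After contract in the checklist above can be sketched as a single delay function. Everything here is an assumption for illustration: the function name, the 200 ms base, the 30 s cap, and the choice to treat Retry-After as a number of seconds (the header may also carry an HTTP date).

```typescript
// Delay before retry number `attempt` (0-based) after receiving a 429.
// `random` is injectable so the function is deterministic under test.
function retryDelayMs(
  attempt: number,
  retryAfterSeconds: number | null,
  baseMs: number = 200,
  capMs: number = 30000,
  random: () => number = Math.random
): number {
  // Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)).
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  const jitteredMs = random() * ceiling;
  // Treat the server's Retry-After as a floor, then add jitter on top so that
  // throttled clients do not all return at the same instant.
  const serverFloorMs = retryAfterSeconds === null ? 0 : retryAfterSeconds * 1000;
  return serverFloorMs + jitteredMs;
}

console.log(retryDelayMs(3, null, 200, 30000, () => 0.5)); // 800
console.log(retryDelayMs(0, 2, 200, 30000, () => 0.5)); // 2100
```

Adding the jitter to the server floor, rather than taking the larger of the two, is a design choice: it spreads out clients that were all handed the same Retry-After value.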
How the AI behaves
- Probes every claim. Asks for the underlying numbers, the rejected alternative, the failure mode. Will not accept the headline architecture.
- No mid-interview praise. Will not say 'great answer' or 'exactly'. Will acknowledge specifically what you said and push deeper.
- Interrupts on abstraction. Pushes for concrete implementation when the answer stays at box-and-arrow level. Asks for Lua-script logic, atomic primitives, observability signals.
- Deliberate pushback once. Will challenge a correct decision once to test conviction. Defend with data. Switching answers reads as low conviction.
Common traps in this type of round
- Algorithm without tradeoff. Picking an algorithm without naming what you rejected and why.
- SPOF unacknowledged. A single global Redis without naming what happens on cell failure.
- Throughput without latency. Quoting rps numbers without naming the latency budget the limiter must hit, typically under 5ms p99.
- Buzzword without justification. Name-dropping consistent hashing, CRDTs, or quorum without explaining when and why they apply.
- Generic rollout. 'Just use a feature flag' without canary percentages, gating signals, kill-switch contract.
- No final summary. Ending without a one-paragraph summary that names the chosen design and the limitations not yet covered.
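The SPOF trap above is easier to avoid when the fail-mode decision is explicit in code rather than implied. A minimal sketch follows, assuming a hypothetical synchronous `check` callback that throws when the backing store is unreachable; the names are illustrative.

```typescript
type FailMode = "open" | "closed";

// Runs a limiter check. If the backing store throws, the limiter's declared
// fail mode decides the outcome instead of the error propagating.
function admitWithFailMode(check: () => boolean, failMode: FailMode): boolean {
  try {
    return check();
  } catch (_err) {
    return failMode === "open"; // open admits on error, closed rejects on error
  }
}

// Simulated outage of the store behind the limiter.
const storeDown = (): boolean => {
  throw new Error("redis cell unreachable");
};

console.log(admitWithFailMode(storeDown, "open")); // true
console.log(admitWithFailMode(storeDown, "closed")); // false
```

Making the mode a parameter forces each limiter to declare its behaviour during a cell failure, which is exactly the question the interviewer will ask.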
You will also write code
- Implement the atomic consume. After you defend token bucket and put it on Redis, Aditi opens problem 1 on your canvas and asks you to implement the lazy-refill read-modify-write — the exact logic that has to run inside one Lua script for atomicity.
- What is graded. Fractional-time-accurate refill clamped at capacity, a pure function that reads the clock once and never mutates its input, a correct retryAfterMs on reject, and your argument for why a single Lua script beats a GET-then-SET under a race.
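The race argument can be made concrete with a small simulation. This sketch uses a hypothetical in-memory `FakeStore` in place of Redis; `atomicTake` stands in for a Lua script, which Redis executes without interleaving other commands.

```typescript
// In-memory stand-in for a Redis key that holds a token count.
class FakeStore {
  private tokens: number;

  constructor(initialTokens: number) {
    this.tokens = initialTokens;
  }

  get(): number {
    return this.tokens;
  }

  set(value: number): void {
    this.tokens = value;
  }

  // Stand-in for a Lua script: the whole read-modify-write is one step.
  atomicTake(cost: number): boolean {
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// Two clients interleave as GET, GET, SET, SET. Both read the same token
// count before either writes, so both admit.
function racyAdmits(store: FakeStore): number {
  const seenByA = store.get();
  const seenByB = store.get();
  let admitted = 0;
  if (seenByA >= 1) {
    store.set(seenByA - 1);
    admitted++;
  }
  if (seenByB >= 1) {
    store.set(seenByB - 1);
    admitted++;
  }
  return admitted;
}

// The same two requests through the atomic path: the second sees the first's write.
function atomicAdmits(store: FakeStore): number {
  let admitted = 0;
  if (store.atomicTake(1)) admitted++;
  if (store.atomicTake(1)) admitted++;
  return admitted;
}

console.log(racyAdmits(new FakeStore(1))); // 2: one token, two requests admitted
console.log(atomicAdmits(new FakeStore(1))); // 1
```

With one token in the bucket, the GET-then-SET interleaving admits both requests while the atomic version admits only one.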
Sample problems you'll face
The problem below is the same one you'll work through in the live session — no surprises. Read the constraints carefully; the AI persona will refer you to the on-canvas card by problem number.
- 1. Atomic token-bucket consume (the Lua read-modify-write, in code)
A Redis key stores one API key's bucket as { tokens, lastRefillMs }. Implement consume(bucket, nowMs, cost, capacity, refillPerSec): lazily refill tokens for the time elapsed since lastRefillMs (capped at capacity), then admit the request only if at least `cost` tokens remain. Return { allowed, bucket: nextBucket, retryAfterMs }. This is exactly the read-modify-write that must execute atomically inside one Redis Lua script — so write it as a pure function of its inputs, reading the clock exactly once (nowMs is the only time source) and never mutating the bucket you were handed.
Example input
bucket = { tokens: 0, lastRefillMs: 1700000000000 }
nowMs = 1700000000500, cost = 1, capacity = 10, refillPerSec = 4
Example output
{ allowed: true, bucket: { tokens: 1, lastRefillMs: 1700000000500 }, retryAfterMs: 0 } // 500ms × 4/s = 2 tokens refilled, 1 consumed
- Pure function: no I/O, no second clock read; nowMs is the only time source.
- Never mutate the input bucket; return a new bucket object.
- Refill is fractional-time accurate (sub-second elapsed counts) and clamped at capacity (no overflow).
- On reject (allowed=false), leave tokens unchanged and set retryAfterMs to the wait until `cost` tokens are available.
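One possible shape for a solution to the constraints above is sketched below. It is not the graded reference answer, and it rests on assumptions: refillPerSec is greater than zero, cost does not exceed capacity, and "leave tokens unchanged" on reject is read as returning a copy of the stored state. The `Bucket` and `ConsumeResult` type names are hypothetical.

```typescript
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

interface ConsumeResult {
  allowed: boolean;
  bucket: Bucket;
  retryAfterMs: number;
}

function consume(
  bucket: Bucket,
  nowMs: number,
  cost: number,
  capacity: number,
  refillPerSec: number
): ConsumeResult {
  // Clamp elapsed at 0 so a clock that steps backwards never removes tokens.
  const elapsedMs = Math.max(0, nowMs - bucket.lastRefillMs);
  // Fractional refill for the elapsed time, capped at capacity.
  const refilled = Math.min(capacity, bucket.tokens + (elapsedMs / 1000) * refillPerSec);

  if (refilled >= cost) {
    return {
      allowed: true,
      bucket: {
        tokens: refilled - cost,
        lastRefillMs: Math.max(nowMs, bucket.lastRefillMs),
      },
      retryAfterMs: 0,
    };
  }

  // Reject: return a copy of the stored state. No elapsed time is lost,
  // because the next call recomputes the refill from the same lastRefillMs.
  const deficit = cost - refilled;
  return {
    allowed: false,
    bucket: { tokens: bucket.tokens, lastRefillMs: bucket.lastRefillMs },
    retryAfterMs: Math.ceil((deficit / refillPerSec) * 1000),
  };
}

const example = consume({ tokens: 0, lastRefillMs: 1700000000000 }, 1700000000500, 1, 10, 4);
console.log(example.allowed, example.bucket.tokens, example.retryAfterMs); // true 1 0
```

An equally defensible reading of the reject rule stores the refilled token count together with nowMs; both leave the bucket in an equivalent state, so the choice is worth stating out loud rather than leaving implicit.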
The full breakdown
How you're scored, the questions candidates ask most, and the research this interview is built on. Skim it — or just start the interview.
Interview framework
You will be scored on these 8 dimensions, weighted as shown below.
What we evaluate
Your final scorecard breaks down across these dimensions. The full rubric and tier criteria are revealed inside the interview itself.
- Requirements Disambiguation Rigor: 13%
- Algorithm Tradeoff Articulation: 16%
- Distributed Failure Mode Coverage: 16%
- Hot Key and Load Skew Handling: 12%
- Operational Rollout Specificity: 13%
- Recalibration Under Pushback: 10%
- API Consumer Contract Design: 5%
- Implementation Correctness: 15%
Sources this interview is built on
Real candidate-report URLs (Glassdoor / AmbitionBox / PrepInsta / GeeksforGeeks / Medium) reviewed when authoring the questions, persona, and rubric. Verify the realism yourself.
- Scaling your API with rate limiters - Stripe Engineering Blog (stripe.com)
- Rate Limits | Stripe Documentation (docs.stripe.com)
- Design a Distributed Rate Limiter | Hello Interview (hellointerview.com)
- Design A Rate Limiter - ByteByteGo (bytebytego.com)
- Stripe Software Engineer Interview Guide (2026) - Exponent (tryexponent.com)
- Stripe Staff Software Engineer Interview Experience - Glassdoor (glassdoor.com)