Design choice:
Which rate-limiting algorithm would you choose (token bucket, leaky bucket, fixed window, sliding window, etc.) and why?
Consistency vs availability:
During a network partition, would you rather:
Strictly enforce the limit (never exceed 100), or
Allow slight overages to keep availability high?
Explain your tradeoff using CAP theorem.
Architecture:
Sketch (in words) the system architecture:
Where is the rate-limit state stored?
How do regions coordinate (if at all)?
What happens on each request?
Edge case:
How do you prevent a single user from bypassing limits by sending requests to multiple regions simultaneously?
Failure handling:
What happens if Redis (or your state store) goes down in one region?