Building a thread-safe streaming percentile engine for high-frequency trading
This is a hard coding problem that combines two distinct challenges: implementing a memory-efficient approximate quantile sketch, then making it safe for concurrent producer-consumer access under extreme scale. It is representative of the systems-level work that quant trading desks demand, where both algorithmic correctness and concurrency discipline matter equally.
Part 1 tests your ability to trade off accuracy for space: you must implement a streaming percentile estimator (such as Greenwald-Khanna) that maintains a sub-linear summary of up to 108 observations while answering percentile queries in sub-linear time. Part 2 escalates this to a multi-threaded setting, requiring you to protect the sketch with a readers-writer lock that allows concurrent queries but serializes ingestion, all without races or deadlocks.
Strong solutions demonstrate fluency with the Greenwald-Khanna algorithm or an equivalent sketch, careful state management in a concurrent context, and clear reasoning about linearizability: a query must reflect some consistent snapshot of ingested values taken before it began. Interviewers probe whether you understand the difference between a global mutex and a readers-writer lock, and whether you can reason about memory and time complexity under contention.
- Streaming quantile summaries and rank-error guarantees
- Readers-writer locks and serialization vs. concurrency
- Linearizability and snapshot consistency under concurrent updates
- Space and time complexity trade-offs at scale