Qminers Masterclass vol. 1: Low-latency systems

Feb 23, 2026

Qminers is an algorithmic trading company. We develop our own high-frequency trading systems that trade on financial markets in real time. We don't sell software to clients. We build and operate our own infrastructure, our own strategies, and our own technology stack. For us, performance isn't a cosmetic feature. It's a fundamental parameter of the product.

In high-frequency trading, nanoseconds decide. Not figuratively, but literally. Speed can mean the difference between an executed trade and a missed opportunity. And that's exactly why we use C++ in the key parts of our system.

Python has a firm place in our world too. We use it for research, analytics, and prototyping new strategies. It's fast to develop in, flexible, and productive. But once speed becomes a direct part of the final product, we need full control over what happens between the code and the hardware. And that's what C++ gives us.

C++ lets us influence what the CPU actually executes. It gives us control over memory, over data structures, over how the cache is used and how branch prediction behaves. In an environment where latency is critical, it's no longer just about the algorithm. It's about how that algorithm physically translates into instructions and how those instructions behave on a specific processor.

Code on paper versus hardware reality

Code can look perfect on paper. Clean, with good asymptotic properties, with a clear structure. But that alone guarantees nothing. Performance isn't about syntax or how many lines a function has. What matters is what happens between the source code and the processor.

To truly trust our code, we have to understand how it behaves in memory, how it uses the cache, and how well its conditional branches match what the branch predictor expects. That doesn't mean optimizing everything from the very start, though.

There's a well-known saying that premature optimization is the root of all evil. Optimizing too early leads to less readable code and often doesn't even bring any real speedup. At Qminers, we do the opposite of optimizing early. First we measure. We identify the actual bottleneck, and only then do we look for a way to remove it.

Performance problems are treacherous because they depend on specific data and specific hardware. What works in a test or a theoretical model can behave completely differently in production. The only way to find the truth is through measurement.
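A minimal sketch of what "measure first" can look like in C++, assuming a hypothetical helper named measure_ns that is not part of any Qminers code. It times each call of an operation separately with std::chrono::steady_clock and keeps every sample, so the distribution can be inspected later instead of a single average.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: times each call of `op` individually and returns
// the per-call latencies in nanoseconds.
template <typename Op>
std::vector<std::int64_t> measure_ns(Op&& op, std::size_t iterations) {
    using clock = std::chrono::steady_clock;  // monotonic clock, safe for intervals

    std::vector<std::int64_t> samples;
    samples.reserve(iterations);  // allocate up front so the timed loop never reallocates

    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        op();
        const auto stop = clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    }
    return samples;
}
```

Reading the clock has a cost of its own, so for very short operations it is usually more reliable to time a batch of calls, and to run the measurement on production-like data and hardware.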

When latency shoots up

Matěj describes a case in which we launched trading on a new market with a fine price grid and high volatility. The order book implementation was built on a hash table that was well tuned for denser books.

On the new market, however, long collision chains started to form. On paper, everything looked correct. The instruction counts added up. Yet latency suddenly shot up. Every lookup meant jumping around in memory, repeated cache misses, and unpredictable delays.
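Collision chains of this kind can be made visible directly. The sketch below uses std::unordered_map only as a stand-in, since the article does not say which hash table the order book used. The function name longest_chain and the deliberately bad ConstantHash are hypothetical and exist for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>

// Returns the number of elements in the fullest bucket of a standard
// unordered container. A large value means lookups walk long chains.
template <typename Map>
typename Map::size_type longest_chain(const Map& map) {
    typename Map::size_type longest = 0;
    for (typename Map::size_type b = 0; b < map.bucket_count(); ++b) {
        longest = std::max(longest, map.bucket_size(b));
    }
    return longest;
}

// Deliberately bad hash, for demonstration only: every key gets the same
// hash value, so all elements end up in a single bucket.
struct ConstantHash {
    std::size_t operator()(long) const noexcept { return 0; }
};
```

Inserting 100 distinct keys into std::unordered_map<long, int, ConstantHash> makes longest_chain report 100, because every lookup now has to walk one chain of 100 nodes. In a node-based table each step of that walk can be a separate cache miss.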

In high-frequency trading, this kind of behavior is critical. Unstable latency is a problem in itself.

We didn't find the cause by guessing. We measured it. Tools like perf and flame graphs revealed time hotspots even where callgrind indicated no fundamental problem. Callgrind primarily counts executed instructions, while perf samples where CPU time is actually spent on the real hardware. It was precisely the difference between these two views that suggested the problem wasn't in computational complexity but in memory access.

How stable performance is born

The solution wasn't a radical rewrite of the entire system, but targeted adjustments. We needed to properly size the table for the specific market, choose more cache-friendly data structures, such as flat_map or vector, and simplify the control flow so the branch predictor wouldn't have to guess unnecessarily.
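As an illustration of the cache-friendly direction described above, here is a minimal sketch of a price-level container backed by a sorted std::vector, in the spirit of a flat_map. The names Level and FlatBook, the integer tick representation of prices, and the expected_levels parameter are assumptions made for this example, not the actual order book.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Level {
    std::int64_t price;     // price in ticks (assumed representation)
    std::int64_t quantity;  // total quantity resting at this price
};

// Hypothetical flat container: levels are kept sorted by price in one
// contiguous vector, so a lookup is a binary search over adjacent memory.
class FlatBook {
public:
    // Size the storage once for the expected depth of the specific market.
    explicit FlatBook(std::size_t expected_levels) { levels_.reserve(expected_levels); }

    // Adds quantity at a price, creating the level if it does not exist yet.
    void add(std::int64_t price, std::int64_t quantity) {
        auto it = std::lower_bound(levels_.begin(), levels_.end(), price, price_less);
        if (it != levels_.end() && it->price == price) {
            it->quantity += quantity;
        } else {
            levels_.insert(it, Level{price, quantity});  // shifts later levels, keeps order
        }
    }

    // Returns the level at `price`, or nullptr if there is none.
    const Level* find(std::int64_t price) const {
        auto it = std::lower_bound(levels_.begin(), levels_.end(), price, price_less);
        return (it != levels_.end() && it->price == price) ? &*it : nullptr;
    }

    std::size_t size() const { return levels_.size(); }

private:
    static bool price_less(const Level& level, std::int64_t price) {
        return level.price < price;
    }

    std::vector<Level> levels_;
};
```

The trade-off is that inserting in the middle shifts elements, which costs O(n) moves. For a modest number of levels those moves run over contiguous memory and are often cheaper in practice than chasing pointers, but that is exactly the kind of claim that should be confirmed by measurement on the target market.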

The result was a transformation of unpredictable slowdowns into consistently fast, low-latency responses. It wasn't just about speeding up the average, but about eliminating the spikes.
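The difference between a fast average and the absence of spikes shows up once latency samples are summarized by percentiles. The following sketch assumes samples in nanoseconds and uses the nearest-rank definition of a percentile. LatencySummary, percentile, and summarize are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

struct LatencySummary {
    double mean;
    std::int64_t p50;
    std::int64_t p99;
    std::int64_t max;
};

// Nearest-rank percentile over samples that are already sorted ascending.
// `percent` is an integer in [1, 100].
inline std::int64_t percentile(const std::vector<std::int64_t>& sorted, std::size_t percent) {
    assert(!sorted.empty() && percent >= 1 && percent <= 100);
    const std::size_t rank = (percent * sorted.size() + 99) / 100;  // ceil(percent * n / 100)
    return sorted[rank - 1];
}

// Takes the samples by value because it sorts its own copy.
inline LatencySummary summarize(std::vector<std::int64_t> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
    return {sum / static_cast<double>(samples.size()),
            percentile(samples, 50),
            percentile(samples, 99),
            samples.back()};
}
```

With 99 samples of 10 ns and a single 10,000 ns outlier, p50 and p99 both stay at 10 while the mean rises to roughly 110 and the maximum exposes the spike. Tracking the tail alongside the mean is what makes such outliers visible.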

Three general principles follow from this experience. Measure the actual behavior of your program and don't rely on theory alone. Design your system with data locality in mind. And aim for predictable branches and contiguous memory access. That's where real performance comes from.