The RAM myth
The RAM myth is a belief that modern computer memory resembles perfect random-access memory. Cache is seen as an optimization for small data: if it fits in L2, it’s going to be processed faster; if it doesn’t, there’s nothing we can do. Most likely, you believe that code like this is the fastest way to shard data (I’m using Python as pseudocode; pretend I used your favorite low-level language): groups = [[] for _ in range(n_groups)] for element in elements: groups[element.group].append(element) Indeed, it’s linear (i.e. asymptotically optimal), and we have to access random indices anyway, so cache isn’t going to help us in any case. In reality, when the number of groups is high, this is leaving a lot of performance on the table, and certain asymptotically slower algorithms can perform sharding significantly faster. They are mostly used by on-disk databases, but, surprisingly, they are useful even for in-RAM data.
https://purplesyringa.moe/blog/the-ram-myth/