|
This formatting is more intuitive to me.
L1 cache reference 2,000,000,000 ops/sec
L2 cache reference 333,333,333 ops/sec
Branch mispredict 200,000,000 ops/sec
Mutex lock/unlock (uncontended) 66,666,667 ops/sec
Main memory reference 20,000,000 ops/sec
Compress 1K bytes with Snappy 1,000,000 ops/sec
Read 4KB from SSD 50,000 ops/sec
Round trip within same datacenter 20,000 ops/sec
Read 1MB sequentially from memory 15,625 ops/sec
Read 1MB over 100 Gbps network 10,000 ops/sec
Read 1MB from SSD 1,000 ops/sec
Disk seek 200 ops/sec
Read 1MB sequentially from disk 100 ops/sec
Send packet CA->Netherlands->CA 7 ops/sec
|
If the reciprocals are more intuitive for you, you can still say an L1 cache reference takes 1/2,000,000,000 sec. It's the "ops/sec" label that makes these look like throughputs, when each figure is really just the inverse of a latency.
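As a rough illustration of that reciprocal relationship, here is a minimal Python sketch. The function names are made up for this example, and the sample values are copied from the table above.

```python
def ops_per_sec_to_latency_ns(ops_per_sec: float) -> float:
    """Time for one operation, in nanoseconds, given a serial rate."""
    return 1e9 / ops_per_sec


def latency_ns_to_ops_per_sec(latency_ns: float) -> float:
    """Back-to-back operations per second, assuming no overlap between ops."""
    return 1e9 / latency_ns


# A few rows from the table above (ops/sec).
rates = {
    "L1 cache reference": 2_000_000_000,
    "Main memory reference": 20_000_000,
    "Disk seek": 200,
}

for name, rate in rates.items():
    print(f"{name}: {ops_per_sec_to_latency_ns(rate):,.1f} ns per op")
```

The two functions are the same formula on purpose: converting either way is just taking an inverse, which is why the "ops/sec" column carries no more information than the latency column.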
An interesting thing about the latency numbers is that they mostly don't vary with scale, whereas the total throughput of your SSDs or your Internet link depends on the size of your storage or network setup, respectively. Aggregate CPU throughput likewise varies with core count.
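One way to picture the difference: if each operation takes a fixed time and you have N independent units working in parallel (cores, SSDs, links), the idealized aggregate rate is N divided by the latency. The sketch below assumes perfect parallelism with no contention, so it is an upper bound rather than a prediction; the function name and the core counts are hypothetical, and only the 1,000 ns Snappy figure is derived from the table.

```python
def aggregate_ops_per_sec(latency_ns: float, parallel_units: int) -> float:
    """Idealized throughput: each unit finishes one op every latency_ns.

    Assumes the units never contend with each other (an upper bound).
    """
    return parallel_units * 1e9 / latency_ns


# "Compress 1K bytes with Snappy" at 1,000,000 ops/sec is 1,000 ns per op.
snappy_latency_ns = 1_000

for cores in (1, 8, 64):  # hypothetical core counts
    rate = aggregate_ops_per_sec(snappy_latency_ns, cores)
    print(f"{cores:>2} cores: {rate:,.0f} ops/sec, still {snappy_latency_ns} ns per op")
```

The per-op latency is the same on every row of the output; only the aggregate figure moves with the size of the deployment.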
I do think it's still interesting to think about throughputs (and other things like capacities) of a "reference deployment": those can inform architectural questions like "can I do this in RAM?", "can I do this on one box?", "what optimizations do I need to fix potential bottlenecks in XYZ?", "is resource X or Y scarcer?" and so on. That was kind of done in "The Datacenter as a Computer" (https://pages.cs.wisc.edu/~shivaram/cs744-readings/dc-comput... and https://books.google.com/books?id=Td51DwAAQBAJ&pg=PA72#v=one... ) with a machine, rack, and cluster as the units. That diagram is about the storage hierarchy and doesn't mention compute, and a lot has improved since 2018, but an expanded table like that still seems like an interesting tool for engineering a system.