
> Sorry, but you are simply wrong there. The TLB is just one of many hacks made necessary by the ever-deeper page-table-tree.
>
> What the R1000 does is collapse the obj->phys lookup in the DRAM memory cycle, and if we did that today, we wouldn't need any page-tables to begin with, much less TLBs.

You would need a TLB even with a completely flat page table, because hitting the DRAM bus (some flavor of DDR on modern systems, but it's still fundamentally DRAM) on every access would absolutely destroy performance on a modern machine even if translation itself were "free". You need translation structures that can keep up with the various on-chip cache levels, which means they need to be small and hierarchical. You can't have some huge flat translation structure like the one on the R1000 and have it be fast.

Anyway, my point is that at a mechanical level the TLB and tag RAM work the same way. You take a large virtual address, hash the upper bits, and use the result to do a lookup in a set-associative memory (so basically a hash table with a fixed number of slots per bucket for conflicts). In some CPUs (it's a little unclear to me how common virtually versus physically addressed caches are these days) this even happens in parallel with the data fetch from cache, just like the tag RAM lookup on the R1000 was done in parallel with the data fetch from DRAM. This is not some forgotten technique; it has just moved inside the CPU die, and various speed and die-space constraints keep it from covering all the physical pages of a modern system.
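To make the "hash table with a fixed number of slots per bucket" picture concrete, here is a rough Python sketch of a set-associative lookup. It is only a toy model: the page size, set count, associativity, the modulo "hash", and the FIFO eviction are all assumptions I picked for illustration, and the `SetAssocTLB` name is made up. Real hardware compares all ways of a set in parallel instead of looping.

```python
PAGE_BITS = 12   # assumed 4 KiB pages
NUM_SETS = 16    # assumed number of sets (buckets)
WAYS = 4         # assumed associativity (slots per bucket)


class SetAssocTLB:
    """Toy set-associative translation cache (hypothetical, illustration only)."""

    def __init__(self):
        # Each set holds up to WAYS (tag, frame) pairs, oldest first.
        self.sets = [[] for _ in range(NUM_SETS)]

    def _index_and_tag(self, vaddr):
        vpn = vaddr >> PAGE_BITS           # upper bits: virtual page number
        # Stand-in for the hardware hash: low VPN bits pick the set,
        # the remaining bits are stored as the tag.
        return vpn % NUM_SETS, vpn // NUM_SETS

    def lookup(self, vaddr):
        index, tag = self._index_and_tag(vaddr)
        for entry_tag, frame in self.sets[index]:   # hardware checks all ways at once
            if entry_tag == tag:
                offset = vaddr & ((1 << PAGE_BITS) - 1)
                return (frame << PAGE_BITS) | offset
        return None                        # miss: fall back to the page tables

    def insert(self, vaddr, frame):
        index, tag = self._index_and_tag(vaddr)
        ways = self.sets[index]
        ways[:] = [e for e in ways if e[0] != tag]  # replace an existing mapping
        if len(ways) >= WAYS:
            ways.pop(0)                    # FIFO eviction, an assumed policy
        ways.append((tag, frame))
```

Once more than `WAYS` pages land in the same set, the oldest one gets evicted even if the rest of the structure is empty, which is the conflict behavior you also get from tag RAM.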

Now, could you perhaps use a more R1000-like approach for the final layer of translation? Sure. Integrating it tightly with system memory probably doesn't make sense, given the need to map other things like VRAM into a virtual address space, but you could have a flat hash-table-like arrangement even if it's just a structure in main RAM. You can even implement such a thing on an existing CPU with a software-managed TLB (MIPS, some SPARC).
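As a rough sketch of what that could look like, here is a toy Python model of a flat hashed table in main RAM, consulted by a software refill handler on a TLB miss. Everything here is an assumption for illustration: `FlatHashedPageTable` and `translate` are made-up names, the table uses plain linear probing with no unmapping, and the TLB is an unbounded dict rather than a small fixed-size structure. It is not how any particular MIPS or SPARC kernel does it.

```python
PAGE_BITS = 12   # assumed 4 KiB pages


class FlatHashedPageTable:
    """Toy flat translation table living in 'main RAM' (hypothetical)."""

    def __init__(self, slots=1024):
        self.slots = [None] * slots        # each slot: (vpn, frame) or None

    def _probe(self, vpn):
        start = vpn % len(self.slots)      # stand-in hash function
        for i in range(len(self.slots)):   # linear probing on collisions
            yield (start + i) % len(self.slots)

    def map(self, vpn, frame):
        for i in self._probe(vpn):
            if self.slots[i] is None or self.slots[i][0] == vpn:
                self.slots[i] = (vpn, frame)
                return
        raise MemoryError("page table full")

    def find(self, vpn):
        for i in self._probe(vpn):
            entry = self.slots[i]
            if entry is None:
                return None                # empty slot ends the probe chain
            if entry[0] == vpn:
                return entry[1]
        return None


def translate(vaddr, tlb, table):
    """Translate vaddr, refilling the TLB from the flat table on a miss."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    frame = tlb.get(vpn)
    if frame is None:                      # TLB miss: the "software handler" runs
        frame = table.find(vpn)
        if frame is None:
            raise LookupError("page fault: vpn %#x is not mapped" % vpn)
        tlb[vpn] = frame                   # refill; a real TLB would evict here
    return (frame << PAGE_BITS) | offset
```

Usage would be something like `translate(0x3abc, {}, table)` after `table.map(3, 0x40)`. The first access to a page walks the flat table and every later access hits the small cache in front of it, which is the same split as a TLB in front of page tables.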