HN Mirror


by TheLoneWolfling 4276 days ago

On a related note: I wish that more CPUs had an explicit cache, where data has to be explicitly loaded into cache, and so on.

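Modern CPUs are effectively NUMA: how long an access takes depends on where the data currently sits. Don't treat memory as uniform-cost random access any more, because that's not true.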

The biggest problem with this is that not all CPUs have the same amount of cache. But you can get around this by treating the cache as the low area of RAM, with an instruction to query the amount of cache available, especially if the cache is also paged.

The other issue is context switches, but this is conceptually no different from paging RAM to disk when required.
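To make the idea concrete, here is a toy software model in C of an explicit cache that is queried for its size, loaded by hand, and swapped on a context switch the way RAM is paged to disk. Everything in it (`spm_size`, `spm_load`, `spm_store`, `spm_context_switch`, the 4 KiB size, the four-process limit) is a hypothetical name or number made up for illustration, not a real ISA or OS interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: all names and sizes are hypothetical. */
#define SPM_BYTES 4096u  /* pretend this core has 4 KiB of explicit cache */
#define MAX_PROCS 4u     /* pretend the OS tracks at most four processes */

static uint8_t spm[SPM_BYTES];                /* the "low area of RAM" that is really cache */
static uint8_t backing[MAX_PROCS][SPM_BYTES]; /* per-process save area in ordinary RAM */

/* Stand-in for an instruction that reports how much explicit cache exists. */
size_t spm_size(void) { return sizeof spm; }

/* Explicit load: copy n bytes from ordinary memory into the scratchpad. */
void spm_load(size_t off, const void *src, size_t n) {
    assert(off <= SPM_BYTES && n <= SPM_BYTES - off);
    memcpy(spm + off, src, n);
}

/* Explicit store: copy n bytes from the scratchpad back to ordinary memory. */
void spm_store(void *dst, size_t off, size_t n) {
    assert(off <= SPM_BYTES && n <= SPM_BYTES - off);
    memcpy(dst, spm + off, n);
}

/* Context switch: save the outgoing process's scratchpad contents and
 * restore the incoming one's, like paging RAM out to disk and back in. */
void spm_context_switch(unsigned from, unsigned to) {
    assert(from < MAX_PROCS && to < MAX_PROCS);
    memcpy(backing[from], spm, sizeof spm);
    memcpy(spm, backing[to], sizeof spm);
}
```

The point of the sketch is only that once the contents are explicit, the OS knows exactly what each process "owns" and can save and restore it. A real design would presumably save only the pages in use rather than copying the whole scratchpad every time.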


pdq 4276 days ago

I'm not too familiar with many other architectures, but MIPS has dcache "fill", "flush", and "lock" operations, so the user can either fill a line ahead of time or even lock data in the cache so it won't be evicted.

I haven't seen many people actually use these ops, because it's actually pretty hard to do better than the built-in cache allocation policies for most applications, especially if you take into account that your app is going to get swapped out consistently by the operating system task switches.


TheLoneWolfling 4276 days ago

There is a distinction between allowing such operations and being designed for them. It is possible even on x86, although it is difficult and requires privileged operations. (A user-mode program can request that something be prefetched or flushed, and can do non-temporal loads (and stores?), but in order to get "true" scratchpad memory you have to play with the MTRRs, and even then the processor doesn't support hardware paging of cache the way it does for, say, RAM.)
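For reference, the user-mode side looks roughly like this in C. This is a sketch assuming GCC or Clang targeting x86 with SSE2, and it falls back to plain code elsewhere. SSE2 does include non-temporal stores (`_mm_stream_si32` maps to MOVNTI). The function names, the 64-element prefetch distance and the 64-byte line size are assumptions for illustration. None of these calls change the results; they are only hints or ordering operations that affect where data sits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_prefetch, _mm_clflush, _mm_stream_si32, fences */
#endif

/* Sum an array while asking for data a few cache lines ahead. */
int64_t sum_with_prefetch(const int32_t *v, size_t n) {
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        if (i + 64 < n) /* prefetch distance is a guess, tune per machine */
            _mm_prefetch((const char *)&v[i + 64], _MM_HINT_T0);
#endif
        total += v[i];
    }
    return total;
}

/* Fill a buffer with non-temporal stores, hinting that it should not
 * displace useful data already in the cache. */
void fill_streaming(int32_t *dst, size_t n, int32_t value) {
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        _mm_stream_si32((int *)&dst[i], value);
#else
        dst[i] = value;
#endif
    }
#if defined(__SSE2__)
    _mm_sfence(); /* order the streaming stores before later stores */
#endif
}

/* Flush every cache line that overlaps [p, p + n). */
void flush_range(const void *p, size_t n) {
#if defined(__SSE2__)
    uintptr_t line = (uintptr_t)p & ~(uintptr_t)63; /* assumes 64-byte lines */
    uintptr_t end = (uintptr_t)p + n;
    for (; line < end; line += 64)
        _mm_clflush((const void *)line);
    _mm_mfence();
#else
    (void)p;
    (void)n;
#endif
}
```

Note what is missing: there is no call here that says "keep this resident" or "give me N bytes that never leave the cache". That part is what needs the privileged setup.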

> I haven't seen many people actually use these ops, because it's actually pretty hard to do better than the built-in cache allocation policies for most applications, especially if you take into account that your app is going to get swapped out consistently by the operating system task switches.

And again, this is largely because the cache is implicit to the OS. There's no way to tell the processor "this is the stuff that was cached the last time this process had control; when you can, reload it", because you can't tell what in cache is "owned" by what in anything like an efficient manner - and even if you could, the moment you start executing a context switch you've overwritten random bits of cache.

It's like if the processor was set up to directly talk to the hard drive to do paging on demand, to the point that the OS wasn't even aware of it. In theory it's a good idea, but the more you look at it the more flaws emerge.


kps 4276 days ago

For anyone interested in experimenting along these lines, the Intel Quark (486-based SoC) has on-die plain SRAM instead of L2 cache.


mzs 4276 days ago

Cyrix had this too, probably first: you used those fake outb addresses that the CPU noticed to set up how many cache lines to lock.


mzs 4276 days ago

If your PPC is using a Discovery PHC you can map half or all of your L2 to a block of PAs and then map it wherever you want with VM. I'm sure this was because of the experience that Genesis had with MIPS. It's a nifty feature.


sklogic 4276 days ago

Sparc had an explicitly available scratch memory. Pity there is nothing comparable in the more mainstream architectures.
