by logicallee 3361 days ago
What do you mean by "tRP + tRAS"?

I now understand how it's reasonable, as in, correct. But I don't understand the fundamental reason for this. Okay, so every time a row is read, if it's not in cache it'll get cached. But why does it have to be that way?

Couldn't there be a mode, "hey don't fully open these rows, I just want one random byte as fast as possible!"

I compared it with spinning disks just to show how unreasonable the total is. I realize that the whole design isn't built around this idea of picking off a byte at a time.

But don't you think there could be applications that have PRECISELY, exactly this usage pattern?

For example, what percent of your neurons are firing at the moment? Very, very low.

For some future applications, getting a 10x speedup in random single-byte memory reads might speed up the whole application by a lot. Even if desktops aren't built this way today, I'm super-surprised that when the whole system isn't doing anything else, there is no way to get that kind of raw access without asking for whole rows at a time.

2 comments

> Couldn't there be a mode, "hey don't fully open these rows, I just want one random byte as fast as possible!"

As fast as possible is exactly tRP+tRAS. Since the whole row is read in parallel into the RAM's internal SRAM buffer, opening only part of it would make no difference.

> What do you mean by "tRP + tRAS"?

Ever heard of RAM timings? I'm afraid at some point you will have to read how DRAM works to understand more. There was a link in my last post.
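To put rough numbers on it, here is a small sketch that converts the two timings from clock cycles to nanoseconds. The module is an assumption for illustration only (DDR3-1600 with 9-9-9-24 timings, i.e. CL-tRCD-tRP-tRAS); nothing in this thread says which RAM is being discussed, and the helper name `cycles_to_ns` is made up.

```python
# Assumed example module: DDR3-1600 with 9-9-9-24 timings (CL-tRCD-tRP-tRAS).
# These numbers are illustrative and not taken from the thread.
IO_CLOCK_MHZ = 800      # DDR3-1600 does two transfers per cycle of an 800 MHz clock
T_RP_CYCLES = 9         # tRP: precharge (close) the open row before activating another
T_RAS_CYCLES = 24       # tRAS: minimum time a row must stay active before precharge


def cycles_to_ns(cycles, clock_mhz):
    """Convert a number of memory clock cycles to nanoseconds."""
    return cycles * 1000.0 / clock_mhz


t_rp_ns = cycles_to_ns(T_RP_CYCLES, IO_CLOCK_MHZ)
t_ras_ns = cycles_to_ns(T_RAS_CYCLES, IO_CLOCK_MHZ)
row_cycle_ns = t_rp_ns + t_ras_ns

print(f"tRP = {t_rp_ns} ns, tRAS = {t_ras_ns} ns, tRP + tRAS = {row_cycle_ns} ns")
```

With these assumed timings the sum comes out to 41.25 ns per row switch, so tens of nanoseconds rather than the milliseconds of a disk seek.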

It's this way because in the 80s/90s computer architects simulated different kinds of CPU/memory system designs running existing C programs and measured that it's best to focus on caching and compromise on main memory random access. Then CPU vendors built such systems, and they outsold and outperformed cacheless systems. After that, memory module standardization kept the same direction, because low cost per byte was more in demand than random access performance.

Yes, there are of course workloads that don't like that. But programs adapt to hardware over time too, so co-evolution has weeded out these access patterns from high-performance programs that can be structured differently.
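A toy model of why the access pattern matters so much, as a rough sketch only: it tracks a single open row and counts how many accesses land in it. The 8 KiB row size, the 1 GiB address space, the single bank, and the function name `count_row_hits` are all assumptions for illustration; real DRAM has many banks and the CPU caches sit in front of it.

```python
import random

ROW_BYTES = 8192        # assumed row size; real values vary by device
MEM_BYTES = 1 << 30     # assumed 1 GiB toy address space
N = 100_000             # number of single-byte reads to simulate


def count_row_hits(addresses, row_bytes=ROW_BYTES):
    """Count accesses that fall in the row left open by the previous access."""
    hits = 0
    open_row = None
    for addr in addresses:
        row = addr // row_bytes
        if row == open_row:
            hits += 1           # row already open: no precharge/activate needed
        else:
            open_row = row      # row miss: close the old row, open the new one
    return hits


rng = random.Random(0)  # fixed seed so the run is repeatable
sequential = range(N)
scattered = [rng.randrange(MEM_BYTES) for _ in range(N)]

seq_hits = count_row_hits(sequential)
rand_hits = count_row_hits(scattered)
print(f"sequential: {seq_hits}/{N} row hits")
print(f"random:     {rand_hits}/{N} row hits")
```

In this model the sequential walk pays for a row switch only 13 times in 100,000 reads, while the random reads pay for one on almost every access. That gap is what programs restructure themselves to avoid.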

You could make a computer that uses DRAM differently, but it would be expensive because you couldn't use mass market memory modules.

(Exception: some CPUs use fast in-package DRAM as a last-level cache.)

There have been some custom supercomputer hardware designs (the Tera MTA line) that were optimized for cache-hostile workloads.