Hacker News
by zozbot234 65 days ago
> For a given capacity of memory, Flash uses far less power than DRAM, especially when used mostly for reads.

Being non-volatile, Flash draws no idle power (whereas DRAM needs refresh), but the active power for reading a fixed-size block is significantly higher for Flash. You can still use Flash profitably, but only for rather sparse and/or low-intensity reads. That probably fits things like MoE layers if the MoE is sparse enough.
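To make the trade-off concrete, here is a rough sketch of the break-even read rate under a deliberately simple model: DRAM pays a constant idle (refresh) power plus a per-read energy, while Flash pays only a larger per-read energy. The function names and all numeric values are placeholders of mine, not measurements of any real part.

```python
def average_power(idle_w, read_j, reads_per_second):
    """Average power in watts: constant idle power plus per-read energy times read rate."""
    return idle_w + read_j * reads_per_second


def break_even_reads_per_second(dram_idle_w, dram_read_j, flash_read_j):
    """Read rate at which DRAM and Flash draw the same average power.

    Assumed model (a simplification):
      DRAM power  = dram_idle_w + rate * dram_read_j
      Flash power = rate * flash_read_j   (idle power taken as zero)
    """
    extra_per_read = flash_read_j - dram_read_j
    if extra_per_read <= 0:
        # Flash is never worse under this model, so there is no crossover.
        return float("inf")
    return dram_idle_w / extra_per_read


# Placeholder values, purely illustrative (NOT datasheet numbers).
DRAM_IDLE_W = 1.0      # hypothetical refresh/idle power, watts
DRAM_READ_J = 1e-6     # hypothetical energy per block read, joules
FLASH_READ_J = 5e-6    # hypothetical energy per block read, joules

rate = break_even_reads_per_second(DRAM_IDLE_W, DRAM_READ_J, FLASH_READ_J)
print(f"Flash wins below roughly {rate:.0f} block reads per second")
```

Below the break-even rate Flash comes out ahead; above it the higher per-read energy dominates, which is why only sparse or low-intensity reads pay off.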

Also, you can't really use flash memory (especially soldered-in HBF) for ephemeral data like the KV context for a single inference: it wears out way too quickly.
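A back-of-the-envelope way to see the wear problem, as a sketch only: assume perfect wear leveling, so the total bytes you can ever write is capacity times the rated program/erase cycles, then divide by the sustained write rate. The function and parameter names are hypothetical, and you would have to plug in real endurance and write-rate figures yourself.

```python
def flash_lifetime_seconds(capacity_bytes, pe_cycles, write_bytes_per_s,
                           write_amplification=1.0):
    """Estimated time until the rated endurance is used up.

    Assumes ideal wear leveling: every cell absorbs an equal share of writes.
    write_amplification >= 1 accounts for extra internal writes (e.g. garbage
    collection); 1.0 means none.
    """
    if write_bytes_per_s <= 0:
        return float("inf")  # nothing is written, so nothing wears out
    total_writable_bytes = capacity_bytes * pe_cycles
    return total_writable_bytes / (write_bytes_per_s * write_amplification)


def seconds_to_days(seconds):
    """Convert seconds to days (86400 seconds per day)."""
    return seconds / 86_400.0
```

The point is that lifetime scales inversely with write rate, so a workload that rewrites its whole working set continuously (like per-inference KV data) burns through the cycle budget far faster than a mostly-read workload like static weights.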

1 comment

Modern flash memory, with multi-bit cells, indeed requires more power for reading than DRAM, for the same amount of data.

However, for old-style 1-bit-per-cell flash memory, I do not see any reason for differences in read power consumption.

Different array designs, sense amplifier designs, and CMOS fabrication processes can result in different power consumption, but similar techniques can be applied to both kinds of memory to reduce it.

Of course, storing only 1 bit per cell instead of 3 or 4 greatly reduces the density and cost advantages of flash memory, but what remains may still be enough for what inference needs.

The basic physics of reading from Flash vs. DRAM are broadly similar, and it's true that reading from SLC flash is a bit cheaper, but you'll still need way higher voltages and longer read times for flash compared to DRAM. It's not really the same.