by yifanlu 2600 days ago
The article is garbage, so I skimmed the paper. Here's what I gleaned (apologies for any mistakes).

So they created a modified RISC-V architecture and added a tagging infrastructure. 64-bit registers become 66 bits. L1/L2 caches are expanded to hold 2 extra bits per qword, etc. In DRAM, every process gets a chunk of memory where all the tags are stored in one area. So, for example, if your virtual address space is 32 bits (byte-addressed, so 2^32/8 qwords), then you have 2^32/8 * 2 bits of tag storage, or 128 MiB.
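A quick sanity check of the tag-storage arithmetic, as a sketch only. It assumes byte-addressed memory, 8-byte qwords, and 2 tag bits per qword (a 1/32 overhead); the function name and parameters are made up for illustration and are not from the paper.

```python
# Back-of-the-envelope tag storage size.
# Assumptions (mine, not the paper's): byte-addressed memory,
# 8-byte qwords, 2 tag bits per qword.
def tag_storage_bytes(address_bits, qword_bytes=8, tag_bits=2):
    qwords = 2 ** address_bits // qword_bytes   # qwords in the address space
    return qwords * tag_bits // 8               # total tag bits, as bytes

overhead = tag_storage_bytes(32)
print(overhead // 2**20, "MiB")
```

Under those assumptions a full 32-bit address space needs 2^27 bytes of tags.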

That's the storage overhead. Now, their architecture also does a sort of taint tracking. When you compile your C code with their modified LLVM compiler, it outputs a 2-bit tag for every qword: "code pointer", "data pointer", "data", or "code". When the processor operates on a qword, it propagates the tag, for example "pointer + data = pointer" and "pointer & data = data". At any point in time, the tag storage in DRAM records whether each qword in memory is a code pointer, a data pointer, code, or data.
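Here's a toy sketch of that tag propagation, covering only the two rules quoted above. The tag names, their 2-bit encodings, and the "everything else becomes data" fallback are my assumptions, not something I verified against the paper.

```python
from enum import Enum

class Tag(Enum):
    # Hypothetical 2-bit encodings; the real values may differ.
    DATA = 0
    CODE = 1
    DATA_PTR = 2
    CODE_PTR = 3

POINTER_TAGS = {Tag.DATA_PTR, Tag.CODE_PTR}

def propagate(op, a, b):
    """Result tag for `a op b`, modelling only the two quoted rules."""
    ptrs = [t for t in (a, b) if t in POINTER_TAGS]
    if op == "+" and len(ptrs) == 1:
        return ptrs[0]   # pointer + data = pointer (keeps its kind)
    return Tag.DATA      # pointer & data = data; other cases: assumed data

print(propagate("+", Tag.DATA_PTR, Tag.DATA))
print(propagate("&", Tag.CODE_PTR, Tag.DATA))
```

The real hardware presumably has a rule per ALU operation; this only shows the shape of the idea.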

Periodically, it traverses the tag storage and obfuscates every code pointer and data pointer it finds (I think it can relocate stuff too). Code and data it can encrypt. Because all of this is transparent to the program, there's no extra work for the developer (e.g., if you load data through a pointer, the hardware knows it's a piece of data and decrypts it transparently with a key that can be changed during the "churn"). It's very similar to GC.
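To make the "churn" idea concrete, here's a minimal toy model. It is only a sketch: XOR with a 64-bit key stands in for whatever obfuscation the paper actually uses, it handles pointers only (no relocation, no code/data encryption), and `ToyMemory` and its methods are hypothetical names.

```python
import secrets

DATA, DATA_PTR, CODE_PTR = "data", "data_ptr", "code_ptr"  # toy tags
POINTERS = (DATA_PTR, CODE_PTR)

class ToyMemory:
    def __init__(self):
        self.key = secrets.randbits(64)  # current obfuscation key
        self.words = []                  # stored 64-bit values
        self.tags = []                   # parallel tag storage

    def store(self, value, tag):
        # Pointers are kept obfuscated in memory; plain data is not.
        raw = value ^ self.key if tag in POINTERS else value
        self.words.append(raw)
        self.tags.append(tag)
        return len(self.words) - 1

    def load(self, i):
        # Transparent to the program: pointers are de-obfuscated on load.
        raw = self.words[i]
        return raw ^ self.key if self.tags[i] in POINTERS else raw

    def churn(self):
        # Walk the tag storage and re-key every tagged pointer.
        new_key = secrets.randbits(64)
        for i, tag in enumerate(self.tags):
            if tag in POINTERS:
                self.words[i] ^= self.key ^ new_key
        self.key = new_key

mem = ToyMemory()
p = mem.store(0x1000, DATA_PTR)
d = mem.store(42, DATA)
mem.churn()
print(hex(mem.load(p)), mem.load(d))
```

The program still sees the same pointer value after a churn, but the bits sitting in memory have changed, so a leaked raw value goes stale.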

They also detail some optimizations, such as accounting for context switches during the "churn" process and avoiding having to keep the process halted while DRAM is being churned. They claim the performance impact isn't too bad, but of course we'll have to see how it works with something like Chrome.

tl;dr: From what I gathered, it's an architecture extension to RISC-V (it could be introduced to other archs as well) which tracks basic type information for all memory locations. Periodically, the system transparently and safely shuffles code and data around using that type information. It's harder to exploit vulnerabilities because addresses and data keep changing.

1 comment

So it sounds like it's CHERI-style, only "randomizing" things over time?

(no paper access for me)

E.g., let's say you went whole hog (128-bit CHERI: 64-bit pointer + 64-bit metadata) and periodically traversed the heap and swizzled/re-encrypted/whatever the metadata? That seems to assume it can traverse the heap, but I'll ignore that for now.