Hi Joseph! Go Illini! I didn't see you my last semester but I'm glad to see Chris's members doing well in the world. Also always love Mengjia's work.
2 questions.
1) it's relatively well known that PAC is brute-forceable given its relatively small key space (16 bits, sometimes 8 if TBI is enabled). How does your attack differ from general brute forces? (My impression is just that your leveraging of the BTB/iTLB is a bit more stealthy.) Similarly, in your opinion, would a fix be more ISA-level, or do you think it's more specific to the M1 (given brute forcing in general is a PtrAuth problem)?
2) you mention in section 8 that this took 3 minutes for a 16b key and tons of syscalls. Wouldn't another proper mitigation be to limit the number of signatures per key? 3 minutes is definitely a long time, and some form of temporal separation may be quite helpful.
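For scale, the figures quoted in the two questions above can be turned into a rough per-guess cost. This is only back-of-the-envelope arithmetic, assuming the quoted 3 minutes covers the entire 16-bit space (the thread does not say whether it is a worst case or an average), and the variable names are made up for illustration.

```python
# Back-of-the-envelope numbers taken from the question above.
KEY_BITS = 16            # PAC size quoted in the question
TOTAL_SECONDS = 3 * 60   # "3 minutes" quoted from section 8 (assumed: full sweep)

guesses = 2 ** KEY_BITS                      # size of the search space
seconds_per_guess = TOTAL_SECONDS / guesses  # cost of one guess if every value is tried
expected_guesses = (guesses + 1) / 2         # mean tries for a uniformly random PAC

print(guesses, "possible PAC values")
print(round(seconds_per_guess * 1000, 2), "ms per guess")
print(expected_guesses, "guesses on average")
```

Under that assumption each guess costs a few milliseconds, which gives a sense of how coarse any time-based limit would have to be to matter.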
1) Our attack does apply a brute force technique with the twist that crashes are suppressed via speculative execution. If you tried to brute force a PAC against the kernel, you'd instantly panic your device and have to reboot.
2) Given that we never sign anything (only try to verify a signed pointer), and that every authentication attempt happens under speculation, I'm not sure how you would rate limit this without absolutely destroying performance. Keep in mind the kernel is doing a whole lot more with PAC than just our attack (for example, every function's return address is also signed with PAC) so distinguishing valid uses from a PACMAN attack might be challenging.
I suppose you could track how many speculative PAC exceptions you got, but it's a little late to add that now isn't it? And it could also raise lots of false positives due to type confusion style mechanisms on valid mispredicted paths.
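The difference between a plain brute force and the crash-suppressed one described in this reply can be sketched as a toy model. This is not attack code and not how the hardware works internally: `ToyKernel`, `architectural_check`, and `speculative_oracle` are hypothetical names, and the oracle simply stands in for "the check ran on a mispredicted path and only a yes/no trace leaked".

```python
import secrets

PAC_BITS = 16  # PAC size discussed in the thread


class ToyKernel:
    """Toy model only: a secret PAC and two ways of checking a guess."""

    def __init__(self):
        self._pac = secrets.randbelow(2 ** PAC_BITS)
        self.panicked = False

    def architectural_check(self, guess):
        # Non-speculative use of a wrongly signed pointer: the kernel panics.
        if guess != self._pac:
            self.panicked = True
            raise RuntimeError("kernel panic: PAC authentication failed")
        return True

    def speculative_oracle(self, guess):
        # Stand-in for the side channel: the fault is squashed with the
        # mispredicted path, so a wrong guess has no architectural effect.
        return guess == self._pac


def brute_force(kernel):
    """Try every PAC value using only the crash-free oracle."""
    for guess in range(2 ** PAC_BITS):
        if kernel.speculative_oracle(guess):
            return guess
    return None


kernel = ToyKernel()
found = brute_force(kernel)
# Only the single correct value is ever used architecturally.
assert kernel.architectural_check(found)
print("recovered PAC without a panic:", not kernel.panicked)
```

In this model a naive loop over `architectural_check` would raise on the first wrong guess, which is the "instantly panic your device" case; the oracle version never touches the architectural path until it already knows the answer.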
Third Q-- What's your opinion on BTI as a possible mitigation? Given it's a v8.5 feature meant for JOPs, and this attack is essentially a speculative JOP, maybe we could use BTI to mitigate and heavily reduce the number of gadgets, speculative or not.
Would it be possible instead to mitigate this by removing the side-channel: either don't leave any trace in the TLB of the speculative execution, or deny access to the TLB for user mode software?
Unwinding changes to the TLB on every mispredict would have a significant overhead and hurt overall performance. Removing valid data you just cached (speculatively or otherwise) is generally a bad idea.
User mode software requires a TLB (unless you want to do a page walk for every single instruction!)
Even if you could remove the TLB entirely from the CPU somehow, the attacker could just use the cache or some other microarchitectural structure.
A colleague pointed out that FPAC[1] in Armv8.6-A likely prevents this attack. Is that right?
I haven't fully digested the paper, but the gadgets seem to rely on AUT, and "Implementations with FPAC generate an exception on an AUT* instruction where the PAC is incorrect"
You can think of it a lot like that! PAC is more advanced as you can describe what a pointer "should" do on access (aka is this a data or code pointer?).
This is a great question! What this means is that a software patch cannot fix the speculative execution behavior that causes the PACMAN issue since it is built directly into how the hardware operates.
You could maybe do it with lots of fences or just a ridiculous chain of NOPs after each branch such that the ROB is cleared before you have time to try to load a pointer speculatively.
In practice, both of these would probably kill performance, so I don't think either is a great solution. Recall we are targeting the kernel, where everything needs to be as fast as possible.
This gets into the turing completeness tarpit. Yes, it's possible to make a vulnerable implementation emulate a chip that is not vulnerable. And maybe even detect when you don't need to emulate and run natively some of the time.
They probably won't care about this, although I do find it weird when researchers make a whole website with custom domain just to publish something like this. Personally, it comes off as less trustworthy since it enters the same realm of bullshit as those market manipulation attacks on AMD a few years back[1]
Not saying that's what this is (I'm sure these are legitimate findings), but this tactic raises some red flags for me.
Yeah I hate this trend of naming vulnerabilities and pandering to the tech press. The CTS Labs FUD was just beyond the pale. Most tech journalism just ate up those claims that were clearly B.S. and not even self-consistent. They were claiming it was impossible for AMD to patch with firmware or microcode but in the same sentence claiming an attacker could use it to create a rootkit that couldn't be removed. Nobody bothered taking two seconds to think critically about what they were publishing to realize they were claiming that it was, in essence, somehow possible for an attacker to "pull up the ladder behind them" but not for AMD.
Maybe this "unpatchable flaw" with the M1 has some more legitimacy than the "critical AMD vulnerabilities" back in 2018, but please, stop with the stupid trendy names for vulnerabilities. Let's discuss this on the technical merits and skip the marketing.
Actively marketing yourself and your ideas is one of the most important things you can do. Without it, most people simply won’t know about your work or will dismiss it. Just because you market it doesn’t mean it’ll be successful - things still have to prove their worth regardless and will otherwise fizzle out.
How many important security vulnerabilities that had just technical white papers and no marketing have gotten wider coverage? Very, very few. It’s also very useful for humans to have a short, memorable name when talking about something.
If they are - Joseph and MIT, please stand up to them. The standard for infringement is confusing similarity. Researchers aren't marketing goods and there's no risk of confusion.