Hacker News new | ask | show | jobs
by dcsommer 1031 days ago
What are the language/tooling gaps specifically that prevent this today, and have there been RFCs to close them? Are the gaps primarily "in-language" or missing tooling for formal verification?
2 comments

Probably SIMD support and constant time support. Crypto libraries tend to use SIMD a lot to be fast.

You can write constant time code in Rust by carefully making sure your code only compiles to constant time instructions without branches, but you'd really want some kind of annotation on the code to enforce that.

That's mostly a guess though.

Rust has simd support, so maybe the latter is the issue.
not in stable yet
This is incorrect. Rust has had x86_64 (up to and including AVX2) intrinsics stable since Rust 1.26. Wasm32 simd128 and aarch64 neon intrinsics are also stable.
core::arch::x86_64::_mm256_shuffle_epi8 and such seem to be in stable.
AFAIK intrinsics are stable. Idk about auto-vectorizers and the like.
other commenters, call me when this issue is closed.

https://github.com/rust-lang/rust/issues/48556

isnt constant time orthogonal to whether there are branches?
No. Constant wall clock time involves not using branches. Maybe you're thinking of "asymptotic constant time" or "runtime bounded above by a constant." These are not what is needed, because what is needed is to not expose any information via timing.
What if there are branches but both paths result in the same number of cycles being required to execute the instructions?

Is it correct to say: "all branchless code runs in constant time, but not all constant time code is branchless"?

The subtlety is that eliminating branching isn't sufficient to have constant time code. A simple example is using trigonometric and transcendental opcodes. They don't branch (at the assembly level), but on x86 take variable amounts of time depending on the input operand. Very few algorithms actually use these opcodes though, so a more relevant concern is memory access due to variable latency. Even if you have that nailed down, integer operations like multiplication and especially division can take variable amounts of time depending on the input.

Writing truly constant time code on modern processors ranges is difficult at best, and usually less efficient than variable-time code.

Really interesting information, thank you.
Add cache misses, bus interference, SMT woes and it quickly becomes harder and harder to write (and check) constant-time (or WCET) properties. Even modern micro-benchmarks are a huge labyrinth of architectural traps.
Because of speculative execution, branchy code with equal-runtime branches will still take different amounts of time if it is called repeatedly, usually in ways that reveal information about the input.
Timing attacks are common everywhere, by the way. Simplest example, perhaps a bit too contrived:

I'm an attacker doing targeted research. I want to see if a multi-auth system has an association between two email addresses tied to the same account.

Pulling a database record or in-memory record (e.g. via LFU/LRU cache) in some cases may cache the account record, which means a subsequent record might be warm when fetched with the second email.

I run a time analysis against the endpoint with garbage addresses, known addresses (that I've set up) and the two target addresses to check subsequent fetch speeds.

In some cases, this will cause enough of a time difference to tell me if there's a connection.

Timing attacks are hard, and even a well-architected system can expose information indirectly. Encryption is a bit one if the inputs are static (e.g. keys or the like) and are a common way to target endpoints.

The issue is speculative execution. Whenever there is a branch, the CPU makes a guess. If it guesses wrong, it has to go back to the correct path which introduces a delay. So any branching code has the possibility of revealing information through the branch predictor.
> all branchless code runs in constant time

No - e.g. division is not constant time.

You have to have branchless code and only use certain instructions.

E.g. here is the list for RISC-V.

https://github.com/rvkrypto/riscv-zkt-list/blob/main/zkt-lis...

Most things except div/rem, branches and floating point are ok. Oh and obviously store/load.

Thanks, this is really interesting.
Because branch prediction exists: sometimes yes, often no. Among other reasons.
Deterministic builds, and inability to ensure constant-time operations are the two that come to mind. The first is a build security / supply chain issue, and the latter is a real vulnerability if the rust compiler "helpfully" optimizes away no-op operations in alternate code paths.
Whenever I'm wearing my tinfoil hat, I wonder if all the advice to never implement your own crypto is a conspiracy to reduce independent implementations of cryptography algorithms.

I know constant time operation is important for these algorithms, but couldn't I do this with a timer? Call the algorithm, store the result, return the result exactly one second (an eternity in CPU time) after it was called. Basically put a timer wrapper around the actual cryptography algorithm. It would harm latency, but not throughput.

This is a honest question I'm hoping to have answered.

"Constant time" algorithms isn't really about the time it takes. It's more important that they exhibit no observable side effects of a branch. This can be power usage, memory usage as well as time.

For instance, a multiply might take slightly more power than an add instruction and that can be monitored.

If you think these attacks are unreasonable, recently there was a post on HN about using the LED of a smart card reader to detect the fluctuations in power usage to gain information about the secret key. These attacks are real

People keep finding serious architectural leaks in CPUs like Spectre/Meltdown, which makes me question whether any constant-time implementation can really be without observable side effects.
Some CPUs do have non-constant-time multiplies.
And one protection against that is to map the key into another space using a random (or close enough) key for that transformation, perform the calculation homomorphically, then transform back.

This is often too expensive, but it does come up as a possibility in some zero knowledge protocols.

No. Timing attacks look at statistical distributions given a set of inputs. Adding a flat 1 to all the times does not flatten the distributions. Likewise, adding a random jitter has the same problem, where it increases the variance but doesn't remove it - you need more samples to get the same confidence interval, which is very different from preventing timing attacks entirely.
They’re not talking (I don’t think) about adding a flat 1, they’re talking about something like this pseudocode I believe:

   timer = SomeScaffoldThatTimesAFunctionAccurately()
   result = timer(Do Crypto Thing())
   wait(1-timer.elapsed)
   return result
   
So the idea in this scenario is that every operation would take 1 second.

The first obvious drawback of this from a practical point of view is speed. You need crypto operations to be extremely fast so although you could reduce the constant time somewhat it would still be very difficult to calibrate the timing in such a way as not to totally nerf performance. The delay you’re introducing here is required (as I understand it) on every branch and possibly other places so there are thousands of these required in order to do each user-visible crypto operation.

Secondly if you think about the way the attack works, the jitter the attackers are observing to obtain the leaked information is very small so you would need an extremely accurate timer in order to be precise enough not to still leak the jitter.

As I understand it the actual defense against timing attacks is not that different from the above, just that the timing is precomputed and then the delays introduced to be exactly correct. My understanding is it’s generally something like

    If some_conditional:
        Do some operation that takes 2 cycles
        Delay for precisely 1 cycle
    Else:
        Do some operation that takes 3 cycles

    …
    Continue
For example, say you are testing a password. You could do

    Def brokenPasswordChecker(password, attempt)
       For i in len(attempt):
          If attempt[i] != password[i]
             Return false
       Return(true)
But if you do it that way, then attackers could observe that you return immediately after an incorrect character and use that timing to gradually deduce the prefix of the password until they have the whole thing. So instead you do:

    Def maybeLessBadPasswordChecker(password, attempt)
       Bool result = True
       For i in len(attempt):
          If attempt[i] != password[i]
             result =  False
       Return(result)
…which does the exact same number of comparisons every time so this same attack doesn’t work.

This is why you need help from the language/compiler or to break into inline assembly - lots of compiler optimisations will “helpfully” elide the delays you are introducing specifically to make the branches take the same time so they are once again vulnerable to timing attacks. So you need something so you can force that not to happen.

All that will do is slow an attacker down by a very small amount, it doesn't change the basic nature of the attack at all, if it worked before it will still work, just slightly slower. It's like adding a little bit of noise to a signal.
It depends. If your threat model involved an attacker being able to monitor your power supply (as in some kind of embedded system), they’d be able to see the real work done and separate that from the fake delay.
Except if your 'wait' operation is doing the same computation (add, multiply...) but won't use the result. I often do that in some real-time/low-latency things where you want things constant-time (or can't easily define the worst-case execution time, just make all paths the worst). Then you still need to blind the speculative execution mechanisms so that they don't 'see' you're not using the result.