Hacker News new | ask | show | jobs
by waffenklang 4662 days ago
I agree in the point that its the task of the fv-system to take care of this, and so its a flaw in it, if its necessary.

imho,in this case it is a flaw, as the 'public' api function base64_encode wants the user to input pointer to 2 ptr to buffer with only for 1 given the length. and the code makes implicit assumption of the user input without checking it (encoded is manipulated).

1 comments

It's a flaw in the specification of the theorem, not in the verification.

The way it works is that they choose fixed sizes INLEN and OUTLEN, and prove that when data points to a buffer of INLEN bytes, they can allocate a result buffer and base64_encode(data, INLEN, result) will compute the correct encoding.

So, implied in this statement is the fact that the input buffer is valid (thus not null), and that the output buffer must be valid (and not null) and disjoint from the input buffer.

There are more fundamental classes of bugs that this won't detect:

- behavior for arbitrary sizes, such as integer overflows for large sizes

- bugs when using particular functions of the API, such as base64_encode_update()

The latter problem means that there is no proof that b64enc.c is correct at all, even for a fixed size: it's entirely possible that multiple calls to base64_encode_update() would corrupt the internal state.

> - behavior for arbitrary sizes, such as integer overflows for large sizes

Actually, I don't see how they could have a bug of this nature unless the caller actually passed in invalid data. In the context in question, they have 64-bit pointers, and they are using unsigned 64-bit values (which conveniently have defined overflow semantics) for the lengths of the input buffer.

Even if the calculation for data_end overflows, that would mean one of two things: the caller improperly specified the buffer, or the memory model really does have a buffer whose addressing overflows. In the latter case, because they use an equality test and they have no check for magical address values (like say, NULL) of currbyte, I don't see how that'd be a bug in the implementation.

> - bugs when using particular functions of the API, such as base64_encode_update()

You mean bugs when using specific parts of the API in ways that are not consistent with their specification. It's hard to see why you'd even consider that a bug in the implementation. That's a bug in the caller.

> The latter problem means that there is no proof that b64enc.c is correct at all, even for a fixed size: it's entirely possible that multiple calls to base64_encode_update() would corrupt the internal state.

Are you sure about that? First, there doesn't appear to be much that would qualify as "internal state" to base64_encode_update(). The closest thing to internal state it has is base64_encodestate, which really only captures which of three states the state machine is in and a result code, both of which are provably in the correct state at the end of each invocation of the function.

I'm not clear as to why you think the proof doesn't cover base64_encode_update() being invoked multiple times. It's hard to know without understanding what their simulator does, but it sure looks and sounds like it is expanding the full set of possible execution paths through the b64enc, which would include up to N invocations of base64_encode_update(). The only way I can imagine it wouldn't be proving correctness for multiple base64_encode_update() calls is if the simulator always had fread() return ball all requested bytes AND you never proved it for cases where N > BUFSIZE (which would just be silly).

> they are using unsigned 64-bit values (which conveniently have defined overflow semantics) for the lengths of the input buffer.

Yes, if you examine the code you can manually prove that it's safe: but that's not proven by their formal computer-checked proof.

As for the rest of your post: you seem to be assuming that what they proved is that b64enc.c behaves correctly. This is not true at all: they just proved that the helper function base64_encode() works correctly for a specific size (this is done in proof/sym_encode.c). This helper function never calls base64_encode_update() twice and is not used by b64enc.c, and that's why we don't know if the multiple-block implementation from b64enc.c actually behaves correctly.