Hacker News new | ask | show | jobs
by cautious_int 3960 days ago
I think you have a bug in codepoint_read(). The last argument is a pointer to variable unicode_t named decoded. If the first if statement in that function is true, then the a value is never assigned to that variable, the function returns, and immediately outside the function the value of that variable is read. But that variable was never initialized.

https://bitbucket.org/knight666/utf8rewind/src/c22e458912952...

https://bitbucket.org/knight666/utf8rewind/src/c22e458912952...

https://bitbucket.org/knight666/utf8rewind/src/c22e458912952...

The prior while statement makes sure the second parameter src_size is never 0, but the pointer src might be.

Undefined behavior can be caused by passing a NULL pointer as the first parameter and a non-negative second parameter to utf8toutf16().

1 comments

This is actually guarded against in a macro:

    #define UTF8_VALIDATE_PARAMETERS(_inputType, _outputType, _result) \
        if (input == 0) { \
            UTF8_SET_ERROR(INVALID_DATA); \
            return _result; \
        } \
If a NUL parameter is passed to utf8toutf16 and friends, they'll return UTF8_ERR_INVALID_DATA.

However, you are correct about codepoint_read's behavior. I actually have a test for passing NUL to the function, but the value of decoded isn't checked:

    TEST(CodepointRead, InvalidData)
    {
        const char* i = nullptr;
        size_t il = 0;
        unicode_t o;

        EXPECT_EQ(0, codepoint_read(i, il, &o));
    }
It's definitely a bug, but relatively harmless. The function is only used internally and the calling sites guard against that kind of attack.

Nevertheless, I thank you for the extra set of eyeballs. I have been looking at these functions far too long, you tend to gloss over obvious problems after a while.

I'll be sure to get a fix in for the next release. :)