by Stormcaller 3977 days ago
Because max value should be "mt_getrandmax()" instead of "PHP_INT_MAX", it just gets a 32 bit number then scales it up.

see: http://php.net/manual/en/function.mt-rand.php

Under caution:

The distribution of mt_rand() return values is biased towards even numbers on 64-bit builds of PHP when max is beyond 2^32. This is because if max is greater than the value returned by mt_getrandmax(), the output of the random number generator must be scaled up.
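To see what "scaled up" means in practice, here is a minimal Python simulation. It assumes PHP 5 computes mt_rand(min, max) as min + (long)((max - min + 1.0) * (n / (mt_getrandmax() + 1.0))), where n is the 32-bit Mersenne Twister output shifted right by one bit. That is my reading of the PHP 5 source, not something the manual states. The names scale_like_php5 and fake_mt_rand are hypothetical, and Python's random module (also MT19937, but seeded differently) stands in for PHP's generator.

```python
import random

# Assumptions (my reading of PHP 5, not documented behavior):
#   mt_getrandmax() == 2**31 - 1 because the 32-bit MT output is shifted right by 1,
#   and mt_rand(min, max) scales that value with double-precision arithmetic.
MT_RAND_MAX = 2**31 - 1
PHP_INT_MAX = 2**63 - 1  # 64-bit build


def scale_like_php5(n: int, lo: int, hi: int) -> int:
    """Map raw n in [0, MT_RAND_MAX] onto [lo, hi] using the assumed PHP 5 formula.
    int() truncates toward zero, like the C (long) cast."""
    return lo + int((float(hi) - lo + 1.0) * (n / (MT_RAND_MAX + 1.0)))


def fake_mt_rand(lo: int, hi: int, rng: random.Random) -> int:
    """Hypothetical stand-in for mt_rand(): Python's MT19937 supplies the 32-bit word."""
    n = rng.getrandbits(32) >> 1
    return scale_like_php5(n, lo, hi)


rng = random.Random(12345)
samples = [fake_mt_rand(0, PHP_INT_MAX, rng) for _ in range(10_000)]

# Under these assumptions every sample is a multiple of 2**32:
# only 2**31 of the 2**63 possible values can ever be returned.
print(all(s % 2**32 == 0 for s in samples))  # True
```

With max = PHP_INT_MAX the scale factor works out to exactly 2^32 under this assumed formula, so the output is not merely "biased towards even numbers": its low 32 bits are always zero.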

edit: this post went from 5 points to 1, which I don't care about (in ~500 days I've posted fewer than 10 times and I have ~35 points), but who downvotes documentation, seriously? -_-


While documented, that is surprising behavior. If it takes in an int, shouldn't it be able to take in PHP_INT_MAX? And shouldn't it yell at you instead of just silently going about its day?
The PHP approach seems to be that any crazy behavior is acceptable as long as it's documented.
To be fair, that is not an uncommon situation elsewhere either.
I think it is. Sure, there are plenty of calls out there in all sorts of systems that have crazy, documented behavior. But it's rare to see anyone outside of the PHP world defending the crazy behavior purely on the basis that it's documented. Crazy behavior is almost always there because of compatibility concerns, performance concerns, or an acknowledged bug that just hasn't been fixed yet. Outside of PHP, I rarely see anyone just say, "It's fine, that's how it's documented to behave" and leave it at that.
I recall a Rails bug that was used to hack GitHub and inject a bugfix commit. The Rails community had originally rejected the fix because the crazy behavior was documented.
Wow. Do you have any more info on this?
Outside of PHP and systemd...

<Friday afternoon trolling is just too easy this week />

And MySQL of old.

I assume things are much better now, but I remember a time when something like INSERT <table> (<date_field>) VALUES ('2015-02-30') would not raise any sort of error, among other terrible things, and people would defend having to implement explicit checks for them in other layers of your application.

I'm genuinely curious, what are some examples of systemd's odd behavior?
What's more common outside PHP is that functions throw errors or exceptions when given invalid input, rather than trucking along and producing output that is clearly not what was intended.
Maybe they should just make the random functions return '4', documenting that it was chosen by a fair dice roll and is therefore guaranteed to be random.
That's not a "crazy behavior", it's a limitation of the algorithm they use. mt_rand() only has approx. 2^32 possible seeds; why would you expect it to support ranges larger than 2^32?

The best thing to do is to not use mt_rand().

> why would you expect it to support ranges larger than 2^32?

Because this is not a simple, stupid linear congruential generator. mt_rand() is, as I understand it, based on the Mersenne Twister, a well-regarded generator with a period of 2^19937 - 1. And if PHP had done this right, mt_rand() would be seedable with an array of some 624 integers.

Mersenne Twister has an alternative "classic" seeding algorithm, drawn from Knuth, to be compatible with old-style generators. This algorithm takes a 32-bit seed. Apparently that's all that PHP is supplying.

It's dead simple to do random(n) uniformly. When I heard that some language implemented a function called mt_rand() which was hilariously broken in this respect, the first thing that came to my mind was "that's gotta be PHP". And it was.
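For reference, the usual unbiased way to get random(n) from a 32-bit generator is rejection sampling: throw away raw draws that fall in the incomplete final block, so every residue modulo n is equally likely. A minimal Python sketch, using Python's own MT19937 (random.Random.getrandbits) as the 32-bit source; uniform_below is a hypothetical name, not a PHP or Python API.

```python
import random


def uniform_below(n: int, rng: random.Random) -> int:
    """Return an unbiased integer in [0, n) from 32-bit draws.
    Assumes 1 <= n <= 2**32, so one 32-bit word covers the range."""
    if not 1 <= n <= 2**32:
        raise ValueError("n must be in [1, 2**32]")
    # Largest multiple of n that fits in the 32-bit space; draws at or
    # above it are rejected so no residue gets an extra chance.
    limit = (2**32 // n) * n
    while True:
        x = rng.getrandbits(32)
        if x < limit:
            return x % n


rng = random.Random(7)
rolls = [uniform_below(6, rng) + 1 for _ in range(10)]  # ten fair die rolls
```

With n = 3, for example, limit is 4294967295, so only the single raw value 4294967295 is ever rejected and the loop almost never repeats.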

I would not expect it to support ranges larger than 2^32, so why does it?

Returning garbage when one of your parameters is beyond a certain limit is "crazy behavior." Either support it properly, or don't support it at all.

If we were talking about C, this would be called undefined behavior. If you don't satisfy the preconditions before calling a function, then not only is the output invalid, but the entire program that uses it is invalid too.

I don't think a language like PHP, or indeed anything written in it, should have undefined behavior but I'm not the author of the library.

"Fixing" broken data silently is tremendously bad behaviour in general because you can't possibly know whether the caller knew that they were providing invalid data and whether the caller is going to be happy with your fix.

If you expect input parameters in some range and you get values out of that range, then you blow up. You don't silently truncate anything and you certainly don't reduce the entropy of your random number generator by nearly 50%.
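A sketch of the "blow up" approach in Python, assuming mt_getrandmax() is 2^31 - 1 as on PHP 5. checked_rand_range is a hypothetical strict replacement that raises on spans the generator cannot cover instead of scaling them. The draw itself is delegated to Python's random.Random.randrange, which is unbiased; the point here is only the precondition check.

```python
import random

MT_RAND_MAX = 2**31 - 1  # assumed mt_getrandmax() value (PHP 5)


def checked_rand_range(lo: int, hi: int, rng: random.Random) -> int:
    """Hypothetical strict mt_rand(lo, hi): reject bad arguments loudly."""
    if lo > hi:
        raise ValueError(f"min ({lo}) is greater than max ({hi})")
    span = hi - lo + 1
    if span > MT_RAND_MAX + 1:
        # Refuse rather than stretch 2**31 raw values over a larger range.
        raise ValueError(
            f"range of {span} values exceeds mt_getrandmax() + 1 ({MT_RAND_MAX + 1})"
        )
    # In-range request: hand off to an unbiased draw.
    return lo + rng.randrange(span)


rng = random.Random(42)
print(checked_rand_range(1, 6, rng))  # some value in 1..6
try:
    checked_rand_range(0, 2**63 - 1, rng)  # PHP_INT_MAX on a 64-bit build
except ValueError as err:
    print("rejected:", err)
```

The caller who asks for PHP_INT_MAX finds out immediately, instead of receiving numbers that look random but carry far less entropy than requested.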

This was my main doubt about PHP 7's scalar types, which will autocast values into the desired types. I don't expect the caller to know better than the callee what the method's boundaries are, and once a value has been cast you may not be able to tell whether the input was garbage. But yeah, when something breaks, the method's author can wash their hands of it.
If the user asked for more bits of randomness, they need to generate more bits. They could have chosen a different algorithm, or perhaps called the random number generator twice and concatenated the bits. Scaling up is clearly the wrong thing to do.
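The "call it twice and concatenate" idea, sketched in Python with Python's own MT19937 (random.Random.getrandbits(32)) standing in for a 32-bit source; rand64 is a hypothetical helper. Each 32-bit word is independent and uniform, so the 64-bit result is uniform over [0, 2^64) with no gaps and no parity bias.

```python
import random


def rand64(rng: random.Random) -> int:
    """Uniform 64-bit integer built from two 32-bit Mersenne Twister outputs."""
    high = rng.getrandbits(32)  # first draw supplies bits 63..32
    low = rng.getrandbits(32)   # second draw supplies bits 31..0
    return (high << 32) | low


rng = random.Random(2015)
values = [rand64(rng) for _ in range(1000)]
print(min(values) >= 0, max(values) < 2**64)  # True True
```

For a range that is not a power of two, you would still combine the wider draw with rejection sampling rather than scaling.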
It also makes me wonder how it behaves for numbers below the limit. I would wager that there's substantial bias in the results if you ask for a max of, say, 2^31 + 2^30.
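You can see the problem without any statistics: scaling 2^31 raw values onto a larger range is a pigeonhole map, so some outputs can never occur. Below is a quick exhaustive check in Python on a small analogue (an 8-bit source scaled onto 2^8 + 2^7 values, standing in for a 31-bit source and 2^31 + 2^30), using exact integer floor division in place of PHP's double arithmetic. scaled_outputs is a hypothetical helper.

```python
def scaled_outputs(source_bits: int, span: int) -> set:
    """All values floor(n * span / 2**source_bits) for n in the source range.
    Exact integer arithmetic stands in for double-precision scaling."""
    size = 2**source_bits
    return {n * span // size for n in range(size)}


# Small analogue of a 31-bit source scaled onto 2**31 + 2**30 values:
# an 8-bit source scaled onto 2**8 + 2**7 = 384 values.
bits = 8
span = 2**bits + 2**(bits - 1)
reached = scaled_outputs(bits, span)
missing = sorted(set(range(span)) - reached)
print(len(reached), len(missing))  # 256 128
print(missing[:4])  # [2, 5, 8, 11]
```

Here the map is floor(1.5 * n), so every value congruent to 2 mod 3 is unreachable and a third of the requested range never comes up. With exact arithmetic, the same floor(1.5 * n) pattern appears at full size whenever the span is exactly 3 * 2^30.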
The crazy part is that, instead of throwing an error when given input that doesn't make sense, it just silently produces garbage output.
This. Documenting odd behavior like this isn't an excuse for having odd behavior. It should just be changed to fail on bad input. It's not even a breaking change (apart from code already using it with bad input).
How else is it going to unseat INTERCAL?
First it needs a replacement for INTERCAL's "please come from".
A clear example of "worse is better" winning.
If it's documented it's a feature.
Almost everything in PHP is surprising behavior.

And yes, in the eleventy billion cases where PHP should yell at you that it's doing something totally different than what you wanted, it instead just goes about its day silently.

Which is why it was a great fit for MySQL, which would silently store invalid dates and truncate over-long strings with no warnings.
You joke, but I'm pretty sure this is one of the reasons why that stack was so successful. Same principle behind HTML rendering. "It's displaying something, isn't it? Errors are only suggestions."
I'm not so sure. It's kind of true for HTML, but there the forgiveness was not a "good" property of the format; it was a property of the browser: if a site works in one browser but not in another, the user will blame the browser, so let's display whatever the hell we can. PHP in the early days was a handy (if hacky) tool that let you render HTML with simple syntax instead of custom Perl scripts. And then it just got popular.

MySQL was once actually the superior open-source database: it was faster and simpler to set up and use than PostgreSQL. And then, once again, it just got popular.

I don't think being faulty helped these technologies, although it sometimes is the case.

There are warnings. Most people just ignore them in their APIs.
Trying to insert over-large data should have been an error by default from the outset, never a warning. I looked at MySQL in 2000 and went, "You had one job, database: protect my data."
This is so true. It is the main reason I switched to PostgreSQL and never looked back.
... and accept nonsensical SQL queries, returning whatever it feels like.
>Almost everything in PHP is surprising behavior.

It's not. While it does have several surprising behaviors, "almost everything" is just an example of the uninformed BS people say when it comes to PHP.

C also has undefined behavior and several bad decisions, ditto for C++, JavaScript, etc. But PHP is just an easy target for trolls.

Even worse, some of the Internals teams argue in favor of silently trucking along instead of, e.g. throwing an exception: http://www.serverphorums.com/read.php?7,1250372

EDIT: because of HN's annoying "you are posting too fast" limit I cannot reply to the comment below.

The arguments against Exceptions began in the comments on the relevant pull request: https://github.com/php/php-src/pull/1397

From that thread:

   At what point do we stop blaming the developers for not knowing how to
   use our badly designed features, and accept responsibility for exposing
   an API that is hostile towards simple, efficient, and correct implementations?
It might be worth noting that 'sarciszewski was the author of that remark.
Wow, that thread is current! This mentality really is systemic to PHP. It's like the exact opposite of "let it crash".
Actually that thread seems to strongly argue that either an Exception or fatal error should be thrown. I'm not seeing anyone in that thread wanting to silently truck along -- am I missing it?
That is not the PHP way.
There is a caution box in the description, which contains a different warning, and then the warning you have cited is in a second caution box in the "Notes" section, after the changelog and the examples.

It doesn't surprise me that people might not notice the existence of that second warning. I believe that most developers wouldn't scroll down to read the changelog and the example if they think they understood what the function does from its description.

Why even accept ranges that span more than 2^32? That seems like an easy solution to a broken function.

Also fun:

    echo "<?php echo mt_rand(-mt_getrandmax(), mt_getrandmax());" | php
is always even on my PHP 5.5.20.
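A worked version of that observation, assuming PHP 5 maps the raw value n in [0, M] (the 32-bit Mersenne Twister output shifted right by one bit, with M = mt_getrandmax() = 2^31 - 1) onto the range as min + floor((max - min + 1) * n / (M + 1)), and treating the double-precision arithmetic as exact. With min = -M and max = M:

```latex
\begin{aligned}
r &= -M + \left\lfloor (2M + 1)\,\frac{n}{M + 1} \right\rfloor
   = -(2^{31} - 1) + \left\lfloor (2^{32} - 1)\,\frac{n}{2^{31}} \right\rfloor \\
  &= -(2^{31} - 1) + \left\lfloor 2n - \frac{n}{2^{31}} \right\rfloor \\
  &= -(2^{31} - 1) + (2n - 1) = 2n - 2^{31}
     \qquad \text{for } 1 \le n \le 2^{31} - 1,
\end{aligned}
```

because 0 < n / 2^31 < 1 for those n. The result is always even; only n = 0 yields the odd value -(2^31 - 1), a 1-in-2^31 event.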
> Why even accept ranges that span more than 2^32? That seems like an easy solution to a broken function.

It may never have been designed to. If it was written before 64-bit machines were commonplace, mt_getrandmax() would always be the same as PHP_INT_MAX.

Why not generate two values from the underlying function and shift them into place if the range is larger than 2^32 - 1?

Oh well, of course: this is PHP, which goes by the principle of most surprise.

Yes, that's exactly what sensible RNG interfaces do, like C++'s <random> library. If your underlying engine only produces 32 bits at a time, it'll grab two of them when you request a 64-bit type.
Not surprised about the downvotes; I've seen a lot of logical explanations and correct responses get them. It's almost as if people would rather read drama than thoughtful answers.
> Because max value should be "mt_getrandmax()" instead of "PHP_INT_MAX", it just gets a 32 bit number then scales it up.

How does that answer anything?