Hacker News new | ask | show | jobs
by TheLoneWolfling 4316 days ago
> There are other serious bugs in java.util.Random which Sun, er, Oracle, has steadfastly refused to fix or revise for a decade now due to concerns about "backward compatibility" (in an RNG!)

Backward compatibility in a RNG is actually rather important, weirdly enough. For example, anything that uses deterministic seeds for repeatability of procedural generation or optimization. (Read: Minecraft, among other more important things).

That being said, things like this are why I wish more programming languages had proper (read: fine-grained) versioning of libraries. "I need to link to a function with this signature, that is one of these versions or any version that has been declared to be compatible with them, preferably the latest version that is so". That sort of thing.

> For example, Mersenne Twister is a pretty high quality, widely used RNG. But it is not cryptographically secure... But SecureRandom is really bad for those tasks too: it is slow. For example, my MersenneTwister implementation is about 9 times faster than SecureRandom

By your own words, you're comparing apples to oranges here...

2 comments

Also, when you're writing scientific code (Monte Carlo simulations, for example, or disordered systems), the folk wisdom is that you must to keep track of what seed you used. I've never had to use these records myself, but I can image wanting to go back and reproduce exactly the same calculation, for (e.g.) debugging or verifying new code. Now you think about resurrecting some previous grad student's code that only works when used in exactly the right way and has documentation scattered through comments and notebooks---which is perfectly natural, since he probably didn't think anybody else would ever use it---and a change in PRNG algorithm could be immensely frustrating. Or think about trying to re-run old code to compare with new analytical results: you'd want to verify that you're getting exactly the same results, as a way of checking that there isn't some other bitrot hidden away somewhere.

(Now, why one would be using Java for Monte Carlo simulations I haven't the foggiest idea.)

Couldn't you solve this by tracking the version of Java (for example) used initially to develop the simulation?
Yeah, and this is why I'm very careful about recording the versions I use for a calculation (version number and/or commit for the language, library version numbers, and commit for my code). The thing is, that's not your first step: the first step is to try and get the thing to run on whatever Java (/Julia/Matlab/Mathematica/Fortran) you have installed. If that doesn't work, there are a thousand things it could be (improper usage, some missing file you had no clue you might need, change in libraries, change in OS/distro/CPUarch, out-of-memory with a new set of parameters); installing an old version of your language and trying it on that is not necessarily easy, and certainly not the first bug-finding step.

Philip Guo (pgbovine) addressed many of these problems in his thesis work, on Burrito (http://www.pgbovine.net/burrito.html ) and CDE (http://www.pgbovine.net/cde.html ). Both of these look very cool, but I've never managed to get over the hump and actually start playing with them.

At this point I should note that I'm actually in favor of fixing a bad standard-library PRNG---I just think it's important to understand (a) what could break, and (b) why users might be unhappy about the breakage. This isn't quite a Chesterton's Fence sort of situation, but it's similar in spirit.

Sure, but it kind of sucks if your whole program has to remain on Java 1.3 forever because you need a particular RNG.
Until a required library (or Java itself) has a security flaw and the new version does not support that version of Java.
> Backward compatibility in a RNG is actually rather important, weirdly enough. For example, anything that uses deterministic seeds for repeatability of procedural generation or optimization. (Read: Minecraft, among other more important things).

This seems like a very, very bad thing to rely on. Other than Java, how many other languages have guarantees in generator determinism from version to version as part of the language contract? Certainly Lisp considers it an antipattern.

Any language that allows the PRNG to be manually seeded, presumably?

I mean, if manually seeding the PRNG doesn't produce a deterministic result, you wouldn't make it part of the API.

Nope.

Essentially all languages allow manual seeding: almost none specify guarantees with regard to RNG behavior. I'm pretty sure this is because determinism of a given process important (everyone who does simulation needs that), but locking into potentially bad RNGs for the language or library itself, in fact specifying them, is a really bad idea.

Counterexample: Python [1]

[1] http://stackoverflow.com/a/19179886/1814881

If it's specified, I don't mind people relying on it. That being said, Java is over-specified in many ways, this potentially being one of them.

I know Python does not have this guarantee.

Personally? There should really be a couple different RNGs in Java's standard library - all combinations that make sense of [insecure/secure, can set seed / can't set seed, thread-local/global] (local/global being without distinction if one cannot set the seed). The default being secure / can't set seed.