Show HN: Beamsplitter – a new possibly universal hash

	Show HN: Beamsplitter – a new possibly universal hash (github.com)
	69 points by ohvirginia 2238 days ago

10 comments

owenmarshall 2238 days ago

> The default S-box

> This was obtained from random.org by requesting 8,192 random bytes, as were all S-boxes tested so far.

https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number

barbegal 2238 days ago

It would have been far better to select numbers generated by the NIST Randomness Beacon https://beacon.nist.gov/home

And whilst you can sort of selectively choose which values to take from the beacon, it should reduce the ability to add a backdoor.

owenmarshall 2238 days ago

There are plenty of digits in pi.

If the hash is secure independent of s-box selection, I'd much rather bet on pi being normal than "the NIST beacon values aren't generated by AES in CTR mode" ;-)

knodi123 2238 days ago

> There are plenty of digits in pi.

Yes, but

"These fears can be allayed by using numbers created in a way that leaves little room for adjustment. An example would be the use of initial digits from the number π as the constants. Using digits of π millions of places after the decimal point would not be considered trustworthy because the algorithm designer might have selected that starting point because it created a secret weakness the designer could later exploit."

skykooler 2238 days ago

So why not just use the first 8192 bytes of Pi?
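Taking the leading bytes of π is easy to do reproducibly. Below is a minimal sketch, assuming the hexadecimal expansion of π's fractional part is what you'd use. Blowfish uses the same convention for its constants, so the first words should match its published P-array. `pi_fraction_bytes` and `sbox_words_from_pi` are hypothetical helpers, not part of Beamsplitter:

```python
import struct


def _arctan_inv(x: int, one: int) -> int:
    """arctan(1/x) scaled by `one`, via the alternating Taylor series in integer math."""
    term = one // x              # first term: 1/x
    total = term
    x_squared = x * x
    divisor = 3
    sign = -1
    while term:
        term //= x_squared       # next odd power: 1/x^3, 1/x^5, ...
        total += sign * (term // divisor)
        sign = -sign
        divisor += 2
    return total


def pi_fraction_bytes(n_bytes: int, guard_hex_digits: int = 16) -> bytes:
    """First n_bytes of the hexadecimal expansion of pi's fractional part."""
    n_hex = 2 * n_bytes
    one = 16 ** (n_hex + guard_hex_digits)   # fixed-point scale plus guard digits
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 16 * _arctan_inv(5, one) - 4 * _arctan_inv(239, one)
    pi_scaled >>= 4 * guard_hex_digits       # drop the guard digits (truncation error)
    fraction = pi_scaled - 3 * 16 ** n_hex   # remove the integer part, 3
    return fraction.to_bytes(n_bytes, "big")


def sbox_words_from_pi(n_words: int = 1024) -> list:
    """Pack the leading pi bytes into big-endian 64-bit words (8 bytes per word)."""
    data = pi_fraction_bytes(8 * n_words)
    return list(struct.unpack(f">{n_words}Q", data))
```

Because the computation uses exact integer arithmetic (Machin's formula plus guard digits), anyone can regenerate the same 1024 words and check them.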

fdupress 2238 days ago

Because they are known in advance and you could design to exploit their structure.

ohvirginia 2237 days ago

That's a good idea. If you want to post some code that turns historical NIST Randomness Beacon data into 1024 64-bit integers, then test it and run it against SMHasher using the included utilities script (which runs the tests in parallel), I'm happy to include the results in the README.
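Below is a minimal sketch of that conversion. It assumes the pulses have already been downloaded and that each pulse's `outputValue` has been extracted as a 128-character hex string (512 bits). Fetching and parsing the beacon records is left out, and `sbox_from_beacon` is a hypothetical name. At 8 words per pulse, 128 consecutive pulses fill the 1024 words:

```python
import struct


def sbox_from_beacon(output_values, n_words=1024):
    """Pack beacon pulse outputs into n_words big-endian 64-bit integers.

    Assumes each entry of output_values is a 128-character hex string
    (one 512-bit pulse output), listed in pulse order.
    """
    raw = bytearray()
    for value in output_values:
        chunk = bytes.fromhex(value)
        if len(chunk) != 64:
            raise ValueError("expected a 512-bit (64-byte) pulse output")
        raw += chunk
    needed = 8 * n_words
    if len(raw) < needed:
        raise ValueError(f"need {needed // 64} pulses, got {len(raw) // 64}")
    return list(struct.unpack(f">{n_words}Q", raw[:needed]))
```

Fixing the pulse range in advance (for example, the first 128 pulses after a date announced beforehand) is one way to limit the selective choosing of beacon values mentioned above.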

DagAgren 2238 days ago

Yes, this is definitely not great.

jcims 2238 days ago

It should probably just default the S-box to all zeroes or derive it ex nihilo some other way, but a default is practical so you don't need to synchronize S-boxes between uses.

I would imagine anyone interested in using this for serious business™ would start with a new s-box hierarchy.
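One way to start from a fresh S-box without shipping 8 KB of constants is to expand a short seed deterministically. The sketch below uses SHA-512 in a simple counter mode via Python's `hashlib`. This is an illustrative choice, not something Beamsplitter does, and `sbox_from_seed` is a hypothetical name. Anyone with the same seed regenerates the same 1024 words, which also covers the synchronization concern:

```python
import hashlib
import struct


def sbox_from_seed(seed: bytes, n_words: int = 1024) -> list:
    """Expand seed into n_words 64-bit words: SHA-512(len(seed) || seed || counter)."""
    prefix = len(seed).to_bytes(8, "big") + seed  # length prefix avoids seed/counter ambiguity
    out = bytearray()
    counter = 0
    while len(out) < 8 * n_words:
        out += hashlib.sha512(prefix + counter.to_bytes(8, "big")).digest()  # 8 words per block
        counter += 1
    return list(struct.unpack(f">{n_words}Q", out[: 8 * n_words]))
```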

owenmarshall 2238 days ago

An all zero s-box would give some, how shall we say, useful toeholds to an attacker.

That's not how S-boxes work, and crypto is best done with sane defaults.

remcob 2238 days ago

I'm confused. SUPERCOP is a benchmark for cryptographic hash functions, but SMHasher is a test suite for non-cryptographic hash functions. The use cases list cryptography, but also universal hashing, and universal hash functions are generally not crypto-grade. It compares itself to the SHA hashes, but only has 64-bit output.

Is Beamsplitter supposed to be cryptography grade or not?

hackcasual 2237 days ago

It's not, at least not now it seems. It's just seeing if you can use an S-box design to create a "universal" hash.

rurban 2236 days ago

No. It's a quality hash, like the crypto hashes, but one of the slowest of those. Extremely slow, like SipHash. But it's useful for JavaScript, I guess.

api 2238 days ago

I should warn any reader not to use this or any other novel cryptographic algorithm in production. Don't use anything crypto in production until it has been very heavily analyzed for years by professional cryptographers.

underdeserver 2238 days ago

This.

If you want me to use your hash function, show me 2-3 independent analyses from independent researchers.

lalaithion 2238 days ago

This hash is only 64 bits, so if you have more than ~10^9 or ~2^30 elements being hashed, collisions will become an issue.
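The standard birthday approximation makes the 64-bit figure concrete. With n outputs spread uniformly over 2^64 values, the probability of at least one collision is roughly 1 - exp(-n(n-1)/2^65). The sketch below assumes the hash behaves like a uniform random function:

```python
import math


def collision_probability(n: int, bits: int = 64) -> float:
    """Birthday-bound approximation of P(at least one collision) among n hashes."""
    space = 2.0 ** bits
    # expm1 keeps precision when the exponent is tiny (small n)
    return -math.expm1(-n * (n - 1) / (2.0 * space))


for exponent in (20, 30, 32, 34):
    n = 2 ** exponent
    print(f"n = 2^{exponent}: p ~ {collision_probability(n):.4f}")
```

At 2^30 items the chance of a collision is already about 3%, and at 2^32 it is around 39%.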

Granted, most people don't have this problem, but it's not nobody.

cordite 2238 days ago

The source is using `.cpp`, though it does not appear to be using any C++ features.

Would it be reasonable to move to `.c` so that it can be integrated in all sorts of things?

As an aside: when something is Apache-licensed and someone wants to make, say, an Erlang NIF with it, what effect does that embedding have on the NIF library and its users?

ohvirginia 2237 days ago

What would be a better license to use to encourage people to use it?

Also, good point about the .cpp; I will change that.

cordite 2237 days ago

MIT and BSD licenses are very embeddable, but I am not familiar enough with Apache licensing when it comes to embedding. This is why I ask.

ohvirginia 2237 days ago

Switched to MIT now

randtrain34 2238 days ago

can someone ELI5 why this is useful over existing hash functions?

rclayton 2238 days ago

This is how I feel when people start talking about cryptography. Definitely feel my university underprepared me on this topic. :(

google234123 2238 days ago

That's the wrong attitude. Universities are a place where you should be doing much of the learning yourself. There is not enough time in a class for a lecturer to recite every word or idea in a large textbook, but there is definitely enough time outside of class to read it.

zdragnar 2238 days ago

> Universities are a place where you should be doing much of the learning yourself.

The professors in the first three years of my schooling definitely did everything wrong, then. Passing and failing classes had next to nothing to do with independent learning.

google234123 2236 days ago

Passing classes is just a small part of what it means to attend a university.

rclayton 2237 days ago

That’s dumb. I’m not asking for a professor to read to me. I’m asking for the school to provide a curriculum that introduces me to these subjects since I PAID them to educate me. Cryptography was not introduced/required for my CS degree.

google234123 2236 days ago

My point was that you need to do reading and research on your own in order to get the most out of an education; there is not enough time for a professor to go over everything. I'm not sure why you are surprised you didn't get much out of it if you only did the minimum. You have the opportunity in university to do and learn almost anything you want.

rclayton 2236 days ago

You are making assumptions about people that you shouldn’t - that’s my point. I was working full time as a Software Engineer and was a new father while I was going to school. My ability to learn outside of the school’s curriculum was limited. The school did not emphasize this area of CS. I don’t expect my professor to hold my hand, but I do expect the school to establish a challenging curriculum.

seibelj 2238 days ago

God forbid an existing hash function suddenly gets broken; another unbroken hash function with similar properties would be welcome.

_Microft 2238 days ago

I see that you ran out of particle names for your projects. May I introduce you to supersymmetry, then?

What's the issue with picking names that do not exist already? It has got the upside that millions of webpages will not appear in the results when people are searching for your project's name.

ohvirginia 2237 days ago

Is this a reference to something?

I sort of get the feeling you're using voice, but you're the one speaking.

I don't get the italics section.

Also are you suggesting I pick another name? It sort of seems like you're replying to a comment, but this comment appears at top level.

If you can explain more, I'd appreciate it. Thanks

Also, please suggest a name if that's your thing. I'm thinking 'metahadron' goes with the voice section.

_Microft 2237 days ago

No, there are no references in there.

The italics section is just a thing I add to comments sometimes: a few sentences that loosely relate to the thing I am talking about. It can be a quote, an imagined dialog, a flippant comment, ... Other examples are at [0][1].

I was annoyed.

I'm a physicist, and computer projects have the annoying custom of picking names from physics, engineering or whatever else. Other people come up with their own names; why shouldn't computer tech people do the same?

Atom editor, Electron framework, Neutrino.js, Crankshaft, ...

[0] https://news.ycombinator.com/item?id=23103049

[1] https://news.ycombinator.com/item?id=23052357

ohvirginia 2236 days ago

Thanks for the explanation. I think the italics are cool and fresh.

hmmm, really interesting how you feel about the names. It sounds like that is super annoying.

I never thought about how naming would affect people invested in the names like this.

I don't think I need to defend it, so I'm not trying to here, just sharing that for me, beamsplitter sounds like such a cool word, as if a beam were a physical thing like a rock that could be split. It's also something solid in itself, and it connotes advanced, possibly war, tech: lasers. I was going for that connotation. Hash functions are usually very pathetically named.

also there's more to this name in this project because my initial design imagined the "beam" of the input, ricocheting around a network of s-boxes getting mixed. It seemed to me like the perfect hash, aesthetically and efficiently, and universal. but to my disappointment, I couldn't get a pure, s-box only design to work. I had to include some "traditional mixing function hacks" like multiplication, rotation and xor. But I wanted to keep the name because it was aspirational.

I can imagine that it must feel like all these annoying computer software people taking all these names that are not from their area, but from your area, and not leaving anything good for the rest. And when they have such high profile already! Like nobody will listen to the poor physicists, especially once all their names are taken, and then it will be more lonely. A nameless space, with nothing left. Sounds pretty sad.

What's funny is that, to me, physics seems to stand above software, so using such names is a way to increase perceived value. But from your view, software has the higher profile.

Thanks for sharing.

_Microft 2236 days ago

Let me apologize for my complaints. It didn't occur to me that these names might be used out of admiration for a field.

I also have to admit that physics needs relatively few new names in general, which makes picking one a lot easier. There are also naming patterns, e.g. for superpartners (new particles in supersymmetry are named predictably, either with an s- prefix or an -ino suffix [0]).

[0] https://en.wikipedia.org/wiki/Superpartner

ohvirginia 2233 days ago

That was cool to read your reply. Actually, looking back over my code, I see my achievement was better than I thought: I only used addition and rotation. No multiplication, no xor.

Technically, though, rotation can be thought of as including multiplication and xor. But also not. So I don't know.
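The point about rotation can be made precise for 64-bit words. A left rotation by r is a shift combined (OR) with the bits that wrapped around. Because those two parts never share a set bit, the OR is identical to XOR and to plain addition. Rotation also equals multiplication by 2^r modulo 2^64 - 1, for every input except the all-ones word, where the modular form gives 0 instead. The function names below are illustrative:

```python
import random

MASK64 = (1 << 64) - 1


def rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit word left by r bits (0 < r < 64)."""
    return ((x << r) & MASK64) | (x >> (64 - r))


def rotl64_as_xor(x: int, r: int) -> int:
    # The shifted part and the wrapped-around part share no set bits,
    # so XOR gives the same result as OR.
    return ((x << r) & MASK64) ^ (x >> (64 - r))


def rotl64_as_add(x: int, r: int) -> int:
    # Same reasoning: no carries can occur between disjoint bit sets.
    return ((x << r) & MASK64) + (x >> (64 - r))


def rotl64_as_mult(x: int, r: int) -> int:
    # 2^64 is congruent to 1 modulo 2^64 - 1, so the high half folds onto the low half.
    return (x * (1 << r)) % MASK64


rng = random.Random(0)
for _ in range(1000):
    x = rng.randrange(MASK64)  # excludes the all-ones word
    r = rng.randrange(1, 64)
    assert rotl64(x, r) == rotl64_as_xor(x, r) == rotl64_as_add(x, r) == rotl64_as_mult(x, r)
```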

snypher 2238 days ago

I wish it had a more friendly name. I can't help but think of Room 641A.

https://en.wikipedia.org/wiki/Room_641A

asimpletune 2238 days ago

Looks interesting! What is meant by a “universal family”?

schr0dinger 2238 days ago

A universal set of hash functions is a set of hash functions such that choosing a hash function at random from the set guarantees that any two distinct keys collide with probability at most 1/m (for m buckets), regardless of which keys from the universe are input to it. The randomness comes from the choice of function, not from the keys.
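Stated formally, using the textbook Carter-Wegman definition (added here for reference): a family H of functions from a key universe U to {0, ..., m-1} is universal if the first condition below holds. The probability is over the random choice of h only, so x and y can be any fixed keys, including ones picked by an adversary. The classic example family, for a prime p larger than every key, is the second line:

```latex
\forall\, x, y \in U,\ x \neq y:\quad
\Pr_{h \in \mathcal{H}}\bigl[h(x) = h(y)\bigr] \le \frac{1}{m}

h_{a,b}(x) = \bigl((a x + b) \bmod p\bigr) \bmod m,
\qquad a \in \{1, \dots, p-1\},\ b \in \{0, \dots, p-1\}
```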

Basically it makes it more difficult for an adversary to exploit collisions from your hash function.

seph-reed 2237 days ago

I don't get how this could be used. I tried to imagine, and ended up with something wrong. This is what I imagined:

You have a list of hash functions, and choose one at random, then hash a password. Later a hacker gets these hashed passwords, and has an extra hard time? But this wouldn't work for checking passwords because you wouldn't know which hash was used.

What is a real use case?

aidenn0 2237 days ago

I may be wrong, but after doing a bit of research, here's one example:

Alice is storing keys in a hash table. Since this is a hash table, the hash (H) that Alice chooses must be fast. However, real-world hash tables use a relatively small number of bits from the output of H, because even a table with 4 billion slots only needs 32 bits.

Let's say that Alice does this by taking the lowest N bits of the output of H (this works in practice regardless of which bits Alice uses), where 2^N is the size of the table. N may change as elements are added.

Eve wants to mess with Alice by sending a bunch of keys that all have the same bottom M bits, where M is the largest expected value for N. Since the hash H is very fast, this is very computationally cheap to brute-force, particularly if you have access to very parallel hardware like a GPU.
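Eve's side of this is cheap to sketch. Below, 64-bit FNV-1a stands in for Alice's fixed fast hash H. The choice of FNV-1a and the `key{i}` key format are assumptions for illustration. With M = 12, roughly one candidate in 4096 lands in the target bucket, so a plain loop finds a batch of colliding keys quickly:

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: xor in each byte, then multiply by the FNV prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def colliding_keys(count: int, m_bits: int, target: int = 0) -> list:
    """Brute-force keys whose lowest m_bits of fnv1a_64 all equal target."""
    low_mask = (1 << m_bits) - 1
    found = []
    i = 0
    while len(found) < count:
        key = f"key{i}".encode()
        if (fnv1a_64(key) & low_mask) == target:
            found.append(key)
        i += 1
    return found


keys = colliding_keys(count=20, m_bits=12)
# Every key lands in bucket 0 for any table of size 2^N with N <= 12.
```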

Now consider that instead of using hash H, Alice uses hash-family U. Whenever a hash table is created (or rehashed,) Alice selects a random hash from U. Eve can no longer easily generate keys that will collide in the hash table.
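Below is a minimal sketch of that idea. It uses the textbook Carter-Wegman family h(x) = ((a*x + b) mod p) mod m as the hash family U; this is a standard choice, not Beamsplitter. `UniversalHash` and `colliding_choices` are hypothetical names. A fresh (a, b) is drawn each time a table is created or rehashed, so Eve cannot precompute colliding keys without knowing them. Keys are assumed to be integers below p:

```python
import random

P61 = (1 << 61) - 1  # Mersenne prime 2^61 - 1; keys must be integers in [0, P61)


class UniversalHash:
    """One randomly chosen member h_{a,b}(x) = ((a*x + b) mod p) mod m of the family."""

    def __init__(self, m: int, p: int = P61, rng=None):
        rng = rng or random.SystemRandom()  # unpredictable choice by default
        self.m, self.p = m, p
        self.a = rng.randrange(1, p)  # a in [1, p-1]
        self.b = rng.randrange(0, p)  # b in [0, p-1]

    def __call__(self, x: int) -> int:
        return ((self.a * x + self.b) % self.p) % self.m


def colliding_choices(x: int, y: int, p: int, m: int) -> int:
    """Count the (a, b) pairs under which x and y collide (universality: <= p*(p-1)/m)."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * x + b) % p) % m == ((a * y + b) % p) % m
    )


# A fresh function each time a table is created or rehashed:
bucket_of = UniversalHash(m=1024)
```

String keys would first need to be mapped to integers below p; that mapping is outside this sketch.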

From what I can tell, for password hashing, this is not appreciably better than salting, if the size of the set of possible salts and the size of the set U are the same.

schr0dinger 2237 days ago

This also gives hash tables expected O(1) insertion and search times, which, as far as I'm aware, is the major motivation for it. Also, the set of functions can be huge: the classic construction is parameterized by a prime, and since there are infinitely many primes, the family can be made as large as you like, so brute force quickly becomes impractical for cracking the hash.

schr0dinger 2238 days ago

https://en.m.wikipedia.org/wiki/Universal_hashing

mfbx9da4 2238 days ago

What's an S-box (substitution-box)?

underdeserver 2238 days ago

An arbitrary function from X bits to Y bits, usually implemented as a lookup table. You generally want to pick these to be as non-linear as possible.
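To make "as non-linear as possible" concrete, one standard measure is the nonlinearity of an S-box. It is computed from the Walsh-Hadamard spectrum of every nonzero combination of output bits. The sketch below does this for an 8-bit lookup table. It's a generic illustration: Beamsplitter's own S-box is 1024 64-bit words, i.e. a 10-bit to 64-bit table. The function names are hypothetical. A linear or affine S-box scores 0:

```python
import random


def walsh_hadamard(signs: list) -> list:
    """Fast Walsh-Hadamard transform of a +/-1 sequence (length a power of 2); returns a new list."""
    w = list(signs)
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w


def nonlinearity(sbox: list, out_bits: int) -> int:
    """Minimum Hamming distance from any component function to the affine functions."""
    size = len(sbox)  # 2^n inputs
    best = size // 2
    for mask in range(1, 1 << out_bits):
        # component function f(x) = parity(mask & S[x]), encoded as (-1)^f(x)
        signs = [-1 if bin(mask & s).count("1") % 2 else 1 for s in sbox]
        spectrum = walsh_hadamard(signs)
        best = min(best, (size - max(abs(v) for v in spectrum)) // 2)
    return best


identity = list(range(256))
shuffled = identity[:]
random.Random(42).shuffle(shuffled)
```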
