by gwern 636 days ago
The original cited paper https://arxiv.org/abs/2210.08277 "Deep Differentiable Logic Gate Networks" struck me then as very clever, ultra-efficient/small, hardly any inductive bias or prior... But you have to wonder if it's able to scale reasonably well. Differentiating continuous versions of 16 discrete operations in parallel sounds expensive, especially since you presumably need a bunch of them chained in order to approximate a single neural primitive. Even distilling an existing LLM down might be too hard.
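For concreteness, here is a minimal sketch of what "continuous versions of 16 discrete operations" can look like. It assumes the usual probabilistic relaxation of the 16 two-input Boolean functions (e.g. AND becomes a*b) and a softmax over per-gate logits during training. The names `GATES`, `soft_gate`, and `hard_gate` are made up for illustration, the gate ordering is an assumption, and this is not the authors' implementation.

```python
import math

# Real-valued relaxations of the 16 two-input Boolean functions.
# For a, b in {0, 1} each returns the exact gate output; for a, b in [0, 1]
# it is the probability of a 1 output if the inputs are independent.
# Ordering is an assumption: index = 8*f(0,0) + 4*f(0,1) + 2*f(1,0) + f(1,1).
GATES = [
    lambda a, b: 0.0,                    # FALSE
    lambda a, b: a * b,                  # A AND B
    lambda a, b: a - a * b,              # A AND NOT B
    lambda a, b: a,                      # A
    lambda a, b: b - a * b,              # NOT A AND B
    lambda a, b: b,                      # B
    lambda a, b: a + b - 2 * a * b,      # XOR
    lambda a, b: a + b - a * b,          # OR
    lambda a, b: 1 - (a + b - a * b),    # NOR
    lambda a, b: 1 - (a + b - 2 * a * b),  # XNOR
    lambda a, b: 1 - b,                  # NOT B
    lambda a, b: 1 - b + a * b,          # A OR NOT B
    lambda a, b: 1 - a,                  # NOT A
    lambda a, b: 1 - a + a * b,          # NOT A OR B
    lambda a, b: 1 - a * b,              # NAND
    lambda a, b: 1.0,                    # TRUE
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_gate(a, b, logits):
    """Training-time output: softmax-weighted mixture of all 16 relaxed gates."""
    weights = softmax(logits)
    return sum(w * g(a, b) for w, g in zip(weights, GATES))


def hard_gate(a, b, logits):
    """Inference-time output: evaluate only the gate with the largest logit."""
    best = max(range(len(GATES)), key=lambda i: logits[i])
    return GATES[best](a, b)
```

Every forward pass through `soft_gate` evaluates all 16 candidates per node, which is where the training cost comes from. After training, `hard_gate` collapses each node to a single discrete gate.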

It's definitely slept on. I think it ought to be very powerful, hopefully, given enough compute to throw at it. Algorithms with a short description length, such as simple compression algorithms or instant-ngp, could be interesting to play with in that paradigm.