by gwern
636 days ago
The original cited paper, "Deep Differentiable Logic Gate Networks" (https://arxiv.org/abs/2210.08277), struck me at the time as very clever: ultra-efficient and small, with hardly any inductive bias or prior... But you have to wonder whether it can scale reasonably well. Differentiating continuous versions of 16 discrete operations in parallel sounds expensive, especially since you presumably need a bunch of them chained together to approximate a single neural primitive. Even distilling an existing LLM down might be too hard.
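
To make the "16 operations in parallel" point concrete, here is a rough sketch of what one such differentiable gate could look like. It assumes the usual probabilistic relaxation (AND as `a*b`, OR as `a + b - a*b`, and so on) and a softmax over per-gate logits, which is my reading of the paper rather than its actual code; the names `relaxed_gates`, `soft_gate`, and `hard_gate` are made up for illustration.

```python
import math

def relaxed_gates(a, b):
    """All 16 two-input Boolean functions, relaxed to real inputs in [0, 1].

    On exact 0/1 inputs each entry reproduces the ordinary truth table.
    """
    ab = a * b
    return [
        0.0,                   # FALSE
        ab,                    # a AND b
        a - ab,                # a AND NOT b
        a,                     # a
        b - ab,                # NOT a AND b
        b,                     # b
        a + b - 2 * ab,        # XOR
        a + b - ab,            # OR
        1 - (a + b - ab),      # NOR
        1 - (a + b - 2 * ab),  # XNOR
        1 - b,                 # NOT b
        1 - b + ab,            # b IMPLIES a
        1 - a,                 # NOT a
        1 - a + ab,            # a IMPLIES b
        1 - ab,                # NAND
        1.0,                   # TRUE
    ]

def soft_gate(a, b, logits):
    """Training-time form (assumed): softmax-weighted mix of all 16 gates."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return sum(e / total * g for e, g in zip(exps, relaxed_gates(a, b)))

def hard_gate(a, b, logits):
    """Inference-time form (assumed): keep only the highest-logit gate."""
    best = max(range(16), key=lambda i: logits[i])
    return relaxed_gates(a, b)[best]
```

In this sketch every neuron evaluates all 16 relaxed gates on each forward pass during training, which is where the cost worry comes from; only after discretizing to the single highest-logit gate does each neuron collapse to one cheap Boolean operation.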