|
|
|
|
|
by bob1029
271 days ago
|
|
> You simply cannot compute the gates for the entire sequence in one shot, because each step requires the output from the one before it. This forces a sequential loop, which is notoriously inefficient on parallel hardware like GPUs. > The crux of the paper is to remove this direct dependency. The simplified models, minGRU and minLSTM, redefine the gates to depend only on the current input The entire hypothesis of my machine learning experiments has been that we should embrace the time domain and causal dependencies. I really think biology got these elements correct. Now, the question remains - Which kind of computer system is most ideal to run a very branchy and recursive workload? Constantly adapting our experiments to satisfy the constraints of a single kind of compute vendor is probably not healthy for science over the long term. |
|
It ended up optimising in a way that wasn't obvious at first, but turned out to be the noise of one part interacting with another.
Aha: Here's the paper https://osmarks.net/assets/misc/evolved-circuit.pdf
And a fluff article https://www.damninteresting.com/on-the-origin-of-circuits
And as per usual, Google was hopeless in finding the article from a rough description. No chance, at all. Chatgpt thought for 10s and delivered the correct result, first time.