Hacker News
by nynx 1538 days ago
This is super impressive. Transformers have consistently done better than almost anyone thought.

I still hold the opinion that we’re going to need to move to spiking neural network (SNN) models in the future to keep growing the networks. Spiking networks require lots of storage, but a lot, lot less compute. They also propagate additional information in the _timing_ of the spikes, not just the values. There is a lot of low-hanging fruit in SNNs, and I think people are still trying to copy biological systems too much.

Unfortunately, the main issue with SNNs is that no one has figured out a way to train them as effectively as ANNs.

The comments of every ML paper posted on this site are dominated by people either baselessly discounting the results as a party trick or illusion, or shoehorning in their conjecture about what approach the field is overlooking.

As someone just trying to learn more about the implications of new research, I find myself resorting to /r/machinelearning, or even twitter threads, to get timely and informed discussions. That's a shame, given what HN sets out to be.

As an ML engineer, I found the comment insightful. I agree HN takes a critical approach to ML, but that’s largely because there’s been so much snake oil around it.
I’m certainly not discounting the results and I don’t see anything wrong with suggesting what I think would generally be a good path to look at in the future.
It's not wrong per se, and I'm obviously in no place to police the discussion, but it's only tangentially related to the post and often crowds out what would be a more pointed deliberation over this research.

Maybe I'm expecting too much of HN, but I've seen these same two top level comments under myriad ML posts.

Sorry for the meta-discussion that's gotten us further away from this really remarkable paper.

I agree.

It's completely speculative. There is no evidence at all that spiking NNs really work better in any circumstances.

Speaking as someone who has worked in the ML field, it feels to me like advocates for them are caught up in the biological plausibility argument. That's an interesting branch of research, but has very little to do with how AI should be implemented using transistors. In some ways the "neural networks" name has done a great disservice because people keep getting caught in the trap of comparing them to how the human brain works.

Spiking comes with persistence baked in, so anything done with them has an implicit sequence and temporal context. Like LSTM, it automatically means the architecture is going to handle some problems better than a naive perceptron.

Transformers have a sequence context, but they construct their own context-dependent notion of order with attention.

Persistent or recurrent activation states can extend the context window past the current tokenizing limitations. Better still would be dynamic construction, where new knowledge can be carefully grafted into a network without training and updates to the recurrent states feed back into modifying the learned structures.

Spiking networks might provide a clear architecture to achieve some of those goals, but it's really just recurrence shuffled around different stages of processing.
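To make the "persistence baked in" point concrete, here is a minimal sketch of a single leaky integrate-and-fire (LIF) neuron. LIF is just one common textbook spiking model, picked here for illustration. The function name `lif_run` and all parameter values (`tau`, `v_thresh`, `v_reset`) are assumptions of this sketch, not something taken from the paper or from any particular SNN library.

```python
def lif_run(inputs, tau=10.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """Simulate one leaky integrate-and-fire neuron (illustrative values only).

    inputs: sequence of input currents, one per time step.
    Returns (spikes, trace): a list of 0/1 spikes and the membrane
    potential after each step.
    """
    v = 0.0  # membrane potential: the state that persists between steps
    spikes, trace = [], []
    for i_t in inputs:
        # Leak toward zero, then integrate the current input.
        v += dt * (-v / tau + i_t)
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset  # fire and reset
        else:
            spikes.append(0)
        trace.append(v)
    return spikes, trace


# Two identical pulses close together push the neuron over threshold...
close_spikes, _ = lif_run([0.6, 0.6])
# ...but the same two pulses spaced far apart do not, because the
# potential leaks away in between.
far_spikes, _ = lif_run([0.6] + [0.0] * 7 + [0.6])
print(sum(close_spikes), sum(far_spikes))
```

The membrane potential `v` is the recurrent state: the output at each step depends on the whole input history and on _when_ the inputs arrived, not just on their values.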

> it's really just recurrence shuffled around different stages of processing

Interesting. I hadn't really thought about this. Although I wonder if there is a more direct way of achieving this.

Point taken, I do agree with you that it’s probably best to stay on topic in these kinds of posts.
As a community grows, it attracts people who don't have the same background that drew the original members of the community together, so it becomes inevitable to see this kind of layman commentary. I've seen it happen to r/hardware, which has been taken over by gamers with no CS background and AMD shareholders, when it used to have a lot of knowledgeable people commenting.
I don't claim to be an expert, but I actually do undergraduate neuromorphic computing research. So, I don't know much, but I do know a little about what I'm talking about.
Don't forget /r/mlscaling!
> Spiking networks require lots of storage, but a lot, lot less compute.

One way or another we need a 1000x increase in efficiency to be able to run these models on edge hardware with full privacy and outside the control of the big corporations.

Funny that Gary Marcus is pleading on Twitter to get Dall-E 2 access in order to formulate his response. He isn't getting access yet. https://twitter.com/GaryMarcus/status/1513215530366234625

That kind of gate-keeping is possible because the cost of training and running inference on these models is too high today.

What’s the current problem with control here? Out-of-the-loop layman here.
These transformer models are so huge that they require extremely expensive, specialist hardware beyond what enthusiasts and even many academics have access to.

There is no chance that consumer or edge devices will be able to run these models locally in the near future; data is going to have to be sent to the cloud.

Thanks for replying! I had no idea there were models this large. Feels a bit like going back to the mainframe age.
Smaller models with better performance are beginning to arrive. Things like RETRO, better training data, longer training time, and scale optimization will have these models on phones and desktops doing crazy things in the near future.
They are, but performance is decreased. In many cases transformers are encoding vast amounts of training data within the insane number of parameters.
> a lot of storage

Is this fundamental, or just a problem with mapping these models to our current serially-bottlenecked compute architectures? Could a move to “hyperconverged infrastructure in-the-small” — striping DRAM or NVMe and tiny RISC cores together on a die, where each CPU gets its own storage (or, you might say, where each small cluster of storage cells has its own tiny CPU attached), such that one stick has millions of independent+concurrent [+slow+memory-constrained] processors — resolve these difficulties?

They require roughly the same amount of storage as modern ANNs, except that "neurons/synapses" may have some additional state that needs to be stored. Relative to the compute they need, though, which is far less than what large-scale ANNs need, the storage is a lot.
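A rough back-of-the-envelope sketch of that storage-versus-compute argument, for one fully connected layer. Everything here is an assumption made for illustration: the function names, the layer sizes, the 5% spike rate, and the idea that an event-driven implementation only touches the weights of inputs that actually spiked. Real numbers depend heavily on the model and the hardware.

```python
def weight_count(n_in, n_out):
    # Both the dense ANN layer and the spiking layer store one weight
    # per input/output pair, so storage is the same to first order.
    return n_in * n_out


def dense_layer_macs(n_in, n_out):
    # A dense ANN layer does one multiply-accumulate per weight
    # on every forward pass.
    return n_in * n_out


def event_driven_accumulates(n_in, n_out, spike_rate, n_steps=1):
    # An event-driven spiking layer only adds the weight rows of the
    # inputs that spiked in a given time step (assumed behaviour).
    active_inputs = round(n_in * spike_rate)
    return active_inputs * n_out * n_steps


n_in, n_out = 1000, 1000          # hypothetical layer size
print(weight_count(n_in, n_out))  # stored weights, same for both
print(dense_layer_macs(n_in, n_out))
print(event_driven_accumulates(n_in, n_out, spike_rate=0.05))
```

Note that an SNN usually runs for several time steps per input, so the saving shrinks as `n_steps` grows; the storage, on the other hand, stays put.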