Hacker News
by quantadev 591 days ago
> I don't expect them to replace transformers until they make some hypothetic breakthrough

Yes, a breakthrough that does what Self-Attention is doing, rather than just scaling up.


No, that's what you're missing from the beginning: the breakthrough of transformers was scalability. Now we have other models that are equally scalable and as such roughly equally performant (and that's not a surprise).

But the ship has sailed: nobody is going to switch to something other than transformers unless it's significantly better. As a result, the other approaches are going to stay behind, because every marginal improvement comes to transformers first (that's what practically everyone is working on) and alternative models are left playing catch-up.

This is a remarkable example of path dependence.

Interpreting this as “transformers are fundamentally superior” is the mistake I'm trying to help you correct.

The breakthrough of transformers was scalability. The next breakthrough of equivalent importance will be entirely different or it won't be.

Scalability wasn't an architectural breakthrough. It was merely a discovery.
How are these words even in contradiction to each other?
By intentionally lacking context.
You aren't even trying to pretend your sentence makes sense, I see…
So kindergarten, bro.