| HN Mirror



	by redox99 5 days ago
	In the 90s you didn't have norm layers, residuals, attention, and more. So you were missing a lot of the building blocks that make up LLMs. It wasn't just a matter of having the compute.

1 comments

sirsinsalot 5 days ago

I think the attention mechanism is so simple but so revolutionary that people forget it.

Like the best leaps in thinking, once it is made, it is immediately obvious and intuitive.
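To illustrate how small the core mechanism is, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes a single head with no masking and no learned projections; the names `softmax` and `attention` are hypothetical helpers for this sketch, not anyone's library API.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # queries, keys: lists of vectors of the same length d.
    # values: one vector per key.
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

When all keys are identical, every score is the same, so the weights are uniform and the output is just the mean of the values. Real implementations batch this as matrix multiplies, but the logic is the same.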


bonoboTP 4 days ago

Yes, but it wasn't invented from nothing in 2017. Soft attention already existed in other applications like information retrieval, and non-local networks had similar ideas as well. It just wasn't seen or used as a fundamental building block. So it wasn't something out of the blue either.


redox99 5 days ago

Almost everything in ML is like that. It seems so obvious in hindsight. It's maybe what I love most.

Residual connections are so simple, so obvious and so vital. Yet nobody came up with them until 2015?


sirsinsalot 5 days ago

I suspect it was considered many times, but the sheer computation scale would make it feel like obscene brute force. It feels like the right shape but too wild to think about implementing.

I think as time went on and hardware got better, it seemed more reasonable to actually work out a viable implementation of what I think was a widespread intuition in ML: that everything's context is everything.

It just seemed like a theoretical thing until hardware caught up. Maybe. Perhaps I'm applying a retrospective excuse to why it took so long.


redox99 5 days ago

People definitely wanted to train deep networks before, but didn't know how. They even tried things like training layers independently.

I don't think it was intuitive to anyone back then. The vanishing gradient problem had been a big deal since the dawn of NNs. I'm not sure what you mean by sheer computation: residuals let you have deep networks instead of shallow, wide ones, and you can keep the parameter count equivalent.
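A toy sketch of the vanishing gradient point, assuming a stack of scalar `tanh` layers that all share one weight `w`. The function `input_gradient` and its parameters are hypothetical and chosen only for illustration; real networks use vectors and learned weights. It multiplies the per-layer derivatives to get the gradient of the output with respect to the input, with and without a skip connection.

```python
import math

def input_gradient(x, depth, w=0.5, residual=False):
    # Returns d(output)/d(input) for a stack of `depth` scalar layers.
    h, grad = x, 1.0
    for _ in range(depth):
        t = math.tanh(w * h)
        local = w * (1.0 - t * t)  # derivative of tanh(w*h) w.r.t. h
        if residual:
            # Layer is h + tanh(w*h), so its derivative is 1 + local.
            grad *= 1.0 + local
            h = h + t
        else:
            # Layer is tanh(w*h), so its derivative is just local.
            grad *= local
            h = t
    return grad

plain = input_gradient(1.0, depth=50)
skip = input_gradient(1.0, depth=50, residual=True)
print(plain, skip)
```

With `w=0.5` each plain layer scales the gradient by at most 0.5, so after 50 layers it is at most `0.5 ** 50` and effectively gone. With the skip connection each factor is at least 1, so the gradient cannot vanish, and both variants use the same number of weights.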
