by mootothemax 106 days ago

> We had good small language models for decades. (E.g. BERT)

BERT isn’t a SLM, and the original was released in 2018.

The whole new era kicked off with Attention Is All You Need; we haven’t reached even a single decade of work on it.

otabdeveloper4 106 days ago

> BERT isn’t a SLM

Huh? BERT is literally a language model that's small and uses attention.

And we had good language models before BERT too.

They were a royal bitch to train properly, though. Nowadays you can get the same with just 30 minutes of prompt engineering.

mootothemax 106 days ago

> > BERT isn’t a SLM
>
> Huh? BERT is literally a language model that's small and uses attention.

Astute readers will note what’s been missed here.

Fascinating, really. Your confidently stated yet factually void comments I’d have previously put down to one of the classic programmer mindsets. Nowadays, though, where do I see that kind of thing most often? Curious.

ricericerice 106 days ago

After some research, I think I understand what you're getting at here: BERT is a model for encoding text, and it isn't architecturally suited to generating text, which "LLMs" (the lack of a definition here is resulting in you two talking past each other), maybe more accurately referred to as GPTs, can do.

Also, the irony of your comment, which was itself confidently stated yet void of any content, was not missed either. Consider dropping the superiority complex next time.

joefourier 106 days ago

You can actually generate surprisingly coherent text with minimal finetuning of BERT, by reinterpreting it as a diffusion model: https://nathan.rs/posts/roberta-diffusion/

I don’t see a useful definition of LLM that doesn’t include BERT, especially given its historical importance. 340M parameters is only “small” in the sense that a baby whale is small.

mootothemax 105 days ago

For context, BERT is encoder-only, whereas SLMs and LLMs are decoder-only, and BERT is very much not about generating text; it’s a completely different tech with a different purpose behind it. I believe some multimodal variants nowadays may muddy the waters slightly, but fundamentally they’re very different things, let alone having been around for decades, unless you also include the history of computing in general.
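
A minimal sketch of the encoder-only versus decoder-only distinction, in plain Python. It assumes the difference can be illustrated purely through the attention mask: a BERT-style encoder lets each token see the whole sequence, while a GPT-style decoder lets each token see only itself and earlier tokens. The function names (`bidirectional_mask`, `causal_mask`, `visible_tokens`) are hypothetical and chosen for illustration; real models also differ in training objective and output head.

```python
def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every other position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def visible_tokens(tokens, mask, i):
    """Return the tokens that position i is allowed to attend to."""
    return [tok for tok, allowed in zip(tokens, mask[i]) if allowed]


tokens = ["the", "cat", "sat", "down"]

# Encoder-style: "cat" sees both left and right context.
encoder_view = visible_tokens(tokens, bidirectional_mask(len(tokens)), 1)

# Decoder-style: "cat" sees only itself and what came before it.
decoder_view = visible_tokens(tokens, causal_mask(len(tokens)), 1)

print(encoder_view)
print(decoder_view)
```

The causal mask is what makes left-to-right generation straightforward: a model trained to predict token i+1 from tokens 0..i can be sampled one token at a time.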

While I could’ve written that better and with less attitude, gotta confess - and thx for pointing out my smugness - the AI stuff of the last few weeks really got under my skin. Think I’m feeling rather fatigued about it all.

otabdeveloper4 106 days ago

BERT is one example of a language model that solved specific language tasks very well and that existed before LLMs.

We had very good language models for decades. The problem was they needed to be trained, which LLMs mostly don't. You can solve a language model problem now with just some system prompt manipulation.

(And honestly typing in system prompts by hand feels like a task that should definitely be automated. I'm waiting for "soft prompting" to become a thing so we can come full circle and just feed the LLM with an example set.)
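
As a rough illustration of feeding the model an example set instead of typing a prompt by hand, here is a sketch that assembles a few-shot prompt from labeled examples. Note that this is plain prompt templating, not soft prompting in the learned-embedding sense. The function name `build_few_shot_prompt`, the prompt layout, and the sample data are all assumptions made for illustration, and no particular LLM API is implied.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt from (input, label) pairs and a new query."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # Leave the final label blank for the model to complete.
    lines.append(f"Input: {query}")
    lines.append("Label:")
    return "\n".join(lines)


# Hypothetical example set for a sentiment task.
examples = [
    ("The battery died after a day.", "negative"),
    ("Setup took two minutes and it just works.", "positive"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    examples,
    "The screen is gorgeous.",
)
print(prompt)
```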

krisoft 106 days ago

> Astute readers will note what’s been missed here.

I’m not astute enough to see what was missed here. Could you explain?

Otterly99 105 days ago

If I'm not mistaken, BERT is a classifier (text goes in, labels come out), so it is not a "language model", as it cannot be used for text generation.

krisoft 104 days ago

The abstract of the original BERT paper starts with these words: "We introduce a new language representation model called BERT, [...]" The paper itself contains the phrase "language model" 24 times.

It might not be considered a language model today, but it was certainly considered one when it was originally published. Or so it would seem to me. Maybe there is a semantic shift which happened here?
