Hacker News
by vjerancrnjak 840 days ago
There's no preprocessing being done. This is pure computation, from the tokens to the outputs.

I was quite amazed that during 2014-2016, what was being done with dependency parsers, part-of-speech taggers, and named entity recognizers, using very sophisticated methods (graphical models, regret-minimizing policy learners, etc.), became fully obsolete for natural language processing. There was a period of sprinkling a hidden Markov model or conditional random field on top of neural networks, but even that disappeared very quickly.
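For readers who never saw that hybrid era, here is a minimal sketch of what "an HMM/CRF on top of a neural network" roughly meant at decoding time: the network emits a score per tag per token, and a Viterbi pass picks the best whole sequence under tag-to-tag transition scores. Everything here is an assumption for illustration: the `viterbi` function, the three-tag O/B/I scheme, and the hand-written scores standing in for network outputs. It is not taken from any particular system.

```python
from typing import Dict, List, Sequence


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag sequence (scores are additive, log-space).

    emissions[i][tag]      : per-token score, a stand-in for a network's output
    transitions[prev][cur] : score for moving from tag `prev` to tag `cur`
    """
    if not emissions:
        return []
    # best[i][t] = score of the best path that ends in tag t at position i
    best = [{t: emissions[0][t] for t in tags}]
    back = []  # back[i][t] = previous tag on that best path
    for scores in emissions[1:]:
        row, ptr = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            row[cur] = best[-1][prev] + transitions[prev][cur] + scores[cur]
            ptr[cur] = prev
        best.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical example: O = outside, B = begin entity, I = inside entity.
TAGS = ["O", "B", "I"]
TRANSITIONS = {p: {c: 0.0 for c in TAGS} for p in TAGS}
TRANSITIONS["O"]["I"] = -10.0  # an entity cannot continue right after O

# Made-up per-token scores; the second token locally prefers I.
EMISSIONS = [
    {"O": 2.0, "B": 0.0, "I": 0.0},
    {"O": 0.0, "B": 1.0, "I": 1.2},
    {"O": 0.0, "B": 0.0, "I": 2.0},
]

print(viterbi(EMISSIONS, TRANSITIONS, TAGS))  # ['O', 'B', 'I']
```

Picking the top tag per token independently would give O, I, I here, which is an invalid sequence; the transition scores are what repair it. The point of the comment above is that larger end-to-end networks ended up making this extra structured layer unnecessary.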

There's no language modeling. Pure gradient descent into language comprehension.

2 comments

I don’t think all of those tools have become obsolete. NER, for example, can be performed way more efficiently with spaCy than by prompting a GPT-style model, and without hallucination.
There was this assumption that for high level tasks you’ll need all of the low level preprocessing and that’s not the case.

For example, machine translation attempts were morphing the parse trees, document summarization was pruning the grammar trees, etc.

I don’t know what your high level task is, but if it’s just collecting names then I can see how a specialized system works well. Although the underlying model for this can also be a NN, having something like an HMM or CRF turned out to be unnecessary.

Oh, right. If the high-level task is to generate a translation or summary, I think that’s been swallowed up by the Bitter Lesson (though isn’t it an open question if decoder-only models are the best fit? I’d like to see a T5 with the scale and pretraining that newer models have had).

On the other hand, people seem to be using GPT-4 for simple text classification and entity extraction tasks that even a small BERT could do well at a fraction of the cost.

I agree it's neat on a technical level. However, as I'm sure the people making these models are well aware, this is a pretty significant design limitation for matters where correctness is not a matter of opinion. Do you foresee the pendulum swinging back in the other direction once again to address correctness issues?
There is a very long-running joke in AI, going back to the 1970s (or maybe even earlier?) that goes something like, "quality of results is inversely proportional to the number of linguists working on the project".

It seems that every time we try it, we find out that when the model picks up the language structure on its own, it ends up being better at it than if we try to use our own understanding of language as a basis. Which does seem to imply that our own understanding is still rather limited and is not a very accurate model.

On the other hand, the fact that models get amazing translation capabilities just from training on different languages (seriously, if you are doing any kind of automated translation, do yourself a favor and try GPT-4) implies that there is a "there" there and the Universal Grammar people are probably correct. We just haven't figured out the specifics. Perhaps we will by doing "brain surgery" on those models, eventually.

The "other direction" was abandoned because it doesn't work well. Grammar isn't how language works, it's just useful fiction. There's plenty of language modelling in the weights of the trained model and that's much more robust than anything humans could cook up.
> Me: Be developer reading software documentation.

> itdoesntwork.jpg

Grammar isn't how language works, it's just useful fiction.