| > No one here is a scientist and no one treats any of this as science. Where are the criteria for the empirical adequacy of NLP systems as models of language? Specifying any, conducting actual hypothesis tests, and establishing a theory of how NLP systems model language -- this would immediately reveal the smoke-and-mirrors. What do you mean? I'm not a scientist but I play one sometimes, and I managed a whole team of them working in this field. The theory of language models is well established. > Where are the criteria for the empirical adequacy of NLP systems as models of language? There are lots(!?) I think the Winograd schema challenge[1] is an easy one to understand, and meets a lot of your objections because it is grounded in physical reality. Statement: The city councilmen refused the demonstrators a permit because they [feared/advocated] violence. Question: Does "they" refer to the councilmen or the demonstrators? The human baseline for this challenge is 92%[2]. PaLM (this Google language model) scored 90% (4% higher than the previous best)[3]. [1] https://en.wikipedia.org/wiki/Winograd_schema_challenge [2] http://ceur-ws.org/Vol-1353/paper_30.pdf [3] https://storage.googleapis.com/pathways-language-model/PaLM-... pg 12 |
A theory with empirical adequacy would require you to do some actual research into language use in humans: all of its features, how it works, the various theories of its mechanisms, etc. And then, after comprehensive experimental and detailed theoretical work, to show that NLP models even *any* of it.
I.e., that any NLP model is a model of language.
All you do above is design your own win condition, and say you've won. This precludes actually knowing anything about how language works, and is profoundly pseudoscientific. If you set up tests for toys, and they pass -- good, you've made a nice toy.
You may only claim it models some target after actually doing some science.