by adontz 2051 days ago
I absolutely do not understand the purpose of the following:

https://live.european-language-grid.eu/catalogue/#/resource/...

https://live.european-language-grid.eu/catalogue/#/resource/...

Since when are strict EBNF grammars useful for natural language processing?

2 comments

There's a research direction on 'controlled natural language', which is essentially a limited subset of a natural language that still allows you to express everything you need for a particular problem domain.

They have their uses in natural language generation, where you may want to output some data in a way that's more readable to humans, and in various specialized query languages. For example, in some tasks you may prefer a voice command system that has more flexibility than mere keywords, where the instructions are phrases matching the users' language (you might want the same product to support many languages) but the system needs to understand only a very limited subset of the language that can be expressed with a strict grammar, mainly because its ability to do stuff is also limited to what that subset can express. And this provides reliability: you can verify that the limited set of expressions the system can understand gets understood properly and that those that aren't clear get rejected. This is bad for some use cases and good for others; picking a 'best effort' most likely interpretation (which many state-of-the-art methods do now) might be desirable or completely unacceptable depending on your use case.
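To make the "understood properly or rejected" point concrete, here is a minimal sketch of such a command parser. The toy grammar, the vocabulary, and the name `parse_command` are all made up for illustration; a real product would have a much larger grammar, and one per supported language.

```python
from typing import Optional

# Hypothetical toy grammar (EBNF), invented for this example:
#   command = action, [ "the" ], device, [ "in", "the", room ] ;
#   action  = "turn", ( "on" | "off" ) ;
#   device  = "light" | "heater" ;
#   room    = "kitchen" | "bedroom" ;
ACTIONS = {("turn", "on"): "on", ("turn", "off"): "off"}
DEVICES = {"light", "heater"}
ROOMS = {"kitchen", "bedroom"}


def parse_command(text: str) -> Optional[dict]:
    """Return a structured command, or None if the text is outside the grammar."""
    tokens = text.lower().split()
    pos = 0

    # action: exactly two tokens, e.g. "turn on"
    action = ACTIONS.get(tuple(tokens[pos:pos + 2]))
    if action is None:
        return None
    pos += 2

    # optional article
    if pos < len(tokens) and tokens[pos] == "the":
        pos += 1

    # device: mandatory
    if pos >= len(tokens) or tokens[pos] not in DEVICES:
        return None
    device = tokens[pos]
    pos += 1

    # optional "in the <room>"; once started, it must be completed
    room = None
    if tokens[pos:pos + 2] == ["in", "the"]:
        if pos + 2 >= len(tokens) or tokens[pos + 2] not in ROOMS:
            return None
        room = tokens[pos + 2]
        pos += 3

    # anything left over is outside the grammar: reject, do not guess
    if pos != len(tokens):
        return None
    return {"action": action, "device": device, "room": room}
```

With this sketch, `parse_command("Turn on the light in the kitchen")` yields a structured command, while `parse_command("please make it a bit warmer")` returns `None` instead of a best-effort guess. Because the grammar is tiny, you can enumerate every accepted phrase and test all of them.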

The benefit of a strict grammar over (for example) NN transformer architectures for NLG and NLU is that it's relatively straightforward to map the structures of that grammar to the structured data that your non-NLP code uses for the business logic, so you can have a clear and debuggable 1-to-1 mapping for the semantics of these phrases.
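As a rough sketch of what that 1-to-1 mapping can look like, the snippet below ties one grammar rule to one record type and uses it in both directions (generation and understanding). The `Transfer` record, its fields, and the phrase format are hypothetical and chosen only for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transfer:
    # Hypothetical business-logic record; the fields are invented for this sketch.
    amount: int
    currency: str
    recipient: str


# One grammar rule, expressed as a pattern (NLU) and as a template (NLG).
# EBNF: transfer = "send", amount, currency, "to", recipient ;
_PATTERN = re.compile(
    r"send (?P<amount>[1-9][0-9]*) (?P<currency>EUR|USD) to (?P<recipient>[A-Z][a-z]+)"
)
_TEMPLATE = "send {amount} {currency} to {recipient}"


def generate(t: Transfer) -> str:
    """NLG: each field of the record fills exactly one slot of the rule."""
    return _TEMPLATE.format(amount=t.amount, currency=t.currency, recipient=t.recipient)


def understand(phrase: str) -> Optional[Transfer]:
    """NLU: each slot of the rule fills exactly one field, or the phrase is rejected."""
    m = _PATTERN.fullmatch(phrase)
    if m is None:
        return None
    return Transfer(int(m["amount"]), m["currency"], m["recipient"])
```

For records whose fields fit the rule, `understand(generate(t)) == t` holds, which is the kind of property you can unit-test and step through in a debugger. That is much harder to guarantee with a learned model.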

"Finance English" may not be a natural language
Every language has something like that. In Germany it's called "Behördendeutsch" (administrative German). Even as a native speaker with good language skills, you have to read all forms and letters at least twice to make sense of them.