by tgv
1054 days ago
THAT's the reason? Conveying a sentence as a series of propositions, or as a tree with case labels, was tried in the previous century without success. It does not offer a good basis for translation, as e.g. Philips' Rosetta project showed. It works for simple cases, but as soon as the text becomes more complex, it runs into all the horrible little details that make up language. A simple example: in Spanish you don't say "I like X" but "X pleases me". In Dutch you say "I find X tasty" or "X is good" or something else entirely, depending on what X is. Those are three fairly close languages. How can you encode that simple sentence in such a way that it translates properly into all languages, now and in the future? Symbolic representation isn't going to cut it outside a very narrow subset of language. It might work for highly technical, unambiguous, simple content, but not in general. Whatever you think of ChatGPT, it shows that a neural network can't be beaten for linguistic representation.
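To make the "I like X" example concrete, here is a minimal Python sketch of what a symbolic approach has to do: store one abstract "like" frame and give every language its own hand-written renderer. All names here (`LikeFrame`, `is_food`, the renderer functions) are hypothetical and not taken from any real project. The `is_food` feature is an assumption added for illustration, and the Spanish and Dutch outputs are English glosses of the constructions described above, not real Spanish or Dutch sentences.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class LikeFrame:
    """Hypothetical language-neutral encoding of "the speaker likes X"."""
    stimulus: str   # the thing that is liked (X)
    is_food: bool   # hypothetical semantic feature, assumed for illustration


# Each renderer turns the same abstract frame into a language-specific
# construction. Spanish and Dutch are English glosses of the structure only.
def render_english(frame: LikeFrame) -> str:
    return f"I like {frame.stimulus}"


def render_spanish_gloss(frame: LikeFrame) -> str:
    # The experiencer becomes the object: "X pleases me".
    return f"{frame.stimulus} pleases me"


def render_dutch_gloss(frame: LikeFrame) -> str:
    # The wording depends on what X is. Only the two cases named in the
    # comment are modelled; the "something else entirely" cases are not.
    if frame.is_food:
        return f"I find {frame.stimulus} tasty"
    return f"{frame.stimulus} is good"


RENDERERS: Dict[str, Callable[[LikeFrame], str]] = {
    "en": render_english,
    "es-gloss": render_spanish_gloss,
    "nl-gloss": render_dutch_gloss,
}


def render_all(frame: LikeFrame) -> Dict[str, str]:
    """Render one abstract frame with every registered renderer."""
    return {lang: render(frame) for lang, render in RENDERERS.items()}


if __name__ == "__main__":
    for lang, text in render_all(LikeFrame("soup", is_food=True)).items():
        print(f"{lang}: {text}")
```

Even this toy needs a semantic feature on X just to pick the Dutch wording, and every further language or kind of X that changes the construction means another hand-written rule.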
I mean, the goal is basically Wikipedia lite, so they are targeting technical, unambiguous, simple content.
My understanding is that the goal is to target small languages, where it is unlikely anyone is ever going to put in the effort (or have a big enough corpus) for statistical translation methods. Sort of a "this will be better than nothing" approach.