

by emporas 1485 days ago

Of course this should be expected. The models are trained on internet data written in natural language, where people make typos and use abbreviations, some are not native speakers of English, and others write in Greeklish, or arabenglishy, or whatever.

The machine is always trying to associate words with other words that are semantically close to them. E.g. when it takes strong_man, strng_man or srong_man as input, these all mean the same thing, because that combination of letters is usually used with the word man, and there is no competitor word that could replace srong except strong.
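A rough way to see the "no competitor word" point is a toy nearest-word lookup. This is only an analogy: the real models learn associations from training data rather than comparing spellings, and the vocabulary and the `closest_word` helper below are made up for illustration. It uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical toy vocabulary, chosen for illustration only.
# A real model has no explicit word list like this.
VOCAB = ["strong", "wrong", "stone", "man", "woman"]

def closest_word(token, vocab=VOCAB):
    """Return the vocabulary word whose spelling is most similar to token.

    Returns None when nothing in the vocabulary is similar enough.
    """
    matches = difflib.get_close_matches(token, vocab, n=1, cutoff=0.6)
    return matches[0] if matches else None

if __name__ == "__main__":
    for token in ["strong", "strng", "srong"]:
        print(token, "->", closest_word(token))
```

With this small vocabulary all three spellings resolve to strong, simply because nothing else comes close. Add a near neighbour such as string to the list and strng becomes ambiguous, which is the situation where a competitor word exists.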

Now why that should be considered a secret language is beyond me. The input language for the machine is a natural human language, which means it is a very poorly defined language for the machine to recognize. That is always going to produce a lot of gibberish.