by baalimago
1596 days ago
Many words have several meanings depending on context. This is why a word like "is" is a very good token to have in a vocabulary: it can mean so much depending on which tokens come before and after it. Numbers carry very little semantic value. "123816" only ever means that number, and it is used very rarely compared to basically any other word (and the higher the number, the lower the chance of use, statistically speaking). So the question becomes: to what extent do you expand the vocabulary using only numbers? "1", "2", "3", ... "1000000" would probably be a huge waste of vocabulary slots (about a million input nodes), yet still not very impressive arithmetically even with 100% calculation accuracy. In comparison, a hand calculator from 30 years ago could do this with ease. It's not a question of tokenizing cleverly. Calculations like this are an inherent flaw of vocabulary-based AI until the semantic meaning of number sequences is somehow taught to it. Basically, it needs to understand that "12" and "1" + "2" have the same contextual meaning, something that is very rarely explained in anything but a 7-year-old's schoolbooks. The problem is the dataset.
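To make the trade-off concrete, here is a rough Python sketch. It is only an illustration, not how any particular model tokenizes: all function names are made up for this example, and it assumes the simplest possible schemes (one token per whole number versus one token per digit). It shows that a ten-token digit vocabulary can represent any number, but the place-value rule that turns "1" followed by "2" into twelve is exactly the part the model would have to learn from data.

```python
# Hypothetical helpers for illustration only; no real tokenizer works exactly like this.

def whole_number_vocab_size(max_n: int) -> int:
    """Vocabulary entries needed if every integer 1..max_n gets its own token."""
    return max_n

# A digit-level vocabulary needs only ten entries, whatever the number range.
DIGIT_VOCAB = [str(d) for d in range(10)]

def digit_tokens(n: int) -> list[str]:
    """Split a non-negative integer into one token per decimal digit."""
    return list(str(n))

def place_value(tokens: list[str]) -> int:
    """Recover the number from digit tokens using the place-value rule.

    This is the rule a model would have to learn: each digit counts for
    ten times more than the digit to its right.
    """
    return sum(int(t) * 10**i for i, t in enumerate(reversed(tokens)))

if __name__ == "__main__":
    print(whole_number_vocab_size(1_000_000))  # entries for "1" .. "1000000"
    print(len(DIGIT_VOCAB))                    # entries for "0" .. "9"
    print(digit_tokens(123816))
    print(place_value(["1", "2"]))
```

The digit scheme is tiny, but nothing in the tokens themselves says that position matters; that knowledge lives entirely in `place_value`, which stands in for what the training data would have to teach.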