|
|
|
|
|
by tardedmeme
45 days ago
|
|
Their native content is semantic vectors. They had to be trained for a long time to convert between text and semantic vectors, and the conversion is very lossy. Seahorse emoji demonstrates this nicely, the LLM internally holds a semantic vector for seahorse+emoji but the output translation layer can't match it. |
|
I am curious about this, how can the LLM hold the embedding for seahorse+emoji if it doesn’t exist? How did it end up like this? Perhaps the dataset had discussions from people about new potential emojis?