Hacker News
by tardedmeme 45 days ago
Their native content is semantic vectors. They had to be trained for a long time to convert between text and semantic vectors, and the conversion is very lossy. Seahorse emoji demonstrates this nicely, the LLM internally holds a semantic vector for seahorse+emoji but the output translation layer can't match it.
1 comments

> Seahorse emoji demonstrates this nicely, the LLM internally holds a semantic vector for seahorse+emoji but the output translation layer can't match it.

I am curious about this: how can the LLM hold the embedding for seahorse+emoji if no such emoji exists? How did it end up like this? Perhaps the dataset had discussions from people about new potential emojis?

Because it's just the embedding for a seahorse combined with the embedding for "output an emoji symbol".
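To make that concrete, here is a toy sketch of the idea, not how any real model is wired. The three "semantic axes", every number, and the tiny output vocabulary are made up for illustration. It only shows that adding two vectors can give a perfectly sensible target that no existing output token matches, so decoding has to settle for the nearest neighbor.

```python
import math

# Hypothetical 3-d "semantic" axes: [sea-ness, horse-ness, emoji-ness].
# All values are invented for this illustration.
SEAHORSE = [1.0, 1.0, 0.0]
EMOJI = [0.0, 0.0, 1.0]

# Toy output vocabulary: only tokens that actually exist.
# Note there is no seahorse emoji entry.
VOCAB = {
    "seahorse (word)": [1.0, 1.0, 0.0],
    "horse emoji": [0.1, 1.0, 1.0],
    "fish emoji": [1.0, 0.2, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(vec):
    """Return (token, similarity) for the closest vocabulary entry."""
    token = max(VOCAB, key=lambda t: cosine(vec, VOCAB[t]))
    return token, cosine(vec, VOCAB[token])


# The internal target: "seahorse" plus "emoji", composed by simple addition.
target = [s + e for s, e in zip(SEAHORSE, EMOJI)]

token, score = nearest(target)
print(f"target={target} -> nearest token: {token} (cosine {score:.3f})")
```

In this toy setup the closest token is a different sea-creature emoji, with similarity clearly below 1, which is the lossy output step the parent comment describes.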