HN Mirror



	by tardedmeme 45 days ago
	Their native content is semantic vectors. They had to be trained for a long time to convert between text and semantic vectors, and the conversion is very lossy. The seahorse emoji demonstrates this nicely: the LLM internally holds a semantic vector for seahorse+emoji, but the output translation layer can't match it.

1 comment

Alifatisk 45 days ago

> The seahorse emoji demonstrates this nicely: the LLM internally holds a semantic vector for seahorse+emoji, but the output translation layer can't match it.

I am curious about this: how can the LLM hold the embedding for seahorse+emoji if that emoji doesn't exist? How did it end up like this? Perhaps the dataset had discussions from people about potential new emojis?


tardedmeme 45 days ago

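Because it's just the embedding for a seahorse plus the embedding for "output an emoji symbol". Neither part needs a seahorse emoji to exist; the sum just has no exact match among the tokens the model can actually output.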
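
A rough toy sketch of the idea, not how any real model is wired: the three axes, all of the vectors, and the three-entry output vocabulary below are made up for illustration, and cosine similarity stands in for whatever the real output layer does. The point is only that adding two directions gives a perfectly valid vector, and the output step then has to settle for the closest token it has.

```python
import math

# Made-up 3-d "semantic" axes: (sea creature, horse-shaped, is-an-emoji).
# Real models use thousands of dimensions; these numbers are illustrative only.
SEAHORSE = (1.0, 0.5, 0.0)
EMOJI = (0.0, 0.0, 1.0)

# Hypothetical output vocabulary. Note there is no seahorse emoji entry.
VOCAB = {
    "seahorse (word)": (1.0, 0.5, 0.0),
    "fish emoji": (1.0, 0.0, 1.0),
    "horse emoji": (0.0, 1.0, 1.0),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_token(vec):
    """Return (token, similarity) for the vocabulary entry closest to vec."""
    scored = ((tok, cosine(vec, emb)) for tok, emb in VOCAB.items())
    return max(scored, key=lambda pair: pair[1])


# "seahorse" + "emoji": a valid internal vector with no exact output token.
target = tuple(s + e for s, e in zip(SEAHORSE, EMOJI))
best_token, best_score = nearest_token(target)
print(best_token, round(best_score, 3))  # fish emoji 0.943
```

In this toy setup the closest output is the fish emoji, with similarity below 1, so the seahorse+emoji vector exists internally but nothing on the output side matches it exactly.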
