Hacker News new | ask | show | jobs
by theaussiestew 896 days ago
I would want to have multi lingual embeddings so I can learn languages more efficiently. Being able to see clouds of different words in different languages would allow me to contextualize them more easily. Same for phrases and sentences.
1 comments

I’m very much with you. Since most languages I know of are written with combinations of ordered glyphs, they all get rendered the same (although I don’t handle about 20k current Unicode characters of the full 400k+, and none of the grouping really works, so RTL languages would be a mess for individual words).

However, this is exactly where I want to go. A dictionary is a cyclic graph of words mapping to words. That means there’s at least one finite way to visit every single node and give it a position, with a direct relationship to the words that define it, and those words that define them, and so on.

This creates an arbitrary and unique geometric structure per language, and if you get fancy and create modifiers for an individual’s vocabulary, you can even create transforms for a “base” dictionary, and the way someone chooses to use certain words differently. You would be able to see, but likely not understand, the “structure” of types of text - poetry, storytelling, instructional writing, etc.