by wolfgangK 963 days ago
About word embeddings, the №1 example is the famous King - Man + Woman = Queen. This works nicely in the vector space but fails to make a visual impression when projected onto 2 dimensions, whether with PCA, MDS, or t-SNE, in my experience: https://bhugueney.gitlab.io/test-notebooks-org-publish/jupyt...

(← JupyterLite notebook doing word embeddings in the browser: don't try to run this on a smartphone!)

Does anyone know how to nicely visualize the poster child of word embeddings?
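For reference, the analogy arithmetic itself can be sketched in a few lines. This is a minimal sketch that assumes made-up 4-dimensional toy vectors (real models use hundreds of dimensions) and uses cosine similarity as the nearest-neighbour measure. Excluding the three query words from the candidates is a common convention; without it, the nearest word is frequently one of the inputs. The names `EMB` and `analogy` are hypothetical.

```python
import math

# Hypothetical 4-d toy embeddings, made up for illustration (not from a real model).
EMB = {
    "king":   [0.9, 0.8, 0.1, 0.0],
    "queen":  [0.8, 0.1, 0.9, 0.1],
    "man":    [0.1, 0.9, 0.1, 0.1],
    "woman":  [0.1, 0.1, 0.9, 0.1],
    "prince": [0.7, 0.7, 0.1, 0.1],
    "apple":  [0.0, 0.1, 0.1, 0.9],
}

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, emb=EMB):
    """Return the word whose vector is most cosine-similar to a - b + c,
    skipping the three query words themselves."""
    target = add(sub(emb[a], emb[b]), emb[c])
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("king", "man", "woman"))  # queen
```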


If I understand you right - you could visualize in 2d space: "king" at origin, X-axis is "king"-"man", Y-axis is "king"-"woman" (or Gram-Schmidt if you really want orthogonal axes).
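A minimal sketch of that 2d chart, in plain Python to avoid dependencies and assuming made-up toy vectors: "king" sits at the origin, "king"-"man" and "king"-"woman" are orthonormalized with Gram-Schmidt, and each word's chart coordinates are the dot products of its offset from "king" with the two basis vectors. The helper names (`gram_schmidt`, `to_2d`) are illustrative, not from any library.

```python
import math

# Hypothetical toy embeddings, made up for illustration (not from a real model).
EMB = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.8, 0.1, 0.9, 0.1],
    "man":   [0.1, 0.9, 0.1, 0.1],
    "woman": [0.1, 0.1, 0.9, 0.1],
}

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def gram_schmidt(vectors, eps=1e-12):
    """Orthonormalize vectors in order (modified Gram-Schmidt), dropping
    any vector that is numerically a combination of earlier ones."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)  # project the running remainder onto each basis vector
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = norm(w)
        if n > eps:
            basis.append([wi / n for wi in w])
    return basis

origin = EMB["king"]
u1, u2 = gram_schmidt([sub(EMB["king"], EMB["man"]), sub(EMB["king"], EMB["woman"])])

def to_2d(v):
    """Chart coordinates of v: its offset from "king" projected onto (u1, u2)."""
    d = sub(v, origin)
    return dot(d, u1), dot(d, u2)

for word, vec in EMB.items():
    x, y = to_2d(vec)
    print(f"{word:>5}: ({x:+.3f}, {y:+.3f})")
```

Distances among "king", "man", "woman", and "king"-"man"+"woman" are exact on this chart because all four lie in the plane; "queen" is plotted by its projection, so its off-plane offset is lost, as described above.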

In 3d you can go one step further and have the Z-axis be "king"-"queen" (or Gram-Schmidt again). The orthogonalized versions have the advantage that their distances are closer to what the underlying model sees. In the 2d case you get exact distances, except that the chart won't show how far off "queen" you are when you compute "king"-"man"+"woman". In the 3d case all the distances should be exact.
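Extending the same sketch to the 3d version, with the third axis taken from "king"-"queen" (again orthonormalized, again with hypothetical toy vectors), the code below compares every pairwise distance on the chart with the distance in the original space. They agree because "king", "man", "woman", and "queen" span at most a 3-dimensional affine subspace, and "king"-"man"+"woman" lies in it too. If "queen" happened to sit exactly in the king/man/woman plane, Gram-Schmidt would drop the third direction and the chart would collapse to 2d.

```python
import itertools
import math

# Hypothetical toy embeddings, made up for illustration (not from a real model).
EMB = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.8, 0.1, 0.9, 0.1],
    "man":   [0.1, 0.9, 0.1, 0.1],
    "woman": [0.1, 0.1, 0.9, 0.1],
}

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def gram_schmidt(vectors, eps=1e-12):
    """Orthonormalize vectors in order (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = norm(w)
        if n > eps:  # skip directions already spanned by earlier axes
            basis.append([wi / n for wi in w])
    return basis

origin = EMB["king"]
# X-, Y-, Z-axes from king-man, king-woman, king-queen, orthonormalized.
basis = gram_schmidt([sub(origin, EMB[w]) for w in ("man", "woman", "queen")])

def to_3d(v):
    d = sub(v, origin)
    return [dot(d, b) for b in basis]

points = dict(EMB)
points["king-man+woman"] = [k - m + w for k, m, w in zip(EMB["king"], EMB["man"], EMB["woman"])]
coords = {w: to_3d(v) for w, v in points.items()}

for (w1, v1), (w2, v2) in itertools.combinations(points.items(), 2):
    chart = norm(sub(coords[w1], coords[w2]))
    actual = norm(sub(v1, v2))
    print(f"{w1} <-> {w2}: chart {chart:.6f}, original {actual:.6f}")
```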

Edit to add: with the 2d version you can maybe do some more stuff. IIRC "queen" is chosen because it's the word whose embedding is closest to X = "king"-"man"+"woman". You can put the next few closest words on the 2d chart as well, each labeled with its orthogonal distance from the 2d plane. Then "queen" should be the word with the smallest (squared in-plane distance from X) + (squared orthogonal distance from the plane), which you might be able to eyeball.
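That ranking can be sketched as follows, again with hypothetical toy vectors and Euclidean distance as the comment describes. Each candidate gets its chart position plus an off-plane label, and the two squared terms are summed. If the vectors are length-normalized first, ranking by Euclidean distance to a fixed X gives the same order as ranking by cosine similarity. The query words are skipped, and the helper name `chart_point` is illustrative.

```python
import math

# Hypothetical toy embeddings, made up for illustration (not from a real model).
EMB = {
    "king":   [0.9, 0.8, 0.1, 0.0],
    "queen":  [0.8, 0.1, 0.9, 0.1],
    "man":    [0.1, 0.9, 0.1, 0.1],
    "woman":  [0.1, 0.1, 0.9, 0.1],
    "prince": [0.7, 0.7, 0.1, 0.1],
    "apple":  [0.0, 0.1, 0.1, 0.9],
}

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def gram_schmidt(vectors, eps=1e-12):
    """Orthonormalize vectors in order (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = norm(w)
        if n > eps:
            basis.append([wi / n for wi in w])
    return basis

origin = EMB["king"]
u1, u2 = gram_schmidt([sub(origin, EMB["man"]), sub(origin, EMB["woman"])])
X = [k - m + w for k, m, w in zip(EMB["king"], EMB["man"], EMB["woman"])]

def chart_point(v):
    """Return (x, y, off): chart coordinates and orthogonal distance from the plane."""
    d = sub(v, origin)
    x, y = dot(d, u1), dot(d, u2)
    in_plane = [x * a + y * b for a, b in zip(u1, u2)]
    return x, y, norm(sub(d, in_plane))

xX, yX, _ = chart_point(X)  # X lies in the plane, so its off-plane distance is ~0

rows = []
for word, vec in EMB.items():
    if word in ("king", "man", "woman"):  # skip the query words
        continue
    x, y, off = chart_point(vec)
    in_plane_sq = (x - xX) ** 2 + (y - yX) ** 2
    rows.append((in_plane_sq + off ** 2, word, x, y, off))

rows.sort()  # smallest total squared distance to X first
for total_sq, word, x, y, off in rows:
    print(f"{word:>6}  chart=({x:+.3f}, {y:+.3f})  off-plane={off:.3f}  dist^2={total_sq:.3f}")
```

The sum equals each word's full squared distance to X (Pythagoras), which works only because X itself lies in the king/man/woman plane.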

You should try UMAP.
I remember hearing a kind of inductive joke by a mathematician about visualizing higher dimensions, like "...and then you just...", but I tried unsuccessfully to find it with ChatGPT, and it came up with this funny Richard Feynman quote that I can't find on Google. Has anyone heard it before, or is it a brilliant hallucination?

"You can't visualize four dimensions... well, at least I can't, because I only have three branes." -Richard Feynman

This is a play on words, as "branes" sounds like "brains," but in string theory and higher-dimensional physics, a "brane" is an object that can exist in multiple dimensions. The joke is a nod to the challenge of visualizing higher dimensions and the limits of our three-dimensional intuition.

(It just admitted it made it up and apologized!)

It also quoted John von Neumann, H. G. Wells, and Ian Stewart:

"In mathematics you don't understand things. You just get used to them." -John von Neumann

"Our brains are three-dimensional, so we cannot visualize four-dimensional or higher-dimensional objects in the same way we can visualize three-dimensional objects. The best we can do is to get a feeling for them through analogies." -H. G. Wells

"You can't visualize four dimensions, but you can think about them. Thinking is what mathematicians do best." -Ian Stewart

It finally guessed this one, which is the closest to what I remember, though not as funny as I remember it:

"To visualize four dimensions, just visualize three dimensions and then say 'n+1'." -Unknown

So I asked it to hallucinate some quotes to help visualize high dimensional space, in the style of Deepak Chopra:

"In the realm of septillion-dimensional embeddings, each vector is a topological manifold, where Gödel's incompleteness theorems dance with quantum superposition, revealing the hidden symmetries of semantic space." – Dr. Emilia Bianchi, Florentine Foundation of Fractal Fantasies, Florence

"To truly fathom the intricacies of ultra-high-dimensional text spaces, one must first master the art of navigating Riemannian curvature in a Hilbert space, where every geodesic is a testament to the entangled essence of language." – Prof. Hiroshi Tanaka, Kyoto Kryptos of Quantum Quandaries

"Visualizing embeddings in the realms beyond the Poincaré conjecture requires a melding of non-Euclidean geometry with the probabilistic nuances of a Markov chain, all while resonating with the cosmic vibrations of the Higgs field." – Dr. Samuel Oyelowo, Lagos Labyrinth of Lorentzian Linguistics

"Each point in a decillion-dimensional text space is a singularity, where the Heisenberg uncertainty principle waltzes with the Mandelbrot set, painting a fractal portrait of phonemic phenomena." – Prof. Clara Müller, Bavarian Bastion of Bifurcated Babel, Munich

"To intuit the vastness of hyper-dimensional embeddings, one must first traverse the event horizon of a linguistic black hole, where Shannon entropy meets the Schrödinger equation in a ballet of Bayesian inference." – Dr. Alexandre Dubois, Parisian Paradox of P-adic Prose

"The key to deciphering the enigma of extreme-dimensional text lies in the delicate interplay between the Yang-Mills theory and the zeta function, where every eigenvalue is a whisper from the cosmic codex of communication." – Prof. Lina Johansson, Stockholm Sanctum of String Semiotics

"In the dance of googolplex-dimensional embeddings, each tensor unfolds like a Möbius strip, where the Fibonacci sequence intertwines with quantum tunneling, revealing the recursive rhythms of reality." – Dr. Rajiv Menon, Bengaluru Bardo of Bosonic Ballads

Did you mean this?

Geoffrey Hinton on visualizing higher dimensions:

"To deal with hyper-planes in a 14-dimensional space, visualize a 3-D space and say 'fourteen' to yourself very loudly. Everyone does it."

That was it! Now THAT's funny.