OP is also the author of the popular dimensionality reduction algorithm UMAP.
My guess is that the pipeline embeds each document with an LLM (even a plain old word2vec average over the abstract might do), then uses UMAP with a cosine metric to reduce those embeddings to 2 dimensions.
I have no idea about colors and local cluster naming though. Maybe that's handcrafted.
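
To make that guess concrete, here's a rough sketch of what I mean. This is not OP's actual code: the toy word vectors are made up, and `embed`, `cosine_distance` and `reduce_to_2d` are names I invented. The only real library call is `umap.UMAP(n_components=2, metric="cosine")` from umap-learn, and it's imported lazily so the rest runs without it.

```python
import math

# Toy 3-d word vectors, made up for illustration. Real ones would come from
# a trained word2vec model or an LLM embedding model.
TOY_VECTORS = {
    "graph": [1.0, 0.0, 0.0],
    "network": [0.9, 0.1, 0.0],
    "protein": [0.0, 1.0, 0.0],
    "folding": [0.0, 0.9, 0.1],
}


def embed(text):
    """Average the vectors of the known words in text (unknown words are skipped)."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        raise ValueError("no known words in text")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity, the distance that metric='cosine' refers to."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def reduce_to_2d(doc_vectors):
    """Project document vectors to 2-d with UMAP (needs numpy and umap-learn)."""
    import numpy as np
    import umap

    reducer = umap.UMAP(n_components=2, metric="cosine")
    return reducer.fit_transform(np.asarray(doc_vectors))
```

You'd call `embed` on every abstract and hand the resulting list to `reduce_to_2d`. UMAP builds a nearest-neighbor graph, so it wants a real corpus rather than a handful of points.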