by minimaxir
777 days ago
A modern recommendation for UMAP is Parametric UMAP (https://umap-learn.readthedocs.io/en/latest/parametric_umap....), which instead trains a small Keras MLP to perform the dimensionality reduction down to 2D by minimizing the UMAP loss. The advantage is that this model is small and can be saved and reused to predict on new, unseen data (a traditionally trained UMAP model is large), and training is theoretically much faster because it runs on a GPU. The downside is that the implementation in the Python UMAP package isn't great: it creates the whole expanded node/edge dataset and pushes it to the GPU, which means you can only train on about 100k embeddings before going OOM. The fully unsupervised UMAP -> HDBSCAN -> AI cluster labeling pipeline is so useful that I'm tempted to figure out a more scalable implementation of Parametric UMAP.
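To make the memory point concrete, here is a rough sketch of one way the edge data could be streamed instead of expanded up front. This is not how the UMAP package does it, and not a worked-out design: `iter_edge_batches`, the `(head, tail, weight)` edge tuples, and the parameter names are all hypothetical, and it uses only the standard library. The idea is to sample edges in proportion to their weight one mini-batch at a time, so only `batch_size` pairs exist in memory at once.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical edge format: (head index, tail index, edge weight).
Edge = Tuple[int, int, float]


def iter_edge_batches(
    vectors: Sequence[Vector],
    edges: Sequence[Edge],
    batch_size: int,
    n_batches: int,
    seed: int = 0,
) -> Iterator[Tuple[List[Vector], List[Vector]]]:
    """Yield (head_rows, tail_rows) batches sampled in proportion to edge weight.

    Only batch_size pairs are materialized per step, rather than a fully
    expanded copy of the edge list.
    """
    rng = random.Random(seed)
    weights = [w for _, _, w in edges]
    for _ in range(n_batches):
        # Heavier edges are drawn more often; zero-weight edges are never drawn.
        picked = rng.choices(range(len(edges)), weights=weights, k=batch_size)
        heads = [vectors[edges[i][0]] for i in picked]
        tails = [vectors[edges[i][1]] for i in picked]
        yield heads, tails
```

Each yielded batch would then be the only thing handed to the GPU for a training step. Negative sampling and the UMAP loss itself are left out here.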