This is really useful thanks for sharing. My students and myself tend to waste a lot of time annotating clusters and have not found a reasonable solution yet. This will be fun to try.
I have written a neural network architecture (way smaller than llama) that can be trained to automate this process. Check out the Custom-Data-Tutorial in the repo!
Could this also be adapted for gene set enrichment? For example, if I had a set(s) of genes from an ATAC-seq experiment would it be able to guess their function / cell types?
GitHub: https://github.com/wwu-mmll/gatenet Paper: https://www.sciencedirect.com/science/article/pii/S001048252...