Aren't LLMs a classification problem in a sense? "Given this text, classify it based on the next token" seems like a viable interpretation of the problem. Although there are a couple thousand or so classes, which might be a lot (but I know very little about this field).
You can cluster data using unsupervised random forests and then use these cluster indices as features.