RLHF is supervised learning on top of unsupervised learning. Is supervised learning at some point of the process a requirement for all reasonable ML models?