Some observations on a few ways people actually gather human feedback in practice to improve LLMs. I'm sure I've missed some, so let me know.
I think tying the feedback to a task is the way to go. My son is a fan of Craiyon, which shows you a few images it generated and encourages you to favorite the best ones. I don't know exactly what they do with those favorites, but I'm sure you could RLHF an image generator that way too.
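Here is one guess at how a "favorite the best ones" screen could feed RLHF-style training. I don't know what Craiyon actually does, so treat this as a sketch under my own assumptions. It treats each favorited image as preferred over every image on the same screen that wasn't favorited, then scores those (chosen, rejected) pairs with the pairwise logistic (Bradley-Terry) loss commonly used to train reward models. The function names and the "favorite beats everything not favorited" rule are hypothetical.

```python
import math
from itertools import product


def favorites_to_pairs(image_ids, favorited):
    """Turn one 'favorite the best ones' screen into (chosen, rejected) pairs.

    Hypothetical rule: every favorited image is preferred over every
    non-favorited image shown on the same screen. No preference is
    inferred among the favorites themselves or among the skipped ones.
    """
    liked = [i for i in image_ids if i in favorited]
    skipped = [i for i in image_ids if i not in favorited]
    return list(product(liked, skipped))


def bradley_terry_loss(score_chosen, score_rejected):
    """Pairwise reward-model loss: -log(sigmoid(score_chosen - score_rejected)).

    Written as log1p(exp(-diff)), which is algebraically the same quantity.
    """
    diff = score_chosen - score_rejected
    return math.log1p(math.exp(-diff))


if __name__ == "__main__":
    pairs = favorites_to_pairs(["a", "b", "c", "d"], {"b"})
    print(pairs)  # [('b', 'a'), ('b', 'c'), ('b', 'd')]
    # Loss is lower when the reward model already scores the favorite higher.
    print(bradley_terry_loss(2.0, 0.0) < bradley_terry_loss(0.0, 2.0))  # True
```

A screen where nothing gets favorited yields no pairs, which seems right: "none of these" doesn't tell you which one was worse.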
I'd be glad to chat more (see my profile). People think a big company training one big model has a scalability advantage, but the problem of pleasing everybody, particularly advertisers, is terrible. A personal model that only has to please one person might be easy with recent tech.