Hacker News
by aero142 504 days ago
Are there any successful models that weren't trained with RLHF, or that don't rely on a system that uses RLHF? I'm curious whether this could be done without a fine-tuning step that would meaningfully bias this.