Hacker News
by ren_engineer 503 days ago
Not sure why people are surprised. It's been known for a long time that RLHF essentially lobotomizes LLMs by training them to give answers the base model wouldn't give. DeepSeek is better because they didn't gimp their own model.