Hacker News
by danielmarkbruce 523 days ago
So much work goes into RLHF, DPO, instruction tuning, and other kinds of post-training because UX is very important. The bar is high, and it is hard to make a model with good UX for a given use case.