by danielmarkbruce
523 days ago
So much work goes into RLHF, DPO, instruct tuning, and other kinds of post-training because UX is very important. The bar is high, and making a model with a good UX for a given use case is hard.