
by Dorialexander 942 days ago

It's not feasible to go with pretraining only.

What is possible is to use a larger learning rate, but that means a hard trade-off against conversational ability. Fine-tuning is currently based on original texts paired with a synthetic prompt. The issues people have noticed (repetitions, not remembering what was in the prompt) will become more pronounced if the learning rate is higher.

Maybe a solution would be to provide two variants of the same model: one less immersive and more workable, the other more immersive and buggier.