by gwern
933 days ago
> My data collator ensures that the loss is only calculated based on someone’s response. Predicting who will speak next is relatively straightforward, and we don’t want the model to focus on learning that. Therefore, parts of the conversation where the loss is calculated are highlighted in bold.

If it's so easy, then you don't need to remove it. The model will solve it easily and focus on everything else. At best, you save some parameters and compute, at worst, you are damaging its ability to learn important things like conversational skills or modeling people. When it comes to LLMs, more is more, and trying to hand-engineer the dataset or think for the LLM can backfire in very subtle and difficult to diagnose ways.

> Ok, it is capable of forming coherent sentences. The most noticeable problem is its lack of awareness regarding the context of the conversations which leads to bland and generic replies. The messages lacked any distinct style, feeling quite basic...
>
> Conversations have become more interesting and engaging, although there’s still a risk of losing context. Russian language performance has improved, but errors still occur. I believe that before fine-tuning for a specific task with limited data, like mine, it would be beneficial to first fine-tune the model unsupervised on a large corpus of Russian texts. Additionally, incorporating common conversation partners’ names as separate tokens might enhance the quality. I wouldn’t say it has turned out to be significantly better than LoRA. It might be more effective to focus solely on a single person and calculate the loss based only on my responses (or someone else’s), instead of trying to learn about each and every conversational partner.
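
For concreteness, the two options at issue (loss on response tokens only, versus loss on every token) can be sketched roughly as below. This is a minimal illustration, not the post's actual collator: the names `build_labels` and `masked_mean_loss`, the toy token IDs, and the per-token loss values are all hypothetical. The only convention assumed is that a label of `-100` is skipped by the loss, which matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
# Label value that the loss skips; assumed to match the default
# ignore_index of PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100


def build_labels(token_ids, is_response, mask_non_response=True):
    """Return per-token labels for a next-token-prediction loss.

    token_ids:   list of int token IDs for one conversation (toy values here).
    is_response: list of bool, True where the token belongs to a response body.
    mask_non_response: if True, every non-response token (speaker names,
        other context) gets IGNORE_INDEX; if False, every token keeps its
        label, so the model is also trained to predict who speaks next.
    """
    if len(token_ids) != len(is_response):
        raise ValueError("token_ids and is_response must have equal length")
    if not mask_non_response:
        return list(token_ids)
    return [tok if keep else IGNORE_INDEX
            for tok, keep in zip(token_ids, is_response)]


def masked_mean_loss(per_token_losses, labels):
    """Average the per-token losses over positions that are not ignored."""
    kept = [loss for loss, label in zip(per_token_losses, labels)
            if label != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical toy example: two speaker-tag tokens, then three response tokens.
token_ids = [11, 12, 21, 22, 23]
is_response = [False, False, True, True, True]

masked = build_labels(token_ids, is_response)
unmasked = build_labels(token_ids, is_response, mask_non_response=False)
```

With masking on, the speaker-tag positions contribute nothing to the average; with masking off, they are simply included. If predicting them really is easy, their per-token loss drops quickly and they stop mattering either way.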