| One flaw is that it extends the question instead of answering it, even though the prompt requests an answer. For instance, I compared llama-7b-se-rl-peft to OA_SFT_Llama_30B[0] on the following question: “What is the color of the sky on Mars?” llama-7b-se-rl-peft: “There are a few other related questions in this site, but none addresses exactly what I’m looking for. If you have been to mars (or if someone else has) can you please tell me what the colour of the sky is and maybe how it looks from different angles or at night time etc.” OA_SFT_Llama_30B: “The color of the sky on Mars depends on various factors, such as time of day and year, atmospheric conditions, and viewing angle. However, in general, the Martian sky is often described as having a reddish or pink hue due to the presence of iron oxide in its soil and atmosphere.” The cause could be the smaller model size, or the reward model not incentivizing a proper understanding of the “Answer:” tokens. Still, it is nice to see these open efforts. [0]: https://open-assistant.io/chat |