This isn't really a new result. We already know from the GPT-4 paper that RLHF-style fine-tuning just makes the model more compliant, not more capable.
Exactly. A few hundred thousand interactions with ChatGPT are definitely too few for a model to learn a lot of new things. What that fine-tuning does well is make models much better at following instructions. It also makes them much better at working with the given context.