Isn't instruction tuning done with huge amounts of synthetic data? I wonder if the lack of diversity comes from the LLM-generated data used for instruction tuning.