No, and this is covered in the various Phi papers, as well as TinyStories [0]:
> Models with around 125M parameters such as GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and consistent English text beyond a few words even after extensive training.
[0] https://arxiv.org/abs/2305.07759