Well, that's the "magic" of modern deep learning: somehow you can fit models with p > n (more parameters than training examples) without overfitting. Depending on the area, you'll see this attributed to "the strong inductive bias of neural networks" or discussed under "double descent", but I haven't yet seen an explanation that I find convincing.
It's quite amusing. Standard statistical theory does not work at all for relating data size to model size here, and the generalization bounds it gives are all vacuously large. Understanding why overparameterized models behave so simply, and coming up with measures of model complexity that actually mean something, is a very active area of research. Lots to read there if you are interested in such things.
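If you want to poke at the p > n regime yourself, here is a minimal sketch. It is a toy setup of my own, not taken from any particular paper: plain linear regression on random Gaussian features, fitted with the minimum-norm least-squares solution. The names `min_norm_fit` and `sweep`, and all the sizes and the noise level, are made up for illustration. The one thing it shows reliably is that once p > n the fit interpolates the training data (training error is zero up to rounding); how the test error moves as p grows depends on the seed and the settings, so treat the printed numbers as something to explore rather than a result.

```python
import numpy as np

def min_norm_fit(X, y):
    # When p > n the system is underdetermined; lstsq then returns the
    # solution with the smallest 2-norm among all that fit the data.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def sweep(n=40, p_max=400, p_values=(10, 20, 40, 80, 400), noise=0.5, seed=0):
    # Hypothetical toy data: the target depends on all p_max features,
    # but each model only gets to use the first p of them.
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=p_max) / np.sqrt(p_max)
    X_train = rng.normal(size=(n, p_max))
    X_test = rng.normal(size=(2000, p_max))
    y_train = X_train @ w_true + noise * rng.normal(size=n)
    y_test = X_test @ w_true  # noiseless targets for evaluation

    results = []
    for p in p_values:
        w = min_norm_fit(X_train[:, :p], y_train)
        train_mse = float(np.mean((X_train[:, :p] @ w - y_train) ** 2))
        test_mse = float(np.mean((X_test[:, :p] @ w - y_test) ** 2))
        results.append((p, train_mse, test_mse))
    return results

results = sweep()
for p, train_mse, test_mse in results:
    print(f"p={p:4d}  train MSE={train_mse:.3e}  test MSE={test_mse:.3f}")
```

Try changing `n`, `noise`, and `p_values` (especially values close to `n`) and watch what the test error does on either side of p = n.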