There isn't enough training data though, is there? The "secret sauce" of LLMs is the vast amount of training data available, plus the compute to process it all.
This is essentially distillation from the bigger model: you'd wind up surfacing a lot of artifacts from the host model and amplifying them, the same way repeated photocopying compounds errors.
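For what it's worth, the photocopy effect is easy to see in a toy simulation. This is only a sketch under heavy assumptions: the "model" here is a single Gaussian rather than an LLM, `photocopy_chain` and its parameters are made-up names for illustration, and each generation is fit only to a small sample drawn from the generation before it.

```python
import random
import statistics


def photocopy_chain(generations=500, samples_per_gen=10, seed=0):
    """Toy model: each generation is fit only to samples from the previous one.

    Hypothetical illustration, not a claim about how any real LLM is trained.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the "original" distribution, standing in for real data
    history = [(mu, sigma)]
    for _ in range(generations):
        # The current model generates a small synthetic dataset...
        data = [rng.gauss(mu, sigma) for _ in range(samples_per_gen)]
        # ...and the next model is fit to that dataset alone.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append((mu, sigma))
    return history


if __name__ == "__main__":
    history = photocopy_chain()
    for gen in (0, 1, 10, 100, 500):
        mu, sigma = history[gen]
        print(f"gen {gen:3d}: mean={mu:+.4f} std={sigma:.6f}")
```

In this toy setup the spread tends to shrink generation after generation while the mean wanders off wherever sampling noise pushed it, so the later "copies" are both narrower and slightly wrong. Real distillation is far more complicated, but the compounding is the part the photocopy analogy is pointing at.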