Why not convert the training code to AST?
Also, if it is only trained on code, it's likely to miss out on all the world knowledge that comes from the rest of the data.
Also, if it is only trained on code, it's likely to miss out on all the world knowledge that comes from the rest of the data.