While the classifiers are improving, the generative models are improving too. It's a classic arms race, and the equilibrium is not likely to shift much either way. We are talking about models that approximate human behavior with a high degree of accuracy; I think the goal is to make their output indistinguishable from a human's in any meaningful way.
I don't think this is something that can change through tech advances in the classifiers. In all cases the classifier is just flagging a submission for investigation; a flag alone is not sufficient for any action. For plagiarism, appropriate evidence comes from a person comparing the submission with the possible source of plagiarism. For this one, proper evidence would require confirmation that the student actually generated that data, e.g. identifying the exact tool and prompt that was used, or logs from the student's computer showing that this was done, or logs from the text generation service provider. All of those are quite tricky to get and perhaps not even possible.