Nice work! Here is an article you may find helpful if you have not already come across it.[0]. You may also want to consider benchmarking against some non ML methods.[1]
You solve a dataset when you learn what there is to learn about the phenomenon of interest. The limit of such phenomenon is “cure all disease”, and clearly this is not solving that.
MNIST (the number classification task) has been “solved” a billion times and it is hard to imagine any subsequent advances there as scores using a variety of methods have hit the saturation point of accuracy. Any further improvements are likely overfitting to noise. Therefore, we know that it is easy to detect handwritten numbers. However, we may not know how to detect other things as well, like reading an MRI. Those datasets/tasks are clearly different and require different techniques. Training an LLM is likewise different.
If it was really solved, wouldn't it just need to happen once?
You think classifying handwriting of 10 numbers is the same as this that took 55 hours of GPU time for someone to go through?
I have no idea what point you're trying to make and I can't tell if you do either. You were talking about "solving" other "health datasets" but you can't even come up with one or what that means.
0. https://pubmed.ncbi.nlm.nih.gov/35318324/
1. https://www.nature.com/articles/s41586-023-06127-z