In brief, it's done by curating a smaller, high-quality synthetic dataset from textbooks, problem sets, etc., instead of dumping in a massive dataset with tons of information.