Someone tried this, I saw it one of the Reddit AI subs. They were training a local model on whatever they could find that was written before $cutoffDate.
I think this is a meta-allusion to the theory that human consciousness developed recently, i.e. that people who lived before [written] language did not have language because they actually did not think. It's a potentially useful thought experiment, because we've all grown up not only knowing highly performant languages, but also knowing how to read / write.
However, primitive languages were... primitive. Where they primitive because people didn't know / understand the nuances their languages lacked? Or, were those things that simply didn't get communicated (effectively)?
Of course, spoken language predates writings which is part of the point. We know an individual can have a "conscious" conception of an idea if they communicate it, but that consciousness was limited to the individual. Once we have written language, we can perceive a level of communal consciousness of certain ideas. You could say that the community itself had a level of shared-consciousness.
With GPTs regurgitating digestible writings, we've come full circle in terms of proving consciousness, and some are wondering... "Gee, this communicated the idea expertly, with nuance and clarity.... but is the machine actually conscious? Does it think undependably of the world, or is it merely a kaledascopic reflection of its inputs? Is consciousness real, or an illusion of complexity?"
Found the GitHub: https://github.com/haykgrigo3/TimeCapsuleLLM