A tiny, tiny LLM (essentially removing the "Large" part of "Large Language Model"). I taught neural networks to memorize Wikipedia articles (actually, just one Wikipedia article, about horses) and recite it back as-is by predicting the next token over and over, given only the first token.
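To make the "give it the first token, get the whole text back" idea concrete, here is a minimal sketch of the generation loop. It is not the project's actual code: the names `build_memorizer` and `generate` are hypothetical, the sample sentence is a made-up placeholder rather than the real article, and a plain lookup table stands in for the trained neural network (a real model would hold this mapping in its weights).

```python
def build_memorizer(tokens):
    """Map every prefix of the text to the token that follows it.

    ASSUMPTION: this dict is a stand-in for a trained network that has
    memorized the text perfectly; it is not how the project stores it.
    """
    return {tuple(tokens[:i]): tokens[i] for i in range(1, len(tokens))}


def generate(model, first_token, max_tokens=1000):
    """Greedy next-token loop: start from one token, append predictions."""
    out = [first_token]
    while len(out) < max_tokens:
        next_token = model.get(tuple(out))  # "predict" from everything so far
        if next_token is None:  # nothing memorized for this prefix: stop
            break
        out.append(next_token)
    return out


# Placeholder text; whitespace splitting is an assumed, simplistic tokenizer.
text = "horses are large hoofed mammals that people have ridden for centuries"
tokens = text.split()

model = build_memorizer(tokens)
recalled = generate(model, tokens[0])
print(" ".join(recalled))
```

Starting from an unseen first token, the loop stops immediately, because nothing was memorized for that prefix. A perfectly overfit network behaves much like this table on its training text, which is the whole trick here.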