| I remain hopeful that some day someone will train an LLM which is tolerable to people who take this stance (which I respect, much like I respect food vegetarians despite not being one myself). I've been tracking models trained entirely on out-of-copyright data, for example. I've not yet seen one of those which appears generally useful and didn't chuck in a scrape of the web or get fine-tuned on examples generated by a non-vegetarian model. Andrej Karpathy can train a GPT-2 class model for less than $80 now, so at least the environmental cost of training may drop to a point that it's acceptable to LLM vegetarians: https://twitter.com/karpathy/status/2017703360393318587 Why do I care? This post is a great example. If you're a professor of computer science I really want you to be able to tinker with this fascinating class of models without violating your principles. UPDATE: Huh, speaking of potentially vegetarian models, I just saw https://talkie-lm.com/introducing-talkie on the HN homepage: https://news.ycombinator.com/item?id=47927903 I've explored a different out-of-copyright trained model, Mr Chatterbox, before, but found it to have been mildly corrupted by the use of synthetic conversation pairs from Haiku and GPT-4o-mini: https://simonwillison.net/2026/Mar/30/mr-chatterbox/ Talkie isn't entirely pure either though: "Finally, we did another round of supervised fine-tuning, this time on rejection-sampled multi-turn synthetic chats between Claude Opus 4.6 and talkie, to smooth out persistent rough edges in its conversational abilities." |