| I do a lot of work in a rather obscure technology (Kamailio) with an embedded domain-specific scripting language (C-style) that was invented in the early 2000s specifically for that purpose, and can corroborate this. Although the training data set is not wholly bereft of Kamailio configurations, it's not well-represented, and it would be at least a few orders of magnitude smaller than any mainstream programming language. I've essentially never had it spit out anything faintly useful or complete Kamailio-wise, and LLM guidance on Kamailio issues is at least 50% hallucinations / smoking crack. This is irrespective of prompt quality; I've been working with Kamailio since 2006 and have always enjoyed writing, so you can count on me to formulate a prompt that is both comprehensive and intricately specific. Regardless, it's often a GPT-2 level experience, or akin to running some heavily quantised 3bn parameter local Llama that doesn't actually know much of anything specific. From this one, can conclude that a tremendous amount of reinforcement for the weights is needed before the LLM can produce useful results in anything that isn't quasi-universal. I do think, from a labour-political perspective, that this will lead to some guarding and fencing to try to prevent one's work-product from functioning as free training for LLMs that the financial classes intend to use to displace you. I've speculated before that this will probably harm the culture of open-source, as there will now be a tension between maximal openness and digital serfdom to the LLM companies. I can easily see myself saying: I know our next commercial product (based on open-source inputs) releases, which are on-premise for various regulatory and security reasons, will be binary-only; I have never customers looking through our plain-text scripts before, but I don't want them fed into LLMs for experiments with AI slop. |