Hacker News
by sdevonoes 17 days ago
I don’t think that’s true, based on experience. Maybe “<” instead of “<<”, yeah. But even in that case, it’s an awful trade-off for any serious codebase that needs to be maintained over the years (and you don’t know what LLMs are gonna look like next year, so there are zero guarantees all your MD is gonna work as well as it’s “working” right now).

As long as LLMs remain at the same skill level at coding, or better, there's a 100% guarantee an MD (a glorified prompt) is gonna work as well as it’s “working” right now.
This is quite a claim without any evidence to substantiate it. LLMs are nondeterministic models whose behaviour depends on training data, model architecture and context (in both the general and the domain-specific sense).

There is absolutely no guarantee that llm1(MD) == llm2(MD), by design. With the current batch of models you need to explicitly constrain a number of parameters, far more than simply the prompt, to get identical output from the _same_ model, let alone from another model with different training data and/or architecture.
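
To make that concrete, here is a toy sketch in Python. It is not how any real LLM is served. The two "models" are just hypothetical logit vectors I made up (`llm1`, `llm2`), and `generate` is an illustrative helper. It only shows the shape of the problem: you need to pin temperature and seed to get repeatable output from one model, and pinning them still says nothing about a second model.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from logits; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [value / temperature for value in logits]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(value - top) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(model_logits, temperature, seed, length=20):
    """Sample `length` tokens from a fixed (toy) next-token distribution."""
    rng = random.Random(seed)
    return [sample_token(model_logits, temperature, rng) for _ in range(length)]


# Hypothetical stand-ins for two models given the same MD/prompt.
llm1 = [2.0, 1.5, 0.5, 0.1]
llm2 = [1.5, 2.0, 0.5, 0.1]

# Same model, same temperature, same seed: repeatable.
same_model_same_seed = generate(llm1, 1.0, seed=0) == generate(llm1, 1.0, seed=0)

# Greedy decoding removes the sampling noise, but the two models still disagree.
greedy_cross_model = generate(llm1, 0, seed=0) == generate(llm2, 0, seed=0)

print(same_model_same_seed, greedy_cross_model)
```

With sampling enabled and the seed left free, even the same toy model is not guaranteed to repeat itself. Real deployments add more sources of variation on top of this (batching, hardware, provider-side changes), which this sketch does not model.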

Models are not innately backwards-compatible. Both OpenAI and Anthropic encourage running evaluations and comparing the performance of your existing agent workflows against new models before stepping up to the newest one, because you may encounter regressions. I myself have seen long-horizon multi-agent workflows begin breaking after moving to a newer model: for some reason, a prompt instruction to call a tool that worked 99 times out of 100 before suddenly stops working and needs to be modified.
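
A minimal sketch of that kind of pre-upgrade check, in Python. Everything here is hypothetical: `old_model` and `new_model` are stubs standing in for real API calls, `CASES` is a made-up two-item eval set, and the substring check is the crudest possible grader. A real eval would call the actual models, run each case many times, and use a proper scoring method.

```python
from typing import Callable, Dict, List

Model = Callable[[str], str]


def pass_rate(model: Model, cases: List[Dict[str, str]], runs: int = 1) -> float:
    """Fraction of (case, run) pairs whose output contains the expected text."""
    if not cases or runs < 1:
        raise ValueError("need at least one case and one run")
    passed = 0
    total = 0
    for case in cases:
        for _ in range(runs):
            total += 1
            if case["expect"] in model(case["prompt"]):
                passed += 1
    return passed / total


def safe_to_upgrade(old: Model, new: Model, cases: List[Dict[str, str]],
                    tolerance: float = 0.0) -> bool:
    """True if the new model does not regress beyond `tolerance`."""
    return pass_rate(new, cases) >= pass_rate(old, cases) - tolerance


# Stub models (assumptions, not real APIs): the old one emits the tool call,
# the new one answers in prose and silently skips it.
def old_model(prompt: str) -> str:
    return "CALL search_docs" if "search" in prompt else "OK"


def new_model(prompt: str) -> str:
    return "I'll look that up for you." if "search" in prompt else "OK"


CASES = [
    {"prompt": "Use the search tool to find the changelog.", "expect": "CALL search_docs"},
    {"prompt": "Say OK.", "expect": "OK"},
]

print(pass_rate(old_model, CASES), pass_rate(new_model, CASES))
print(safe_to_upgrade(old_model, new_model, CASES))
```

The point is only that the same MD/prompt gets scored against both models before switching, so a tool-call regression like the one above shows up in a number instead of in production.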