Hacker News new | ask | show | jobs
by coldtea 26 days ago
As long as LLMs remains at the same skill level at coding, or better, there's 100% guarantee an MD (a glorified prompt) is gonna work as good as it’s “working” right now.
2 comments

This is quite a claim without any evidence to substantiate it. LLMs are nondeterministic models, whose behaviour is reliant on training data, model architecture and context (both in the general and domain specific sense).

There is absolutely no guarantee llm1(MD) == llm2(MD), by design. With the current batch you need to explicitly constrain a number of parameters, far more than simply the prompt, to get identical output from the _same_ model, let alone another model that has varied training data and/or architecture.

Models are not innately backwards-compatible. Both OpenAI and Anthropic encourage running evaluations and comparing the performance of your existing agent workflows against new models before just stepping up to the newest one because you may encounter regressions. I myself have seen lengthy/long-horizon multi-agent workflows begin breaking after moving to a newer model because for some reason the prompt containing an instruction to call a tool that worked 99/100 times before suddenly just stops working and needs to be modified.