Except that most LLMs are not deterministic: the same prompt will, in general, produce different outputs, because the completion is statistical and each token is randomly sampled.
What matters is whether the output of each request complies with the given specifications, and whether the bugs can be fixed until the code converges to a stable state.
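
To make the non-determinism point concrete, here is a toy sketch of temperature sampling. It is not how any particular model is implemented: the `logits` values and the `sample_token` helper are invented for illustration, and a real LLM does this over a vocabulary of many thousands of tokens, once per generated token.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick one token from a dict mapping token -> score (hypothetical helper)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(logits, key=logits.get)
    # Softmax with temperature, followed by a weighted random draw.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]

# Made-up next-token scores for one fixed prompt (assumption, not real model output).
logits = {"return": 2.0, "yield": 1.5, "print": 1.0}

rng = random.Random()  # unseeded, so each run can differ
sampled = [sample_token(logits, 1.0, rng) for _ in range(10)]
greedy = [sample_token(logits, 0, rng) for _ in range(10)]

print(sampled)  # a mix of tokens that changes from run to run
print(greedy)   # the same token every time in this toy model
```

With a temperature above zero the same scores give different picks on different runs, which is why the practical question is whether each output passes the spec, not whether it can be reproduced.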