by colechristensen
60 days ago
It's a struggle to get LLMs to generate tests that aren't entirely stupid, like grepping the source code for a string, or assert(1==1, true). You have to keep a curated list of every kind of test not to write, or you get hundreds of pointless-at-best tests.
LLMs also seem to avoid checking the math of the simulator. In CFD, this is called verification. Comparisons are almost exclusively made against experiments (validation), but a model can be implemented incorrectly and its calibration can hide that fact. A common check is to measure the order of accuracy of the numerical scheme under grid refinement to test whether it was implemented correctly, but I haven't seen any vibe coders do that. (LLMs definitely know about the procedure; I've asked several of them about it before. It's not obscure.)
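For anyone who hasn't seen an order-of-accuracy check, here is a minimal sketch, not anyone's production test. It assumes a toy problem (1D Poisson, -u'' = f on [0, 1] with zero boundary values) and uses a manufactured exact solution u = sin(pi x). The problem, the function names, and the `rhs_shift` parameter are all hypothetical choices for illustration. The standard central-difference scheme is second order, so halving h should cut the max error by about 4x, giving an observed order near 2. `rhs_shift` injects a deliberate bug (evaluating f half a cell off). The buggy solver still converges, which a loose comparison could miss, but its observed order drops to about 1.

```python
import math


def solve_poisson(n, rhs_shift=0.0):
    """Solve -u'' = f on [0, 1], u(0) = u(1) = 0, with the second-order
    central-difference scheme on n interior points. Returns the max-norm
    error against the manufactured exact solution u = sin(pi x).
    rhs_shift is a hypothetical injected bug: f is sampled at x - rhs_shift*h."""
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    # Manufactured source term: f = pi^2 sin(pi x) makes u = sin(pi x) exact.
    rhs = [h * h * math.pi ** 2 * math.sin(math.pi * (xi - rhs_shift * h)) for xi in x]
    # Tridiagonal system -u[i-1] + 2 u[i] - u[i+1] = h^2 f[i], solved with
    # the Thomas algorithm (forward sweep, then back substitution).
    a, b, c = -1.0, 2.0, -1.0
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c / b
    dp[0] = rhs[0] / b
    for i in range(1, n):
        denom = b - a * cp[i - 1]
        cp[i] = c / denom
        dp[i] = (rhs[i] - a * dp[i - 1]) / denom
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))


def observed_orders(ns, rhs_shift=0.0):
    """Observed order p = log(e_coarse / e_fine) / log(h_coarse / h_fine)
    for each pair of successive grids."""
    errors = [solve_poisson(n, rhs_shift) for n in ns]
    hs = [1.0 / (n + 1) for n in ns]
    orders = [
        math.log(errors[i] / errors[i + 1]) / math.log(hs[i] / hs[i + 1])
        for i in range(len(ns) - 1)
    ]
    return errors, orders


if __name__ == "__main__":
    # n + 1 doubles each time, so the grid spacing h halves.
    grids = [15, 31, 63, 127]
    print("correct scheme:", observed_orders(grids)[1])
    print("buggy scheme:  ", observed_orders(grids, rhs_shift=0.5)[1])
```

The useful assertion in a test suite is on the observed order, not on the error itself. An error tolerance can be met by a buggy scheme on a fine enough grid, but the convergence rate can't be tuned away.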