|
|
|
|
|
by mschild
121 days ago
|
|
I find that with more complex projects (full-stack application with some 50 controllers, services, and about 90 distinct full-feature pages) it often starts writing code that simply breaks functionality. For example, had to update some more complex code to correctly calculate a financial penalty amount. The amount is defined by law and recently received an overhaul so we had to change our implementation. Every model we tried (and we have corporate access and legal allowance to use pretty much all of them) failed to update it correctly. Models would start changing parts of the calculation that didn't need to be updated. After saying that the specific parts shouldn't be touched and to retry, most of them would go right back to changing it again. The legal definition of the calculation logic is, surprisingly, pretty clear and we do have rigorous tests in place to ensure the calculations are correct. Beyond that, it was frustrating trying to get the models to stick to our coding standards. Our application has developers from other teams doing work as well. We enforce a minimum standard to ensure code quality doesn't suffer and other people can take over without much issue. This standard is documented in the code itself but also explicitly written out in the repository in simple language. Even when explicitly prompting the models to stick to the standard and copy pasting it into the actual chat, it would ignore 50% of it. The most apt comparison I can make is that of a consultant that always agrees with you to your face but when doing actual work, ignores half of your instructions and you end up running after them to try to minimize the mess and clean up you have to do. It outputs more code but it doesn't meet the standards we have. I'd genuinely be happy to offload tasks to AI so I can focus on the more interesting parts of work I have, but from my experience and that of my colleagues, its just not working out for us (yet). |
|
There's still the risk that the agent will try to modify the QA systems themselves, but that's why there will always be a human in the loop.