by photon_garden 295 days ago
Curious how they've assessed quality, either qualitatively or quantitatively. How often do the generated documents miss important parts of the codebase or hallucinate requirements? How often do engineers have to redo work because the LLM convincingly told them to build the wrong thing?

You can build real, production-grade systems using LLMs, but these are the hard questions you have to answer.

2 comments

This is not production-ready yet, but based on my preliminary tests, the outputs are about 80% consistent. The plan, of course, is for the architect to review the specs before devs are assigned.
They haven't.
Yes. It's amazing we've gotten so far with LLMs, with everyone believing that everyone else has actually validated the claim that _their_ LLM is producing valid output.

Essentially, you've got a bunch of nerds generating code and believing that, because it looks right, every other kind of subject matter being output must also be correct.

My goal was to reduce the manual work of creating documents. The output is definitely a draft and needs to be reviewed by an architect and a QA lead before it is passed on. Each generated task contains the actual actionable item, which can be used for prompting in Cursor or VS Code.
Yes, it hasn't been tested at large volume yet.