HN Mirror



	by photon_garden 295 days ago
	Curious how they've assessed quality, either qualitatively or quantitatively. How often do the generated documents miss important parts of the codebase or hallucinate requirements? How often do engineers have to redo work because the LLM convincingly told them to build the wrong thing? You can build real, production-grade systems using LLMs, but these are the hard questions you have to answer.

2 comments

antonybrahin 295 days ago

This is not production-ready yet, but based on my preliminary tests, the outputs are about 80% consistent. The plan, of course, is for the architect to review the specs before devs are assigned.
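One way a figure like "about 80% consistent" could be measured is to run the same prompt several times and average the textual similarity over every pair of outputs. The sketch below is only an illustration under that assumption, not the method used for the number above; `pairwise_consistency` and the sample `runs` are hypothetical, and character-level similarity from `difflib` is a crude stand-in for a real semantic comparison.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_consistency(outputs: list[str]) -> float:
    """Mean similarity ratio (0.0 to 1.0) over all pairs of outputs."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        # Zero or one output has nothing to disagree with.
        return 1.0
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


# Hypothetical outputs from three runs of the same prompt.
runs = [
    "Add a login endpoint. Validate the token. Log failures.",
    "Add a login endpoint. Validate the token. Log failures.",
    "Add a signup endpoint. Send a welcome email.",
]
score = pairwise_consistency(runs)
print(f"mean pairwise similarity: {score:.2f}")
```

A score like this only says the runs agree with each other. It says nothing about whether any of them is correct, which is why the architect review still matters.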


18cmdick 295 days ago

They haven't.


cyanydeez 295 days ago

Yes. It's amazing we've gotten this far with LLMs while everyone believes that everyone else has actually validated the claim that _their_ LLM is producing valid output.

Essentially, you've got a bunch of nerds generating code and believing that, because it looks right, every other kind of subject matter being output is also correct.


antonybrahin 295 days ago

My goal was to reduce the manual work of creating documents. The output is definitely a draft and needs to be reviewed by an architect and a QA lead before it is passed on. The generated tasks will contain the actual actionable task, which can be used for prompting in Cursor or VS Code.


antonybrahin 295 days ago

Yes, it hasn't been tested at large volume yet.
