| HN Mirror

by ISV_Damocles 1057 days ago

I think you are painting with too broad a brush. There are many domains where I would never use an LLM-based tool. Any tool can be used incorrectly, but that doesn't put the tool at fault.

Software engineering is about trade-offs. For LLM-based code generation in general, the trade-off is speeding up the writing of code at the expense of precision in what is generated. When you use something like Copilot, it uses the comment or function signature to "guess" what you intend to write; sometimes it's right, sometimes it's not.

Marsha is exploring that trade-off space. Copilot usually finishes in 5-20 seconds, while Marsha is slower: sometimes as fast as 20 seconds, but usually a little over a minute. The syntax requires you to provide more information up front than just a comment or a function signature, and Marsha also uses that up-front information to generate a test suite to improve the reliability of its output. That means more iterations with the LLM, which is what slows it down.

Marsha only returns output to you once the generated code passes the test suite, so the code is known to pass the cases that you were able to think of, which should make it much more precise than Copilot. It may still fail, but probably in ways your own code would have failed, on cases you hadn't considered. So this particular trade-off feels closer to "free" versus writing it by hand, in my opinion.
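
Roughly, the generate-then-test loop described above can be sketched like this in Python. This is a minimal illustration, not Marsha's actual code: `fake_generator` stands in for the LLM call, and every name here (`passes_all`, `generate_until_passing`, `max_attempts`) is made up for the example.

```python
from typing import Callable, Iterable, Optional

# Hypothetical types for illustration: a "candidate" is just a Python callable.
Candidate = Callable[[int], int]
TestCase = tuple[int, int]  # (input, expected output)


def passes_all(candidate: Candidate, tests: Iterable[TestCase]) -> bool:
    """Return True only if the candidate matches the expected output on every case."""
    for arg, expected in tests:
        try:
            if candidate(arg) != expected:
                return False
        except Exception:
            # A crash counts as a failed test.
            return False
    return True


def generate_until_passing(
    generate: Callable[[int], Candidate],
    tests: list[TestCase],
    max_attempts: int = 3,
) -> Optional[Candidate]:
    """Request candidates one at a time; return the first that passes, else None."""
    for attempt in range(max_attempts):
        candidate = generate(attempt)
        if passes_all(candidate, tests):
            return candidate
    return None  # nothing passed, so nothing is handed back


def fake_generator(attempt: int) -> Candidate:
    """Stand-in for an LLM (assumption): first guess is wrong, second is right."""
    guesses = [lambda n: n + n, lambda n: n * n]
    return guesses[min(attempt, len(guesses) - 1)]


tests = [(2, 4), (3, 9)]  # the cases "you were able to think of" for squaring
result = generate_until_passing(fake_generator, tests)
```

Note that the first guess (`n + n`) happens to pass the `(2, 4)` case and is only rejected by `(3, 9)`. The result is only as precise as the test cases, which is the same limit hand-written code has.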

But again, when to use the tool is a decision you must make. You can see from our own examples that we've only used it so far on toy problems or problems small enough that manual review is feasible. Since the test suite always passes at the end (or it simply fails to generate if not), that makes it better than many Eng I and some Eng II level engineers I have worked with in the past. ;)