by ISV_Damocles 1064 days ago

I would say that you do not quite understand it. As part of generating working code, it also generates a test suite that uses the examples you provide as the test cases. It then actually executes that test suite against the generated code and iterates with the LLM until the test suite passes.

This is where the claim that it's tested code comes from, because it is literally tested.
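To make the loop concrete, here is a minimal sketch of the generate, test, and retry cycle described above. It is not Marsha's actual code: `run_examples`, `generate_until_passing`, and the `fake_llm` stub are hypothetical names made up for illustration, and a real tool would call a model and sandbox the execution instead of using a bare `exec`.

```python
from typing import Callable, List, Tuple

# One example is (positional args, expected return value).
Example = Tuple[tuple, object]


def run_examples(source: str, func_name: str, examples: List[Example]) -> List[str]:
    """Load the candidate source and return one message per failing example."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # assumption: trusted input; a real tool would sandbox this
        func = namespace[func_name]
    except Exception as exc:
        return [f"could not load {func_name}: {exc!r}"]

    failures = []
    for args, expected in examples:
        try:
            actual = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if actual != expected:
            failures.append(f"{func_name}{args} returned {actual!r}, expected {expected!r}")
    return failures


def generate_until_passing(
    generate: Callable[[List[str]], str],
    func_name: str,
    examples: List[Example],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Ask the generator for code, test it, and feed failures back until it passes."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        source = generate(feedback)
        feedback = run_examples(source, func_name, examples)
        if not feedback:
            return source, attempt
    raise RuntimeError(f"no passing code after {max_attempts} attempts: {feedback}")


def fake_llm(feedback: List[str]) -> str:
    """Hypothetical stand-in for the model: wrong on the first try, fixed after feedback."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    examples = [((1, 2), 3), ((0, 0), 0)]
    source, attempts = generate_until_passing(fake_llm, "add", examples)
    print(f"passed after {attempts} attempt(s)")
    print(source)
```

The point of the design is that the examples serve as both the spec and the acceptance test, so code is only returned once it has actually been run against them.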

One of the examples we added is a simple tool to get headlines from CNN.com[1]. We don't commit the generated Python to the repository because we're treating it as a compiler artifact, but here's a gist[2] of one of the runs, including the test suite it created to validate proper behavior. It doesn't rely purely on the LLM's ability to string tokens together; it goes through a validation phase to make sure what it built actually works.

[1]: https://github.com/alantech/marsha/blob/main/examples/web/cn...
[2]: https://gist.github.com/dfellis/a758a7321b4f62f820ddbad57aac...