HN Mirror

by gengstrand 688 days ago

This piece reminds me of something I did earlier this year, https://www.infoq.com/articles/llm-productivity-experiment/, in which I ran an experiment across several LLMs, although mine used a one-shot prompt for generating unit tests. Though the results differed significantly, the conclusions seem similar to me.

When an LLM is prompted, it generates a response by predicting the most probable continuation or completion of the input. It considers the context provided by the input and generates a response that is coherent, relevant, and contextually appropriate but not necessarily correct.
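To make the "most probable continuation" point concrete, here is a toy sketch in Python. The probability table is made up for illustration and conditions only on the previous token. Real LLMs compute a distribution over a large vocabulary from the whole context using a neural network, and often sample rather than always taking the top token. Greedy decoding is used here only because it is the simplest way to show "pick the most probable next token".

```python
# Hypothetical next-token probabilities, invented for illustration.
# A real model computes these from the full context with a neural network.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"test": 0.5, "code": 0.3, "<end>": 0.2},
    "a": {"test": 0.7, "<end>": 0.3},
    "test": {"passes": 0.55, "fails": 0.45},
    "code": {"compiles": 0.9, "<end>": 0.1},
    "passes": {"<end>": 1.0},
    "fails": {"<end>": 1.0},
    "compiles": {"<end>": 1.0},
}


def greedy_complete(prompt_token: str, max_tokens: int = 10) -> list:
    """Repeatedly append the single most probable next token (greedy decoding)."""
    output = []
    current = prompt_token
    for _ in range(max_tokens):
        # Unknown tokens simply end the completion in this toy model.
        candidates = NEXT_TOKEN_PROBS.get(current, {"<end>": 1.0})
        current = max(candidates, key=candidates.get)
        if current == "<end>":
            break
        output.append(current)
    return output


print(" ".join(greedy_complete("<start>")))
```

With this table the completion is "the test passes" only because "passes" (0.55) outscores "fails" (0.45). Nothing was actually run or checked, which is the sense in which a coherent completion is not necessarily a correct one.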

I like the crowdsourcing metaphor. Back when crowdsourcing was the next big thing in application development, there was always a curatorial process that filtered out low-quality content and then distilled the "wisdom of the crowds" into more actionable results. For AI, that would be called supervised learning, which definitely increases the costs.
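A minimal sketch of that curate-then-aggregate pattern, assuming each contribution carries a reputation score and that "wisdom of the crowds" is approximated by a simple majority vote. The `reputation` field, the `min_reputation` threshold, and the function name are hypothetical; real curation processes were usually more involved and often relied on human reviewers.

```python
from collections import Counter


def curate_and_aggregate(answers, min_reputation=0.5):
    """Filter out low-quality contributions, then return the most common answer."""
    # Curation step (hypothetical rule): drop answers from low-reputation contributors.
    kept = [a["answer"] for a in answers if a["reputation"] >= min_reputation]
    if not kept:
        return None
    # Aggregation step: majority vote over the remaining answers.
    return Counter(kept).most_common(1)[0][0]


answers = [
    {"answer": "42", "reputation": 0.9},
    {"answer": "42", "reputation": 0.7},
    {"answer": "41", "reputation": 0.2},
    {"answer": "41", "reputation": 0.1},
    {"answer": "41", "reputation": 0.3},
]
print(curate_and_aggregate(answers))                      # curated vote
print(curate_and_aggregate(answers, min_reputation=0.0))  # raw vote, no curation
```

Here the raw vote picks "41" because low-reputation contributors outnumber the others, while the curated vote picks "42". The filtering step is what costs effort, which mirrors the point about supervision increasing costs.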

I think that unbiased and authentic experimentation with, and measurement of, hallucinations in generative AI is important, and I hope this effort continues. I encourage the folks here to participate, both to monitor the real value that LLMs provide and as an ongoing reminder that human review and supervision will always be necessary.

derefr 688 days ago

For coding problems specifically, you could get quite far by giving the model tool use of a sandboxed compiler/interpreter (perhaps even with your project files already loaded into the sandbox), and then training the model to test its own proposed solutions in the sandbox and revise them until they actually produce the expected outputs.
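A minimal sketch of the inference-time part of that loop: propose code, run it against expected input/output pairs, and feed the failures back until everything passes or attempts run out. It covers only the test-and-revise loop, not the training. `propose` stands in for the model; `fake_model` and `CANDIDATES` are hypothetical stubs. Running code in a separate Python process with a timeout is assumed here for simplicity and is not a real security sandbox; untrusted code would need container- or VM-level isolation.

```python
import subprocess
import sys
from typing import Optional


def run_candidate(source: str, stdin_text: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run candidate code in a separate Python process and return its stdout.

    NOTE: a subprocess with a timeout is NOT a real sandbox.
    Returns None on timeout or non-zero exit.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout if result.returncode == 0 else None


def solve_with_feedback(propose, tests, max_attempts=3) -> Optional[str]:
    """Ask `propose` for code, test it, and feed failures back until all tests pass."""
    feedback = ""
    for _ in range(max_attempts):
        source = propose(feedback)
        failures = []
        for stdin_text, expected in tests:
            actual = run_candidate(source, stdin_text)
            if actual is None or actual.strip() != expected:
                failures.append(f"input {stdin_text!r}: expected {expected!r}, got {actual!r}")
        if not failures:
            return source  # every test produced the expected output
        feedback = "\n".join(failures)
    return None  # gave up after max_attempts


# Hypothetical stand-in for the model: first attempt is buggy, revision is correct.
CANDIDATES = [
    "n = int(input())\nprint(n * n + 1)",  # buggy: off by one
    "n = int(input())\nprint(n * n)",      # correct: squares the input
]


def fake_model(feedback: str) -> str:
    # A real model would condition on `feedback`; this stub just switches answers.
    return CANDIDATES[1] if feedback else CANDIDATES[0]


tests = [("3", "9"), ("4", "16")]
print(solve_with_feedback(fake_model, tests))
```

The first candidate prints 10 for input 3, so its failure messages become the feedback for the next proposal, and the revised candidate passes both tests. The key design choice is that correctness is judged by actually executing the code rather than by how plausible it looks.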