| HN Mirror


by visarga 705 days ago

What I meant is:

1. the LLM generates an idea, and then either

2. the user responds positively or negatively, or

3. the user tries the idea and comes back to continue the iteration, reporting the outcome.

For example, the LLM generates some code and I run it; if it fails, I copy and paste the error back.

That is the (state, action, reward) tuple which defines an experience.
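To make the tuple concrete, here is a rough sketch of how a chat log could be turned into experiences. The mapping is an assumption on my part: state is the conversation so far, action is the LLM's reply, and reward comes from the user's next message. The names `Experience`, `reward_from_feedback` and `experiences_from_chat` are made up for illustration, and the keyword-based reward is a crude stand-in for a real feedback signal.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Experience:
    state: str     # conversation so far, i.e. what the LLM saw
    action: str    # what the LLM proposed in response
    reward: float  # signal derived from the user's next message


def reward_from_feedback(feedback: str) -> float:
    """Hypothetical heuristic: a pasted error counts as -1.0, anything else as +1.0."""
    lowered = feedback.lower()
    failure_markers = ("traceback", "error", "exception", "failed")
    return -1.0 if any(marker in lowered for marker in failure_markers) else 1.0


def experiences_from_chat(turns: List[Tuple[str, str]]) -> List[Experience]:
    """Build one experience per assistant turn that is followed by a user reply.

    `turns` is a list of (role, text) pairs with role "user" or "assistant".
    """
    experiences = []
    for i in range(1, len(turns) - 1):
        role, text = turns[i]
        next_role, next_text = turns[i + 1]
        if role == "assistant" and next_role == "user":
            # State is everything said before the assistant's reply.
            state = "\n".join(t for _, t in turns[:i])
            experiences.append(
                Experience(state, text, reward_from_feedback(next_text))
            )
    return experiences
```

In the code example above, the failed run plus the pasted error would become one experience with a negative reward, and a later "that works" would become one with a positive reward.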

1 comment

jpc0 704 days ago

Sounds like the LLM helped a human gain experience by making mistakes on the human's behalf and then correcting those mistakes, likely also incorrectly. LLMs are effectively very, very bad teachers.

Given the same inputs tomorrow, the LLM is likely to return similar responses. If a human did that, they would likely be considered to have some sort of medical condition...
