

by insane_dreamer 67 days ago

> A system where a human is course correcting based on signals/sensor data

the human isn't course correcting; the agent is course correcting based on the feedback data. the human is just inputting the feedback data to the agent in cases where the agent can't access that data itself (because the tooling for that isn't in place yet)

data leakage would be the following:

  - agent makes a prediction for problem A based on training data 
  - feedback from the result is fed back to the agent 
  - agent regenerates a prediction for problem A, incorporating the feedback

but in this case:

  - agent makes a decision on Problem A based on training data 
  - feedback from the decision is fed back to the agent 
  - agent makes a decision for Problem B (not revisiting Problem A), a new problem that depends on the outcome of Problem A
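
to make the distinction concrete, here's a rough sketch of the two flows. everything in it (the `Agent` class, `decide`, `receive_feedback`, the `leaked` flag) is a made-up illustration, not any real framework's API; it just flags a decision as leaked when the agent already holds feedback on that same problem:

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent that remembers which problems it has feedback on."""
    feedback_seen: set = field(default_factory=set)

    def decide(self, problem: str) -> dict:
        # A decision counts as "leaked" only if the agent has already
        # received feedback on the outcome of this same problem.
        leaked = problem in self.feedback_seen
        return {"problem": problem, "leaked": leaked}

    def receive_feedback(self, problem: str) -> None:
        # Feedback may come from tooling or be typed in by a human;
        # either way it is just data handed to the agent.
        self.feedback_seen.add(problem)


def leakage_flow() -> dict:
    """Predict A, get feedback on A, then predict A again."""
    agent = Agent()
    agent.decide("A")
    agent.receive_feedback("A")
    return agent.decide("A")


def sequential_flow() -> dict:
    """Decide A, get feedback on A, then decide a new problem B."""
    agent = Agent()
    agent.decide("A")
    agent.receive_feedback("A")
    return agent.decide("B")
```

in the first flow the second answer for A is contaminated by A's own outcome; in the second flow B is a fresh problem, and A's outcome is just a legitimate input to it.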