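A similar possibility might be to do more RL with this data, e.g. using upside-down RL, where the model is trained with ordinary supervised learning to produce actions conditioned on a desired return. One could possibly steer this with user feedback as well.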
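To make the upside-down RL idea a bit more concrete, here is a minimal sketch of how logged episodes could be turned into supervised training examples. It only covers the data-preparation step, not the model or training loop. The names `Step`, `Example`, and `make_udrl_examples` are hypothetical, and treating user feedback as the per-step reward is an assumption for illustration, not something specified above.

```python
from typing import List, Tuple

# One logged step: (state, action, reward). Here the reward is assumed
# to come from user feedback (e.g. a rating); that mapping is hypothetical.
Step = Tuple[str, str, float]

# One supervised example: ((state, desired_return, desired_horizon), action)
Example = Tuple[Tuple[str, float, int], str]


def make_udrl_examples(episode: List[Step]) -> List[Example]:
    """Turn one logged episode into upside-down RL training examples.

    For every pair t1 < t2, the 'command' is the return actually obtained
    between t1 and t2 and the number of steps it took; the target is the
    action that was taken at t1.
    """
    examples: List[Example] = []
    n = len(episode)
    for t1 in range(n):
        state, action, _ = episode[t1]
        ret = 0.0
        for t2 in range(t1 + 1, n + 1):
            ret += episode[t2 - 1][2]  # accumulate reward of steps t1..t2-1
            examples.append(((state, ret, t2 - t1), action))
    return examples


if __name__ == "__main__":
    episode = [("s0", "a0", 1.0), ("s1", "a1", 0.0), ("s2", "a2", 2.0)]
    for example in make_udrl_examples(episode):
        print(example)
```

A model fitted on such examples could then, in principle, be prompted at inference time with a high desired return, which is where steering by user feedback would come in.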