by jansan 943 days ago
There is a paper about something called Q*. I have no idea whether they are connected or the name just matches coincidentally.

https://arxiv.org/abs/2102.04518

1 comment

The real world is a space of continuous actions. To this day, Q algorithms have output discrete actions. I'd be surprised if a Q algorithm could handle the huge action space of language. Honestly, it's weird they'd consider the Q family. I figured we were done with that after PPO performed so well.
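
To make the discrete-action point concrete, here is a minimal sketch of plain tabular Q-learning. It is textbook Q-learning, not anything from the linked paper. The toy sizes (3 states, 4 actions) and the names `greedy_action` and `q_update` are made up for illustration.

```python
def greedy_action(q_row):
    """Return the index of the highest-valued action.

    This scans one Q-value per action, so the action set has to be
    finite and small enough to enumerate.
    """
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # The max over next-state actions is another full scan of the action set.
    td_target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table


# Hypothetical toy problem: 3 states, 4 discrete actions, all values start at 0.
n_states, n_actions = 3, 4
q_table = [[0.0] * n_actions for _ in range(n_states)]

q_update(q_table, state=0, action=2, reward=1.0, next_state=1)
best = greedy_action(q_table[0])
print(best, q_table[0])
```

Both the greedy pick and the update target loop over every action. That is cheap with 4 actions, but it is the step that gets awkward when the action set is continuous, or as large as a token vocabulary at every step.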