by jparkerholder
849 days ago
Author here :) Re: 1) Typically no, but of course it can hallucinate, just like LLMs. 2) Agreed, but the key point missing is that Dreamer is trained from an RL environment with action labels. Genie is trained exclusively from videos and learns an action space. This is the first version of something that is now possible and will only improve with scale.
|
Anyway, about my second question: why are the videos only about half a second long? Does the model unravel after that?
Also
> This is the first version of something that is now possible and will only improve with scale.
11B params is already pretty large considering the scale of Stable Diffusion and LLMs. How much further do we need to scale before we get something useful beyond simple setups?