Y
Hacker News
new
|
ask
|
show
|
jobs
by
numba888
453 days ago
Great, but how do you imagine multimodal with text, video. Just 2 for simplicity, what will be in the training set. With text model tries to predict next, then more steps were added. But what to do with multimodal?