HN Mirror



	by astrange 1399 days ago
	Generating videos and 3D models is _much_ more difficult than images. You can’t just train off videos from the internet in the same way, because they don’t have sufficient text labels to understand them like CLIP does.

2 comments

blueblisters 1399 days ago

Oh, but they have sound, which can be annotated much faster and more efficiently. You also potentially have screenplays, but the amount of training data is probably too small and sparse.

FWIW, I don’t think AI systems will generate a whole video by themselves. It’ll be some form of image-to-image generation, where an artist renders a rough sketch of the scene and the AI fills in the details, frame by frame.


bottlepalm 1398 days ago

I mean at this point it’s really not a question of if, but when - right?


namose 1399 days ago

I wonder if subtitles could be used, so rather than describing the video, you just write a script and it generates video for you. I'm certainly no expert, but it does seem like there's a lot more data there.
