by ilaksh
512 days ago
The best models are no longer pure language models; they are multimodal. Grounding language in video data, along with new model architectures and larger models, will improve robustness. That HeyGen video does not suck. If you only look at it for a few seconds, it's actually hard to tell it's AI. The interesting thing about comparing human learning to AI is that AI skills and knowledge can be copied essentially infinitely, whereas each human is one of a kind. I imagine some parents put real effort into raising the most productive member of society they possibly can, and AI teams have somewhat similar goals for the models they train. We could see AI take control of the planet within the next four years in order to end WWIII. We should just hope they keep lots of us around in giant people zoos.
As for the HeyGen vid, it's passable, but I literally whipped it up in 10 minutes. You can make much, much better ones if you invest in creating better training material. The voice and video models in this case used only a single 3-minute source video.
Funny you mention the "people zoo" thing. That's actually part of a sci-fi story I've been trying to write since I was in my teens. Roughed out here: https://youtu.be/2KLdaVs_ugw