Hacker News
by
mt_
361 days ago
You can just paste the YouTube video link into Google AI Studio and ask it to transcribe the video with speaker labels. You can even ask it to add useful visual clues, because the model is multimodal for video too.
MaxDPS
361 days ago
Can I ask what you mean by “useful visual clues”?
mt_
361 days ago
What the speaker is showing on their slides, what their body language is, and so on.