Hacker News
by isaacfung 855 days ago
There is a lot of video content with audio. We can train a facial expression classification model to detect the speaker's emotion (we can also use a multimodal model that takes the language context into account).
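A rough sketch of how that labeling could work, assuming we already have per-frame emotion predictions from some facial expression classifier and timestamped speech segments from the audio track. The function name `label_segments`, the data layout, and the majority-vote rule are all hypothetical choices for illustration, not an established pipeline:

```python
from collections import Counter

def label_segments(frame_preds, segments):
    """Assign each speech segment the most frequent facial emotion seen during it.

    frame_preds: list of (timestamp_sec, emotion) pairs, assumed to come from
                 a facial expression classifier run on the video frames.
    segments:    list of (start_sec, end_sec, text) speech segments.
    Returns a list of (text, emotion), with emotion None if no frame falls
    inside the segment.
    """
    labeled = []
    for start, end, text in segments:
        # Collect the predictions whose timestamp lies in [start, end).
        votes = Counter(e for t, e in frame_preds if start <= t < end)
        label = votes.most_common(1)[0][0] if votes else None
        labeled.append((text, label))
    return labeled

# Toy data standing in for real classifier output and a real transcript.
frame_preds = [(0.0, "happy"), (0.5, "happy"), (1.0, "neutral"),
               (2.0, "sad"), (2.5, "sad")]
segments = [(0.0, 1.5, "Great to see you!"),
            (1.5, 3.0, "I have bad news."),
            (3.0, 4.0, "Anyway.")]

result = label_segments(frame_preds, segments)
```

Segments with no visible face get no label here, so they could simply be dropped from the training set. The labels would be noisy, since a face does not always match the tone of the voice.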

Another potential source of data is voice acting scripts for animations. I've always thought the storyboards of films and animations could be great annotated training data, but there seem to be no open datasets, probably because of copyright issues.