by tyre
493 days ago
What if you have it read the script and then ask, “At this point, what is the character feeling? What are they trying to accomplish? What is their relationship to each person in the scene?” Then you take those answers and prompt the model to mark up the text with inflection, pacing, and so on to reflect them, and feed that into the speech model. It seems like it could definitely do the first part (“based on this text, this character might be feeling X”). The second part (“mark up the dialogue”) seems easier, and going by another comment, the third part, the speech itself, already seems doable. So aren't we pretty close already? Whatever actors are doing can be approximated through prompting, including the director iterating with the “actors”.
Sure, but how do you make sure all the answers to those questions stay consistent across clauses, sentences, and paragraphs? To do that, you need a complete understanding of human psychology.
And I haven't seen any evidence that LLMs possess that kind of knowledge at all, except at the most rudimentary level of narrative.
Just think of how even professional directors struggle to communicate to an actor the emotional and psychological feeling they're looking for. We don't even have words or labels for most of these things, so we say "you know how you feel in a situation when <a> and <b> but <c>? You know that thing? No, not that, but when <d>. Yeah, that." Most of this operates on an intuitive, pre-verbal level in our brains. I don't think LLMs are anywhere close to capturing that yet.