by arctic-true 128 days ago
This is super interesting. But I have to wonder how much it costs on the back end - it sounds like it’s essentially just running a boatload of specialized agents, constantly, throughout the whole interaction (and with super-token-rich input for each). Neat for a demo, but what would it cost to run this for a 30-minute job interview? Or a 7-hour deposition?

Another concern I’d have is bias. If I am prone to speaking loudly, is it going to say I’m shrill? If my camera is not aligned well, is it going to say I’m not making eye contact?

1 comment

The conversational agent already runs on a provisioned chunk of compute, but that chunk isn't utilized at 100% of its provisioned capacity. For this perception system we're taking advantage of the spare compute left over on what's provisioned for a top-level agent, so turning it on costs nothing "extra".

Bias is a concern for sure, though the system adapts to your speech patterns and behaviors over the course of a single conversation. So if it flags you for not making eye contact because, say, your camera is on a different monitor, it'll make that mistake once and not bring it up again.