| That JSON isn't something you'd type; it's something you can generate programmatically if you have a Home Assistant setup. With super primitive wake word detection and transcription, the most you get is:

- What the user said
- How loudly each microphone in the house heard it

If you take a look at the mock object in that transcript, that's what it maps to:

```json
{
  "request": "I'm finding it hard to read",
  "observedRequestVolume": {
    "3eQEg": 30,
    "iA0TN": 60,
    "h1T3y": 59,
    "5Qg1M": 10
  }
}
```

The only part that would be human-provided is: "I'm finding it hard to read"

The invented challenge was to see whether, given a suboptimal set of inputs (we didn't tell it where we are), it can figure out how to act. It's this zero-shot capability that makes LLMs suitable for assistants: traditional assistants can barely handle being told to do something they're capable of in the wrong word order, while this can go from a hastily invented representation of a house and ambiguous commands to rational actions with no prior training on that specific task. |
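
For what it's worth, here's a rough sketch of how an object like that could be generated rather than typed. The function name `build_payload` and the `readings` dict are made up for illustration; how you actually pull the transcript and per-microphone volume out of a Home Assistant setup isn't shown here and would depend on your integration.

```python
import json


def build_payload(request: str, volumes: dict[str, int]) -> str:
    """Serialize the transcribed request and per-microphone volume readings.

    `volumes` maps a microphone ID to how loudly it heard the request.
    Both inputs are assumed to come from your own wake word / transcription
    pipeline; nothing here talks to Home Assistant directly.
    """
    return json.dumps(
        {"request": request, "observedRequestVolume": volumes},
        indent=2,
    )


# Hypothetical readings, reusing the IDs and values from the mock object.
readings = {"3eQEg": 30, "iA0TN": 60, "h1T3y": 59, "5Qg1M": 10}
payload = build_payload("I'm finding it hard to read", readings)
print(payload)
```

The resulting string is what you'd hand to the model alongside whatever description of the house you've invented; the only human input is still the `request` text.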