It's actually using OpenAI's API at the moment. An offline model could _probably_ handle the conversation and tool calling, but it would need to be really fast to keep up with conversational speeds. (And really, GPT-4o is a bit too slow for my liking in this current iteration. I'm hoping that GPT-4.5 will be faster.)
I'm writing up a full accounting of the stack for the post above, so check back for that and let me know if that doesn't answer your questions/concerns!