This looks really cool! I'm curious if you've considered swapping Gemini out for a local LLM alternative to keep costs down, especially for running it 24/7.
I'm so glad you replied! The issue is that only Gemini has a state-of-the-art model that can embed all of the multimodal inputs into the same vector space. Other models can't embed images, text, and audio in one vector space, so to get the same result we'd need a multi-pipeline architecture. If I swap Gemini for local models, latency goes up, but I could get it running fully local 24/7.
Also, the API I'm using is on the free tier.
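To make the trade-off concrete, here's a rough sketch of what a local multi-pipeline setup could look like, assuming each modality is first converted to text and then embedded by one text embedder. This is only one possible design, not what the project does today. Every name here (`embed_text`, `caption_image`, `transcribe_audio`, `embed_any`) is a hypothetical placeholder, and the hashing embedder is a toy stand-in for a real local model.

```python
import hashlib
import math

DIM = 64  # toy embedding size (assumption, for illustration only)

def embed_text(text: str) -> list:
    """Toy stand-in for a local text embedder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]

def caption_image(image_path: str) -> str:
    """Hypothetical placeholder for a local image-captioning model."""
    return f"caption for {image_path}"

def transcribe_audio(audio_path: str) -> str:
    """Hypothetical placeholder for a local speech-to-text model."""
    return f"transcript for {audio_path}"

def embed_any(kind: str, payload: str) -> list:
    """Route each modality through its own pipeline into one text vector space."""
    if kind == "text":
        text = payload
    elif kind == "image":
        text = caption_image(payload)     # extra model call, so extra latency
    elif kind == "audio":
        text = transcribe_audio(payload)  # extra model call, so extra latency
    else:
        raise ValueError(f"unsupported modality: {kind}")
    return embed_text(text)

def cosine(a: list, b: list) -> float:
    """Cosine similarity; inputs are already unit length, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))
```

The extra captioning and transcription steps are where the added latency comes from: images and audio each need a second model call before they can be embedded, whereas a single multimodal embedding model takes them directly.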