by DEEPAN_C 82 days ago
I'm so glad you replied. The issue is that only Gemini has a state-of-the-art model that can embed all the multimodal inputs into the same vector space. Other models can't embed images, text, and audio in the same vector space, so to achieve that we would need a multi-pipeline architecture. If I swap Gemini for local models, latency increases, but I get a fully local setup that runs 24/7. Also, this API is on the free tier.
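
To make the multi-pipeline idea concrete, here is a rough sketch of what I mean: one encoder per modality, routed through a single `embed` function that always returns a vector of the same size. Everything here is hypothetical. `_toy_embed`, `ENCODERS`, and `DIM` are made-up names, and the "encoders" are hash-based stand-ins rather than real models, so the vectors are not semantically aligned across modalities. A real local setup would need actual encoders plus some alignment step on top.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]
DIM = 8  # assumed shared output size for every modality


def _toy_embed(tag: str, payload: bytes) -> Vector:
    """Stand-in for a real local encoder: a deterministic fake embedding."""
    digest = hashlib.sha256(tag.encode() + payload).digest()
    return [b / 255.0 for b in digest[:DIM]]


# One hypothetical pipeline per modality; swap each lambda for a real model.
ENCODERS: Dict[str, Callable[[bytes], Vector]] = {
    "text": lambda p: _toy_embed("text", p),
    "image": lambda p: _toy_embed("image", p),
    "audio": lambda p: _toy_embed("audio", p),
}


def embed(modality: str, payload: bytes) -> Vector:
    """Route the input to the pipeline for its modality."""
    if modality not in ENCODERS:
        raise ValueError(f"unsupported modality: {modality}")
    return ENCODERS[modality](payload)


if __name__ == "__main__":
    print(len(embed("text", b"hello")), len(embed("audio", b"\x00\x01")))
```

The extra routing and the separate models are where the added latency would come from, compared to one hosted call that handles every modality.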