by Deathmax
724 days ago
Gemini models on Vertex AI can be called via a preview OpenAI-compatible endpoint [1], but shoving it into existing tooling that expects a long-lived API key, and where you don't have programmatic control over that key, is non-trivial because GCP uses short-lived access tokens (and long-lived ones are not great security-wise).

I would argue that billing for the Gemini models is simpler than with every other provider (on Vertex AI, that is; the Generative Language API variant still charges by tokens), simply because you're charged per character/image/video-second/audio-second. You don't need to run a tokenizer (if one is even available, cough, Claude 3 and Gemini), figure out what the chat template is to calculate the token cost per message [2], or figure out how to calculate tokens for an image [3], just to get a cost estimate before actually submitting the request and getting usage info back.

[1]: https://cloud.google.com/vertex-ai/generative-ai/docs/multim...

[2]: https://platform.openai.com/docs/guides/text-generation/mana...

[3]: https://platform.openai.com/docs/guides/vision/calculating-c...
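
To make the billing point concrete, here is a rough sketch of what a pre-request cost estimate looks like when you're charged per character/image/second rather than per token. Everything named here (`UnitPrices`, `estimate_input_cost`, the default prices, and the choice to skip whitespace when counting characters) is a placeholder assumption for illustration, not the actual Vertex AI rate card or counting rule, so check the pricing page before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitPrices:
    """Hypothetical per-unit prices in USD. Placeholders, NOT real Vertex AI rates."""
    per_1k_chars: float = 0.000125
    per_image: float = 0.0025
    per_video_second: float = 0.002
    per_audio_second: float = 0.0002


def estimate_input_cost(
    text: str,
    images: int = 0,
    video_seconds: float = 0.0,
    audio_seconds: float = 0.0,
    prices: UnitPrices = UnitPrices(),
    count_whitespace: bool = False,
) -> float:
    """Estimate the input cost of a request without running a tokenizer.

    Assumption: characters are counted as Python code points, and whitespace
    is skipped unless count_whitespace is True. Verify against the provider's
    documented counting rule.
    """
    if count_whitespace:
        chars = len(text)
    else:
        # str.split() with no arguments splits on any run of whitespace.
        chars = len("".join(text.split()))
    return (
        chars / 1000 * prices.per_1k_chars
        + images * prices.per_image
        + video_seconds * prices.per_video_second
        + audio_seconds * prices.per_audio_second
    )


if __name__ == "__main__":
    prompt = "Describe what happens in this clip."
    cost = estimate_input_cost(prompt, images=1, video_seconds=12.5)
    print(f"estimated input cost: ${cost:.6f}")
```

No tokenizer, no chat template, no image tiling math: the inputs are things you already know before sending the request. Output cost would still depend on how many characters the model ends up generating.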