It's limiting that you can't call it through routers like LiteLLM and that you have to set up a new billing account.
Not to mention running it locally: these are presumably small models, and I'd take 800 tokens/sec locally over 4000 tokens/sec with latency any day.