by nl
2661 days ago
This claim (3x faster than TF Serving) and the metrics on the site (~500 predictions per second vs. ~200 for TF Serving) seem more a function of scaling than of any technology. Given that you can horizontally scale model prediction infinitely, the only sensible way to compare is to include price. I agree that this looks compelling while it is free! But will it be price-competitive later? And if price competitiveness is claimed, how is it possible? Yes, you can do the whole spot-instance thing, but that is difficult to make reliable enough at scale.
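To make the price point concrete, here is a minimal sketch of a cost-normalised comparison. The throughput figures are the ones quoted on the site; the hourly prices and the function name `predictions_per_dollar` are hypothetical placeholders, not real pricing for either system.

```python
def predictions_per_dollar(predictions_per_second: float, dollars_per_hour: float) -> float:
    """Throughput normalised by cost: predictions served per dollar spent."""
    if dollars_per_hour <= 0:
        raise ValueError("dollars_per_hour must be positive")
    return predictions_per_second * 3600 / dollars_per_hour


# Throughput from the site; the hourly prices below are made up for illustration.
panini = predictions_per_dollar(500, dollars_per_hour=3.00)
tf_serving = predictions_per_dollar(200, dollars_per_hour=1.00)

print(f"panini:     {panini:,.0f} predictions per dollar")
print(f"tf_serving: {tf_serving:,.0f} predictions per dollar")
```

With these invented prices the slower deployment serves more predictions per dollar, which is why raw predictions per second says little on its own.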
You can always download all of Panini to your own private server and not pay anything, e.g. use Helm to install it in your own Kubernetes cluster, or get it from DockerHub. For now, we're making it free for models under 2 GB. Our main goal is to make it usable, and we don't want cost to be a factor.