| HN Mirror


by hansent 209 days ago

p50 latency on the Roboflow serverless API is 300-400ms round trip for sam3 on an image with a text prompt.
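If you want to check a p50 number like this against your own setup, here is a minimal sketch of one way to measure it. The helper names (`measure_latencies_ms`, `p50`) are made up for illustration, and `call` stands in for whatever function sends one request to your endpoint; this is not how the figure above was produced.

```python
import statistics
import time
from typing import Callable, List


def measure_latencies_ms(call: Callable[[], object], n: int = 50, warmup: int = 3) -> List[float]:
    """Time n invocations of call() and return per-call latencies in milliseconds."""
    for _ in range(warmup):
        call()  # warm-up calls are not timed (connection setup, cold starts)
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def p50(samples: List[float]) -> float:
    """p50 is the median: half of the samples are at or below this value."""
    return statistics.median(samples)
```

Measure from the client side, so the number covers the full HTTP round trip and not just model time.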

You can get an easy-to-use API endpoint by creating a workflow in Roboflow with just the sam3 block in it (and hooking up an input parameter to forward the prompt to the model); the workflow is then available as an HTTP endpoint. You can start from the sam3 template and remove the visualization block if you only need a JSON response, which gives slightly lower latency and a smaller payload.
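A rough sketch of what calling such a workflow endpoint could look like from Python, using only the standard library. Everything specific here is an assumption for illustration: the URL is a placeholder, and the JSON field names (`api_key`, `inputs`, `image`, `prompt`) depend on how you named the workflow inputs, so copy the real URL and request shape from your own workflow's deploy page.

```python
import json
import urllib.request

# Placeholder: replace with the endpoint URL shown for your own workflow.
WORKFLOW_URL = "https://example.invalid/your-workflow-endpoint"


def build_workflow_request(url, api_key, image_url, prompt):
    """Build (but do not send) a JSON POST request for a workflow endpoint."""
    body = {
        "api_key": api_key,  # assumed field name
        "inputs": {
            "image": {"type": "url", "value": image_url},  # assumed input shape
            "prompt": prompt,  # the input parameter forwarded to the sam3 block
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_workflow(request, timeout=10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would be something like `call_workflow(build_workflow_request(WORKFLOW_URL, "YOUR_KEY", "https://example.invalid/cat.jpg", "cat"))`. With the visualization block removed, the response is plain JSON with no rendered image in it.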

Internally we see roughly 200ms HTTP round trip, but our user-facing API currently has some additional latency because we have to proxy to a different cluster, where we have more GPU capacity allocated for this model than we can currently get on GCP.