Hacker News
by bnqscrtm 1177 days ago
A 4s run time for object segmentation at 640x480 sounds like it's not using the GPU at all. Something like that should run on a VGA image in at most a few hundred ms.

For the second part of the question, a 2080 should get you close to 10FPS operation. For a ballpark estimate, using an off-the-shelf repo like Ultralytics's YOLOv5 lets you run object detection (not masking) at something like 100FPS. Masking should not add that much overhead.

W.r.t. GPUs: yes, more money equals more speed for NN inference these days, though with diminishing returns. A 3090 might get you the best bang for your buck while still having enough VRAM to run fancier models, which may need more than the 12 GiB many other GPUs have.

Finally, I haven't read the paper too carefully but I believe that by prompting they mean that you have the option of describing in human language what you want the model to select, rather than the model being "hardwired" to do this. In other words, you could prompt the model to "segment the red car only" and it would do it, rather than just having the model blindly segment every object in the image, and then relying on custom scripting to potentially post-process these segments.

3 comments

I'm using the first model on the SAM website (ViT-H SAM).

It's definitely using the GPU: I'm running nvidia-smi and I see near 100% utilization on the GPU while the CPU is using 1 core. If I run the script with --device=cpu, I see my server using 4 CPU cores and no GPU, and it takes tens of seconds per image.
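
For comparing numbers like these across machines, it can help to time the call itself with warm-up runs excluded. Below is a minimal sketch, not taken from the SAM repo: `time_per_call` and its parameters are hypothetical names, and the workload in the demo is a stand-in. The optional `sync` hook is there because GPU work is queued asynchronously; with PyTorch one would typically pass `torch.cuda.synchronize`.

```python
import statistics
import time


def time_per_call(fn, warmup=2, repeats=5, sync=None):
    """Return the median wall-clock seconds of fn() after warm-up calls.

    sync is an optional zero-argument callable run around each timed call.
    For PyTorch on CUDA, passing torch.cuda.synchronize makes the timer wait
    for queued GPU kernels instead of measuring only the kernel launch.
    """
    for _ in range(warmup):
        fn()  # warm-up: keeps one-off startup costs out of the samples
    samples = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain earlier work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for this call's work before stopping the clock
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; swap in the real per-image call being measured.
    seconds = time_per_call(lambda: sum(range(100_000)))
    print(f"{seconds:.6f} s per call")
```

The median is used rather than the mean so that a single slow run does not skew the figure.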

I'm trying to check with people who have experience with this specific model.

I've checked on the repo and other folks report the same numbers as me. In fact, my 2080 is just as fast as the 3090 (5 seconds per mask generation/image).

I already have bounding boxes, so I could prompt the model with those, but all of this runs counter to the published performance numbers.
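
The box-prompt route could look roughly like the sketch below, which uses `SamPredictor` from the `segment_anything` package with a single box prompt instead of generating masks for the whole image. Assumptions: the checkpoint path is whatever file you downloaded, the image is an HxWx3 uint8 RGB array, and `xywh_to_xyxy`, `load_predictor`, and `mask_from_box` are hypothetical helper names. The conversion helper only matters if your boxes are stored as (x, y, width, height). Whether this gets close to the published numbers is something to measure, not something the sketch shows.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to the (x0, y0, x1, y1) order SAM expects."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def load_predictor(checkpoint, device="cuda"):
    """Build a ViT-H SamPredictor once; reuse it for every image."""
    # Imported lazily so the box helper works without the heavy dependencies.
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    sam.to(device=device)
    return SamPredictor(sam)


def mask_from_box(predictor, image_rgb, box_xyxy):
    """Return (mask, score) for one box prompt on one image."""
    import numpy as np

    predictor.set_image(image_rgb)  # computes the image embedding
    masks, scores, _ = predictor.predict(
        box=np.array(box_xyxy),
        multimask_output=False,  # ask for a single mask for this box
    )
    return masks[0], float(scores[0])
```

With several boxes on the same image, it would make sense to call `set_image` once and then `predict` per box, since the embedding is reused between prompts.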

Yeah, a 3090 should do well.

If you want to try it on one, reach out to me (email in profile). We rent those out in the cloud. That would let you confirm performance before buying one for local use.