by bnqscrtm
1177 days ago
A 4 s run time for object segmentation at 640x480 sounds like it's not using the GPU at all. Something like that should run on a VGA image in at most a few hundred ms. For the second part of the question, a 2080 should get you close to 10 FPS operation. For a ballpark estimate, an off-the-shelf repo like Ultralytics' YOLOv5 lets you run object detection (not masking) at something like 100 FPS, and masking should not add that much overhead.

As for GPUs: yes, these days more money buys more speed for NN inference, though with diminishing returns. A 3090 might get you the best bang for your buck while still having enough VRAM to run fancier models, which may need more than the 12 GiB many other GPUs have.

Finally, I haven't read the paper too carefully, but I believe that by "prompting" they mean that you have the option of describing in human language what you want the model to select, rather than the model being "hardwired" to do this. In other words, you could prompt the model to "segment the red car only" and it would do it, rather than having the model blindly segment every object in the image and then relying on custom scripting to post-process those segments.
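One way to sanity-check a per-image number like the 4 s above is to time only the inference call, after a few warm-up runs, so that model loading and one-time GPU initialization don't get counted. Below is a minimal sketch in plain Python. The `benchmark` helper, its parameters, and `fake_infer` are hypothetical names made up for illustration and are not part of any model's repo. The optional `sync` hook is there on the assumption that GPU work is launched asynchronously, in which case you would pass something like `torch.cuda.synchronize` so the clock is read only after the GPU has finished.

```python
import time
from statistics import mean


def benchmark(infer, warmup=3, runs=10, sync=None):
    """Time a zero-argument callable and return (mean seconds per call, calls per second).

    `infer` stands in for one forward pass on one image (hypothetical).
    `sync`, if given, is called before each clock read; with PyTorch on a GPU
    this would typically be torch.cuda.synchronize (assumption, not required).
    """
    # Warm-up calls are executed but not timed, so one-time setup cost is excluded.
    for _ in range(warmup):
        infer()
    if sync is not None:
        sync()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        if sync is not None:
            sync()  # wait for any asynchronous work before stopping the clock
        timings.append(time.perf_counter() - start)

    avg = mean(timings)
    fps = 1.0 / avg if avg > 0 else float("inf")
    return avg, fps


def fake_infer():
    """Placeholder workload standing in for a real model call."""
    time.sleep(0.01)


if __name__ == "__main__":
    avg, fps = benchmark(fake_infer)
    print(f"mean latency: {avg * 1000:.1f} ms, throughput: {fps:.1f} FPS")
```

With a real model you would swap `fake_infer` for something like `lambda: model(image)` (both names hypothetical). If the steady-state number is far below the end-to-end script time, the gap is probably startup cost rather than inference.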
|
It's definitely using the GPU: I'm running nvidia-smi and I see near 100% utilization on the GPU while the CPU is using 1 core. If I run the script with --device=cpu, I see my server using 4 CPU cores and no GPU, and it takes tens of seconds per image.
I'm trying to check with people who have experience with this specific model.