If you do it naively, your video frames will buffer while waiting to be consumed, which is effectively a memory leak and leads to an eventual crash (or a quick one if you're running on a device with constrained resources).
You really need to have a thread consuming the frames and feeding them to a worker that can run on its own clock.
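A minimal sketch of that reader-thread/worker split, using only the Python standard library. Everything here is illustrative: `LatestFrameSlot` and `run_demo` are hypothetical names, the "frames" are plain integers, and `time.sleep` stands in for both capture and inference. The slot holds one frame at most, so anything the worker has not picked up is overwritten and memory cannot grow. With a real camera, the reader loop would presumably call something like OpenCV's `cap.read()` instead.

```python
import threading
import time


class LatestFrameSlot:
    """Holds only the newest frame; an unread older frame is overwritten (dropped)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._closed = False

    def put(self, frame):
        # Overwrite whatever is waiting, then wake the worker.
        with self._cond:
            self._frame = frame
            self._cond.notify()

    def close(self):
        # Signal that no more frames will arrive.
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self):
        """Block until a frame is available; return None once closed and drained."""
        with self._cond:
            while self._frame is None and not self._closed:
                self._cond.wait()
            frame, self._frame = self._frame, None
            return frame


def run_demo(num_frames=50, capture_interval=0.002, inference_time=0.02):
    """Run a fast reader against a slow worker; return the frames actually processed."""
    slot = LatestFrameSlot()
    processed = []

    def reader():
        # Stand-in for a capture loop (assumption: integers play the role of frames).
        for i in range(num_frames):
            slot.put(i)
            time.sleep(capture_interval)
        slot.close()

    def worker():
        # Runs on its own clock: always takes the newest frame, never a backlog.
        while True:
            frame = slot.get()
            if frame is None:
                break
            time.sleep(inference_time)  # stand-in for slow inference
            processed.append(frame)

    threads = [threading.Thread(target=reader), threading.Thread(target=worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


if __name__ == "__main__":
    print(run_demo())
```

Because the worker is slower than the reader here, the printed list should skip most frame numbers; which ones are skipped depends on timing. A `queue.Queue(maxsize=1)` where the reader discards the old item before putting the new one would be another way to get similar behaviour.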
I'm on Windows.
Ideally, I'd like stale frames to be dropped so that inference is always done on the last received frame. Is this standard behaviour?