AFAIK, the 'Look Once' part is a contrast with other systems that ran the frame through an object detector one section at a time, resulting in a lot of reprocessing.
You could still look only once, but have that look include multiple sequential frames. Or do something like an LSTM of frames.
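
To make the "one look over several frames" idea concrete, here is a rough sketch of the buffering side only. The name `frame_windows` and the `window` parameter are made up for illustration, and the detector itself is left out; the assumption is that each yielded clip would be handed to the network in a single forward pass (for example, with the frames stacked along the channel axis).

```python
from collections import deque
from typing import Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")


def frame_windows(frames: Iterable[Frame], window: int = 3) -> Iterator[List[Frame]]:
    """Yield the most recent `window` frames as one clip per incoming frame.

    Hypothetical helper: nothing is yielded until `window` frames have arrived.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buffer = deque(maxlen=window)  # oldest frame drops out automatically
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == window:
            # Assumption: a detector would take this whole clip in one pass,
            # so it still "looks once", just at several frames together.
            yield list(buffer)


# Strings stand in for real image arrays here.
clips = list(frame_windows(["f0", "f1", "f2", "f3"], window=3))
```

The LSTM variant would skip the buffer: feed one frame per step and let the recurrent state carry what came before.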