You can't get FLOPS on a Hailo-8; it's fixed-point only. As cool as these specialised inference chips are, we're a long way from just being able to drop them in where a GPU was. Not to mention the memory is hugely constrained. The Hailo chips I've worked with were all limited to 20 MiB for the weights, which is a squeeze even at 4-bit.
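To put a number on that squeeze, here's a quick back-of-the-envelope sketch. It assumes the whole 20 MiB goes to densely packed weights with no overhead for scales, zero-points or activations, which is optimistic; `max_params` is just a helper name made up for this example.

```python
def max_params(budget_bytes: int, bits_per_weight: int) -> int:
    """Upper bound on parameter count that fits in a weight budget."""
    return budget_bytes * 8 // bits_per_weight


BUDGET = 20 * 1024 * 1024  # 20 MiB, the weight limit mentioned above

for bits in (4, 8, 16):
    millions = max_params(BUDGET, bits) / 1e6
    print(f"{bits}-bit: {millions:.1f}M params")
```

So even at 4-bit the ceiling is roughly 42M parameters, and real quantization metadata would push that lower.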
Caveats: 8-bit ops, inference only, low-memory embedded parts, figures exclude the host, and the utilization implied by the FPS specs is only ~20%.
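For anyone wondering how you back out a utilization figure from FPS specs, it's roughly achieved ops per second divided by peak ops per second. A minimal sketch, where all three input numbers are placeholders picked for illustration and not taken from any datasheet:

```python
def implied_utilization(fps: float, ops_per_frame: float, peak_ops_per_s: float) -> float:
    """Fraction of peak throughput actually used at a quoted frame rate."""
    return fps * ops_per_frame / peak_ops_per_s


# Placeholder values (assumptions, not vendor specs):
# a model costing 8 GOPs per frame, quoted at 500 FPS, on a 20 TOPS part.
util = implied_utilization(fps=500, ops_per_frame=8e9, peak_ops_per_s=20e12)
print(f"implied utilization: {util:.0%}")
```

Plug in the quoted FPS, the per-frame op count of the benchmark model and the headline TOPS to get the figure for a real part.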
But the trend is there.
There are also newer ADAS/AV units from China which claim 1000 TFLOPS and can't really cost more than $1,000 to $2,000 per car.
These are all tiled designs (see also Tesla's Dojo), heavily overweighted on FLOPS vs. memory.
[1] https://www.axelera.ai/
[2] https://hailo.ai/