Don't describe it as "fast". Use "latency" or "throughput". Reference counting (RC) is a useful building block for making real-time latency guarantees, and for many apps that is more important than peak throughput.
You don't have to clean up everything at once - when an object's refcount hits zero you can put it on a queue to be cleaned up later, and you can then control the rate at which the queue is emptied.
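A minimal sketch of that queue idea, in Python purely for illustration. The names `Obj`, `DeferredRC`, `release`, `drain` and `build_list` are hypothetical, and the refcounts are simulated by hand rather than taken from any real runtime. It ignores cycles entirely.

```python
from collections import deque


class Obj:
    """A hand-rolled refcounted object (hypothetical, for illustration only)."""

    def __init__(self, name, children=()):
        self.name = name
        self.refcount = 1               # the creator holds one reference
        self.children = list(children)  # takes over the caller's reference to each child


class DeferredRC:
    """Queues dead objects instead of reclaiming them immediately."""

    def __init__(self):
        self.zero_queue = deque()  # refcount hit zero, not yet reclaimed
        self.freed = []            # names of reclaimed objects, in order

    def release(self, obj):
        """Drop one reference. Constant time: no cascade happens here."""
        obj.refcount -= 1
        if obj.refcount == 0:
            self.zero_queue.append(obj)

    def drain(self, budget):
        """Reclaim at most `budget` objects and return how many were reclaimed."""
        done = 0
        while self.zero_queue and done < budget:
            obj = self.zero_queue.popleft()
            for child in obj.children:
                self.release(child)  # may enqueue the child for a later step
            obj.children.clear()
            self.freed.append(obj.name)
            done += 1
        return done


def build_list(n):
    """Build node0 -> node1 -> ... as a singly linked chain of Obj."""
    head = None
    for i in reversed(range(n)):
        head = Obj(f"node{i}", [head] if head is not None else [])
    return head


rc = DeferredRC()
rc.release(build_list(10))    # only the head is queued at this point
while rc.drain(budget=3):     # each call reclaims at most three nodes
    pass
```

The caller picks `budget` per tick, so the work done in any one step is bounded no matter how large the dead structure is. The cost is that memory is returned later than it would be with eager cleanup.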
Which already makes it far too complicated, more complex than a real-time mark-and-sweep. Especially if there are loop (cycle) checks - you cannot just empty the queue gradually, it's all or nothing.
Just imagine it freeing a 10 GB linked list.
Real-time pure mark-and-sweep GCs are easier.
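To make the linked-list worry concrete, here is a small sketch of eager (non-deferred) reference counting, with refcounts simulated by hand. `Node`, `release_eagerly` and `build_list` are hypothetical names, not any real runtime's API. It only counts how many nodes a single release ends up freeing before it returns.

```python
class Node:
    """A hand-rolled refcounted list node (hypothetical, for illustration only)."""

    def __init__(self, next_node=None):
        self.refcount = 1      # held by the previous node, or by the creator
        self.next = next_node


def release_eagerly(node):
    """Drop one reference and cascade immediately.

    Returns the number of nodes freed before control returns to the caller.
    Written as a loop so that a long list does not overflow the call stack.
    """
    freed = 0
    while node is not None:
        node.refcount -= 1
        if node.refcount > 0:
            break              # someone else still holds this node
        nxt = node.next
        node.next = None       # "free" this node, then release its successor
        freed += 1
        node = nxt
    return freed


def build_list(n):
    """Build a singly linked chain of n nodes and return its head."""
    head = None
    for _ in range(n):
        head = Node(head)
    return head


for n in (10, 1_000, 100_000):
    print(n, release_eagerly(build_list(n)))
```

Dropping the last reference to the head walks the whole chain in one go, so the pause grows linearly with the list length. That is the case the 10 GB list is pointing at.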