We're actually very happy with Comet and have been using it on very large projects (>50 researchers, 10k models). You can reduce the refresh interval and the number of data points reported if things feel slow.
I don't log that many points as it is: about 4K data points per run in total across all metrics (windowed average loss and LR every 25-30 batches, eval metrics every epoch). I also log the same data to TensorBoard, which renders everything pretty much instantaneously with no issues at all, even though I tell it not to downsample below 5K samples per graph.
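A minimal sketch of the kind of throttled logging described above: keep a sliding window of recent values and report their mean only once every N batches. The names `WindowedMetricLogger`, `log_fn`, and `simulate` are hypothetical and only for illustration; `log_fn` stands in for whatever the tracking client exposes, and the 25-batch interval is taken from the lower end of the range mentioned above.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# Hypothetical callback signature: (metric name, value, step).
LogFn = Callable[[str, float, int], None]


class WindowedMetricLogger:
    """Report the mean of the last `window` values once every `every` batches."""

    def __init__(self, name: str, log_fn: LogFn, every: int = 25, window: int = 25):
        self.name = name
        self.log_fn = log_fn
        self.every = every
        # deque with maxlen drops the oldest value automatically.
        self.values: Deque[float] = deque(maxlen=window)

    def update(self, value: float, step: int) -> None:
        self.values.append(value)
        # Only emit a point on every `every`-th batch.
        if step % self.every == 0:
            mean = sum(self.values) / len(self.values)
            self.log_fn(self.name, mean, step)


def simulate(num_batches: int, every: int = 25) -> List[Tuple[str, float, int]]:
    """Run a fake training loop and collect what would have been reported."""
    reported: List[Tuple[str, float, int]] = []

    def collect(name: str, value: float, step: int) -> None:
        reported.append((name, value, step))

    logger = WindowedMetricLogger("train/loss", collect, every=every, window=every)
    for step in range(1, num_batches + 1):
        fake_loss = 1.0 / step  # stand-in for a real per-batch loss
        logger.update(fake_loss, step)
    return reported


if __name__ == "__main__":
    points = simulate(100)
    print(len(points), "points reported for 100 batches")
```

In a real run, `log_fn` would presumably wrap the tracker's own scalar-logging call, and the same callback could fan out to both Comet and TensorBoard so the two dashboards receive identical points.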