Hacker News
by m0zg 2400 days ago
As one of the long-suffering Comet.ml customers, I wish they'd spend more time working on their site's performance and less on writing blog posts. It takes multiple seconds for graphs to render, and leaving any part of the Comet.ml UI open in the browser leads to spinning fans and quick battery drain when working from a laptop. The logging component will sometimes hang without warning and hang your training session as well. Bizarrely, there's no way to show min/max metric values for ongoing and completed runs (AKA the only thing a researcher actually cares about): you have to log them separately in order to display them.

This is a weird field: these are not difficult problems to solve, yet as far as I can tell, all of the popular choices available so far each suck in their own unique way and there's no option that I know of that actually offers convenience and high performance. FOSS options are barely existent, as well, and they also suck.

For the things where Comet.ml would be too onerous to deal with, I still use pen and paper.

4 comments

Hi M0zg! Gideon from Comet here. Sorry to hear you're having issues. Did you ever try to report these? If you share more info at support@comet.ml or in our Slack channel, I'm sure we can fix or improve it. On a general note: 1. You can see min/max values in the metrics tab for finished and running experiments. 2. We spend tons of time on performance, but these are actually difficult problems to solve, e.g. if you have ten charts each showing 10k data points, all updating in real time. That said, if you share your project, we can use it to improve. Finally, the SDK is designed to never crash or slow down your training, and this is the first time we've heard that complaint. Again, please ping us so we can figure out what's going on.
We're actually very happy with Comet and have been using it on very large projects (>50 researchers, 10k models). You can reduce the refresh interval and the number of data points reported if things feel slow.
I don't log that many points as it is: about 4K data points per run in total (windowed average loss and LR every 25-30 batches, eval metrics every epoch), for all metrics combined. I also log the same data to TensorBoard, which renders everything pretty much instantaneously with no issues at all, even though I tell it to not downsample beyond 5K samples per graph.
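For readers unfamiliar with this logging pattern, here is a minimal sketch of a windowed average that is reported only every N batches, with the running min/max tracked by hand. Everything in it is an assumption for illustration: `WindowedMetric`, `run`, the metric names, and the `log(name, value, step)` callback are hypothetical stand-ins for whatever tracker is in use, not Comet.ml's or TensorBoard's API.

```python
from collections import deque


class WindowedMetric:
    """Windowed mean of a metric, reported every `every` updates."""

    def __init__(self, window=100, every=25):
        self.values = deque(maxlen=window)  # keeps only the last `window` values
        self.every = every
        self.count = 0
        self.min = float("inf")   # smallest reported mean so far
        self.max = float("-inf")  # largest reported mean so far

    def update(self, value):
        """Record one value; return the windowed mean when a report is due, else None."""
        self.values.append(value)
        self.count += 1
        if self.count % self.every != 0:
            return None
        mean = sum(self.values) / len(self.values)
        self.min = min(self.min, mean)
        self.max = max(self.max, mean)
        return mean


def run(losses, log, window=100, every=25):
    """Feed per-batch losses through the window and log only the due reports."""
    metric = WindowedMetric(window, every)
    for step, loss in enumerate(losses, start=1):
        mean = metric.update(loss)
        if mean is not None:
            log("loss_avg", mean, step)
            # Logged separately so the best value so far is visible as its own series.
            log("loss_avg_min", metric.min, step)
    return metric
```

With `every=25`, a run of 50,000 batches produces 2,000 reports per metric, which is in the range of point counts described above.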
M0zg, do you mind sending me an email with your project? Happy to look into it. gideon a t comet.ml
Also keep in mind that, unlike TensorBoard, we keep your full data series available in the API and only downsample the charts to 15k points.
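As an illustration of what "keep the full series, downsample only the chart" can look like, here is a minimal sketch using evenly spaced sampling. This is an assumption, not Comet's actual algorithm, which the thread does not describe; `downsample` is a hypothetical helper.

```python
def downsample(points, max_points):
    """Return at most `max_points` items, evenly spaced, keeping first and last."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    n = len(points)
    if n <= max_points:
        return list(points)  # nothing to drop
    # Spread max_points indices evenly over [0, n - 1], endpoints included.
    last = n - 1
    indices = [round(i * last / (max_points - 1)) for i in range(max_points)]
    return [points[i] for i in indices]


full_series = list(range(100_000))             # what stays available in full
chart_series = downsample(full_series, 15_000) # what a chart would draw
```

Evenly spaced sampling is cheap but can hide short spikes; a chart that needs to preserve extremes would more likely keep per-bucket min and max values.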
When we did our evaluation, Comet was far superior to the alternatives, and we’re not seeing any of the issues you reported. For better performance, make sure you don’t log every step but rather every epoch.
I'd love to learn more about your use case. What kind of models are you training? What are you using Comet.ml for?

Thanks!