by george3d6 1946 days ago

Regarding benchmarks, we currently focus on three main dataset collections:

1. Datasets from customers, but obviously those can’t be made public.

2. The OpenML benchmark, which is fairly limited because it’s mainly binary categories, but which is good because it’s a 3rd party, so unbiased. We have some intermediate results here (https://docs.google.com/spreadsheets/d/1oAgzzDyBqgmSNC6g9CFO...), and they are middle-of-the-road. However, I think the benchmark is pretty limited, i.e. it doesn’t cover most of the kinds of inputs and almost none of the outputs we support.

3. An internal benchmark suite which currently has 59 datasets, mainly focused on classification and regression tasks with many inputs, timeseries problems and text. Some part of it is public, but opening up the rest is a bit difficult due to licensing issues. I’m hoping that in the next year it will grow and 90%+ of it can be made public. We benchmark against older versions of mindsdb, against hand-made models we try to adapt to the task, against the state-of-the-art accuracy for the dataset (if we can find it) and against a few other AutoML frameworks (well, 1, but I hope to extend that list) [see this repo for the ones we made public: https://github.com/mindsdb/benchmarks, but I'm afraid it's a bit outdated]
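To make the comparison in point 3 concrete, here is a minimal sketch of how per-dataset results could be lined up against those baselines. Everything in it (the `DatasetResult` fields, `compare`, `regressions`, the 0.01 tolerance) is a hypothetical illustration, not the actual mindsdb benchmark code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DatasetResult:
    """Accuracy scores for one dataset. All field names are hypothetical."""
    dataset: str
    current: float                            # version under test
    previous: Optional[float] = None          # an older release
    handmade: Optional[float] = None          # hand-made model adapted to the task
    state_of_the_art: Optional[float] = None  # best published accuracy, if found


def compare(result: DatasetResult) -> Dict[str, float]:
    """Return current-minus-baseline deltas for every baseline that is available."""
    baselines = {
        "previous": result.previous,
        "handmade": result.handmade,
        "state_of_the_art": result.state_of_the_art,
    }
    return {
        name: round(result.current - score, 4)
        for name, score in baselines.items()
        if score is not None
    }


def regressions(results: List[DatasetResult], tolerance: float = 0.01) -> List[str]:
    """Datasets where the current version trails the previous one by more than `tolerance`."""
    return [
        r.dataset
        for r in results
        if r.previous is not None and r.previous - r.current > tolerance
    ]
```

Missing baselines are simply skipped, which matches the "if we can find it" caveat for state-of-the-art numbers.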

That being said, benchmarking for us is still WIP, since as far as I can tell nobody is trying to build open source models that are as broad as what we're currently doing (for better or worse), and the closed source services offered by various IaaS providers don't really come with public benchmark results outside of marketing.

1 comments

cweill 1946 days ago

The benchmarking challenges you are facing are pretty common in the AutoML community. My colleagues and I at Google Research are trying to solve this with https://github.com/google/nitroml. It's still super early days (no CI yet), but I think it could help your team benchmark on a set of open standard benchmark tasks as we open source more of the system.

george3d6 1946 days ago

Looks quite interesting, already pinned this in the relevant slack channel :)

To be honest I'm rather happy with how the internal benchmark suite is turning out, but to some extent you are inviting bias by creating them yourself. On top of that, it doesn't hurt to have more benchmarks.

At the end of the day it's a combination of:
* How much work it is to integrate (easy to measure)
* How visible it is, i.e. if we actually find something interesting, will it be visible and legible to others (iffy to measure; citations, stars, etc. are some indication)
* How useful it is to "improve" the library (hard to measure, and what we aim to be good at is a moving target)

So realistically that's the equation I have to judge when adding a new benchmark suite, and it's very annoying because you'll note the most important things are the hardest to measure.
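As a rough sketch only, the three criteria above could be folded into a single score. The function name, the 0-to-1 input scale and the weights are all assumptions made up for illustration; the inputs themselves would still be estimates, which is exactly the hard part.

```python
def suite_priority(integration_effort: float,
                   visibility: float,
                   usefulness: float,
                   weights: tuple = (1.0, 1.0, 2.0)) -> float:
    """Hypothetical score for adopting a benchmark suite.

    Each input is a rough estimate in [0, 1]. Effort counts against the
    suite; visibility and usefulness count for it. The default weights
    are arbitrary placeholders, with usefulness weighted highest.
    """
    for value in (integration_effort, visibility, usefulness):
        if not 0.0 <= value <= 1.0:
            raise ValueError("estimates must be between 0 and 1")
    w_effort, w_visibility, w_usefulness = weights
    return (w_visibility * visibility
            + w_usefulness * usefulness
            - w_effort * integration_effort)
```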

Would you want people to integrate with this now, or would you rather wait a few weeks/months/years until it matures more? If the former, can you give a few details regarding where to start (the README is fairly barren)? If the latter, please ping me (george.hosu@mindsdb.com) when you think it could be ready to try.

Anyway, any open benchmark library is a step in the right direction, thanks for working on this :)

cweill 1946 days ago

Thanks for your feedback! Based on your description of how you already do things, I'd say you're ahead of the curve as far as rigorous model quality benchmarking goes. You should absolutely hold off on using nitroml for a few months until it's more mature. It's very much pre-prerelease in a build-in-the-open sense. :) I'll shoot you an email once it's ready for anyone to try out. When the time comes, we'll have a blog post to announce it, and will include proper documentation.

And, congrats on the launch!
