by streetcat1 1949 days ago
Hi

So I assume that you are doing hyperparameter search? Can you share what optimization method you are using for the search (e.g. random, GP)?

Also, can the search be distributed in parallel across multiple nodes?

And if MindsDB is not part of the DB, what happens if MindsDB fails?

Also, do you support automatic retraining? If yes, can you elaborate more?


These are amazing questions, Streetcat. We do some hyperparameter search using Optuna, and we may be moving to Ray Tune because it can be highly parallelized. If MindsDB fails, the exact behavior depends on how the various DBs manage federated storage, but essentially you will get a query error. Funny that you mention automatic retraining: people have been asking for it recently, and we will be supporting a retrain_frequency parameter in the coming releases. Would you like to give it a test drive?
I am actually working on a product in the same area (AutoML/MLOps) as a non-YC startup... We might be able to partner. How can I reach you?
Send us an email - Adam at MindsDB.com and Jorge at MindsDB.com
Absolutely, let's connect! jorge at mindsdb
> So I assume that you are doing hyperparameter search? Can you share what optimization method you are using for the search (e.g. random, GP)?

Short answer: Optuna and Ax, but only sometimes.

The long answer led me down a rabbit hole, and it's 10k+ words and a few experiments deep. If you're interested in this area specifically, ping me; I've got nothing concrete, but I like discussing it. A recent paper I saw that somewhat echoes my thoughts is https://arxiv.org/pdf/2102.03034.pdf, but some bits feel over my head and/or overly pedantic and/or overly formal, I'm not sure I agree with the conclusion, and loads of it is irrelevant. But if the problem interests you, I'd suggest giving it some time, with those disclaimers in mind.
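
For anyone following along who hasn't done hyperparameter search before, here is a minimal sketch of the "random" option from the question. To be clear, this is not what Optuna or Ax do internally and not MindsDB's code; the names `random_search`, `toy_loss`, `learning_rate` and `dropout` are made up for illustration, and the toy loss stands in for a real validation loss.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample every hyperparameter uniformly and keep the lowest-loss trial."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Draw one candidate configuration from the search space.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(params):
    # Hypothetical stand-in for "train a model, return validation loss".
    return (params["learning_rate"] - 0.1) ** 2 + (params["dropout"] - 0.3) ** 2


space = {"learning_rate": (0.0, 1.0), "dropout": (0.0, 0.9)}
best_params, best_score = random_search(toy_loss, space, n_trials=200, seed=42)
print(best_params, best_score)
```

Each trial is independent of the others, which is why this kind of search is easy to spread over several workers; model-based methods trade some of that independence for better sample efficiency.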

> Also, can the search be distributed in parallel across multiple nodes?

Theoretically yes. Practically, it's still WIP to get this to work, but the architecture we have right now was very much conceived with massive distribution in mind (see our docs for more details on that).

> And if MindsDB is not part of the DB, what happens if MindsDB fails?

Essentially, the SELECT query you use to make a prediction returns an error, assuming you mean "what happens if it crashes or if the model you are using crashes?".

e.g.:

psql> SELECT diagnostic FROM mindsdb.flu_detector WHERE headache=true AND temperature=37.5 AND cough='mild';

psql> Error: External table returned error: "Segfault"

OR

psql> SELECT diagnostic FROM mindsdb.flu_detector WHERE headache=true AND temperature=37.5 AND coughsfsagsa='mild';

psql> Error: External table returned error: Input column `coughsfsagsa` doesn't exist

(or something like that)

> Also, do you support automatic retraining?

Not at the moment, but we're going to add it very soon. The first implementation will allow retraining at a user-set frequency (e.g. once every 2 hours), which will keep the model fresh as new data comes in (assuming there's no time limit on the query).
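
To make the frequency idea concrete, here is a rough sketch of the check a scheduler could run. This is an assumption about how such a feature might work, not the MindsDB implementation: treating `retrain_frequency` as a `timedelta` is a guess, and `retrain_due` and `next_retrain_time` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def retrain_due(last_trained, retrain_frequency, now):
    """Return True once at least `retrain_frequency` has passed since the last training run."""
    return now - last_trained >= retrain_frequency


def next_retrain_time(last_trained, retrain_frequency):
    """Earliest moment at which the next retraining run may start."""
    return last_trained + retrain_frequency


# Assumed example: the model was last trained at noon, retrained every 2 hours.
last_trained = datetime(2021, 1, 1, 12, 0)
retrain_frequency = timedelta(hours=2)

print(retrain_due(last_trained, retrain_frequency, datetime(2021, 1, 1, 13, 59)))
print(retrain_due(last_trained, retrain_frequency, datetime(2021, 1, 1, 14, 0)))
print(next_retrain_time(last_trained, retrain_frequency))
```

A purely time-based trigger like this retrains even when no new rows have arrived, so a real scheduler would probably also want to check for new data first.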

Wow. Thanks for the answer and for the paper! I implemented this one myself in Go: https://arxiv.org/pdf/1810.05934.pdf
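
Assuming the linked paper is the one on asynchronous successive halving (ASHA), the core idea can be sketched in a few lines. What follows is the simpler synchronous version, not the asynchronous algorithm from the paper and not the Go implementation mentioned above; `successive_halving`, `toy_evaluate` and the candidate values are hypothetical.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Synchronous successive halving.

    Evaluate every surviving config at the current budget, keep the best
    1/eta of them, multiply the budget by eta, and repeat until one is left.
    `evaluate(config, budget)` must return a loss (lower is better).
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda config: evaluate(config, budget))
        keep = max(1, len(ranked) // eta)  # always keep at least one config
        survivors = ranked[:keep]
        budget *= eta  # survivors earn a larger training budget
    return survivors[0]


def toy_evaluate(learning_rate, budget):
    # Hypothetical loss: best at learning_rate = 0.1, improving with more budget.
    return abs(learning_rate - 0.1) + 1.0 / budget


candidates = [0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0, 0.2]
best = successive_halving(candidates, toy_evaluate, min_budget=1, eta=3)
print(best)
```

With 9 candidates and `eta=3` the rounds go 9, then 3, then 1 survivor. The synchronous version waits for every config in a round to finish before promoting any, which is the bottleneck an asynchronous variant is meant to avoid.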

The issue with retraining is that you need new labels (assuming supervised ML), so I wonder what process you use to get those.