by george3d6 1944 days ago
> So I assume that you are doing hyperparameter search? Can you share what optimization method you are using for search (e.g. random, GP)?

Short answer: Optuna and Ax, but only sometimes.

The long answer led me down a rabbit hole, and it's 10k+ words and a few experiments deep. If you're interested in this area specifically, ping me; I've got nothing concrete, but I like discussing it. A recent paper I saw that somewhat echoes my thoughts is https://arxiv.org/pdf/2102.03034.pdf, though some bits feel over my head and/or overly pedantic and/or overly formal, I'm not sure I agree with the conclusion, and loads of it is irrelevant. But if the problem interests you, I'd suggest giving it some time, with those disclaimers in mind.
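For anyone following along who hasn't done hyperparameter search before, here is a minimal sketch of the "random" baseline mentioned in the question. To be clear, this is not Optuna or Ax and not what mindsdb does internally; `random_search`, `toy_loss` and the parameter names are made up for illustration, and the toy objective stands in for "train a model and return the validation loss".

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample every hyperparameter uniformly and keep the lowest score seen."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # One independent uniform draw per hyperparameter.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(params):
    """Hypothetical objective: stands in for a real train-and-validate run."""
    return (params["lr"] - 0.1) ** 2 + (params["dropout"] - 0.3) ** 2


space = {"lr": (0.0, 1.0), "dropout": (0.0, 0.9)}
best_params, best_score = random_search(toy_loss, space, n_trials=200)
print(best_params, best_score)
```

Libraries like Optuna and Ax replace the uniform sampling with smarter proposals based on earlier trials, but the loop around the objective looks roughly the same.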

> Also, can the search be distributed in parallel across multiple nodes?

Theoretically yes, practically it's still WIP to get this to work, but the architecture we have right now is very much conceived with massive distribution in mind (see our docs for more details on that).

> And, if mindsdb is not part of the db, what happens if mindsdb fails?

Essentially, the SELECT query you use to make a prediction returns an error, assuming you mean "what happens if it crashes or if the model you are using crashes?".

e.g.:

psql> SELECT diagnostic FROM mindsdb.flu_detector WHERE headache=true AND temperature=37.5 AND cough='mild';

psql> Error: External table returned error: "Segfault"

OR

psql> SELECT diagnostic FROM mindsdb.flu_detector WHERE headache=true AND temperature=37.5 AND coughsfsagsa='mild';

psql> Error: External table returned error: Input column `coughsfsagsa` doesn't exist

(or something like that)
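On the application side, that means a prediction query can be treated like any other query that might fail. A rough sketch of what that could look like, assuming your db driver surfaces the failure as an exception: `PredictionError`, `predict_or_default` and `fake_run_query` are hypothetical names, and the fake runner (including its made-up result row) only exists so the snippet runs without a database.

```python
class PredictionError(Exception):
    """Hypothetical stand-in for whatever error your db driver raises."""


def predict_or_default(run_query, sql, default=None):
    """Run a prediction query; return (rows, None) or (default, error message)."""
    try:
        return run_query(sql), None
    except PredictionError as exc:
        # The predictor (or the model behind it) failed: fall back instead of crashing.
        return default, str(exc)


def fake_run_query(sql):
    """Fake query runner for illustration; swap in a real cursor call."""
    if "coughsfsagsa" in sql:
        raise PredictionError("Input column `coughsfsagsa` doesn't exist")
    return [("flu",)]  # made-up result row


good_sql = "SELECT diagnostic FROM mindsdb.flu_detector WHERE cough='mild';"
bad_sql = "SELECT diagnostic FROM mindsdb.flu_detector WHERE coughsfsagsa='mild';"

ok_rows, ok_err = predict_or_default(fake_run_query, good_sql)
bad_rows, bad_err = predict_or_default(fake_run_query, bad_sql, default=[])
print(ok_rows, ok_err)
print(bad_rows, bad_err)
```

Whether you fall back to a default, retry, or surface the error to the user depends on how critical the prediction is for your use case.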

> Also, do you support automatic retraining?

Not at the moment, but we're going to add it very soon, with the first implementation allowing retraining at a user-set frequency (e.g. once every 2 hours).

This will keep the model fresh as new data comes in (assuming there's no time limit on the query).
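To illustrate what "retraining at a user-set frequency" could mean in practice, here is a small sketch. It is not the planned mindsdb implementation, just an assumption about the general shape: `RetrainSchedule` and the `retrain` callback are hypothetical, and a real version would call whatever actually retrains the model.

```python
from datetime import datetime, timedelta


class RetrainSchedule:
    """Tracks when a model was last trained and whether a retrain is due."""

    def __init__(self, every: timedelta, last_trained: datetime):
        self.every = every
        self.last_trained = last_trained

    def is_due(self, now: datetime) -> bool:
        # Due once at least one full interval has passed since the last training.
        return now - self.last_trained >= self.every

    def run_if_due(self, now: datetime, retrain) -> bool:
        """Call `retrain()` and reset the clock if the interval has elapsed."""
        if not self.is_due(now):
            return False
        retrain()
        self.last_trained = now
        return True


start = datetime(2021, 1, 1, 0, 0)
schedule = RetrainSchedule(every=timedelta(hours=2), last_trained=start)
runs = []

# Poll every 30 minutes for 5 hours; a real retrain would go where runs.append is.
for minutes in range(30, 301, 30):
    now = start + timedelta(minutes=minutes)
    schedule.run_if_due(now, lambda: runs.append(now))

print(runs)
```

The polling loop here is only for demonstration; in a real setup the check would more likely be triggered by a cron job or a background worker.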

1 comment

Wow. Thanks for the answer and for the paper! I myself implemented this: https://arxiv.org/pdf/1810.05934.pdf in Go.

The issue with retraining is that you need new labels (assuming supervised ML), so I wonder what process you use to get those.