| This is neat! Who's the target user? For this to be usable to me (level of knowledge: I can do all of what's done on this page in Python / R, but don't have a PhD in stats or anything), I would need: - Some sense of how training and validation is done - Model weights - Something that helps interpret fitting / overfitting I'm not sure it's super useful for someone below my level of knowledge (or maybe at my level but just don't know Python?). It seems like a random marketing person, Shopify person, etc. would need: - Better understanding of when to use regression vs. classification - Help interpreting whether the MAE / loss is good or bad - Some automatic way to prevent overfitting - Guidance on what constitutes good data and how to structure it for input - Examples of how it might be applied to their use case - Knowledge of how often models fail / how much they should be indexing on the model |
1) The target user is a freelance marketer, a small/medium enterprise without dedicated data scientists, or technical people from other fields (e.g. engineers) who don't have extensive stats/ML knowledge.
2) Re: lack of hand-holding. This is likely the biggest challenge for this project: showing people how to think about ML, and how to use it to derive value for their business, without putting them through hours of training or lengthy tutorials.
> - Guidance on what constitutes good data and how to structure it for input
> - Examples of how it might be applied to their use case
Most users are stuck at this initial phase (preparing the data).
One thing I'm working on now is adding use-case-focused guides: short articles explaining, for example, how a realtor would go about building a model to help them roughly value houses (including data collection). I hope this helps with these two points.
> Help interpreting whether the MAE / loss is good or bad
There are a couple of things I'm working on that might help: 1) Show metric improvement relative to a baseline (e.g. the MAE of a model that always predicts the mean). 2) Show both train and test curves. The current curve is only on test data.
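To make the baseline idea in 1) concrete, here is a minimal sketch of how such a comparison could be computed. The function names and the toy numbers are made up for illustration; this is not the tool's actual code.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def improvement_over_mean_baseline(y_train, y_test, y_pred):
    """Compare the model's test MAE to a baseline that always predicts
    the mean of the training targets.

    Returns (baseline_mae, model_mae, relative_improvement), where
    relative_improvement is 1 - model_mae / baseline_mae.
    """
    baseline_value = mean(y_train)
    baseline_mae = mae(y_test, [baseline_value] * len(y_test))
    if baseline_mae == 0:
        raise ValueError("baseline MAE is zero; relative improvement is undefined")
    model_mae = mae(y_test, y_pred)
    return baseline_mae, model_mae, 1 - model_mae / baseline_mae


# Hypothetical numbers, purely for illustration.
y_train = [100, 200, 300]
y_test = [150, 250]
y_pred = [160, 230]

baseline_mae, model_mae, gain = improvement_over_mean_baseline(y_train, y_test, y_pred)
print(f"baseline MAE={baseline_mae}, model MAE={model_mae}, improvement={gain:.0%}")
```

A relative figure like "70% better than always guessing the average" is probably easier for a non-specialist to judge than a raw MAE.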
> Better understanding of when to use regression vs. classification
> Some sense of how training and validation is done
I'm currently redesigning the UX around a step-by-step flow (for first-time users at least), which should give a bit of room to explain things along the way (e.g. what classification/regression means for total beginners).
> Some automatic way to prevent overfitting
Medium-term, there'll be a mode that continuously trains models to tune hyperparameters, which should help avoid overfitting. Until then it's mostly handpicked parameters (including regularization); having tested this on Kaggle challenges, it still sometimes beats my hand-written ML code :)
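For readers wondering what "tuning hyperparameters" to avoid overfitting can look like, here is a minimal sketch of one common approach: try several regularization strengths and keep the one with the lowest error on held-out validation data. The one-feature ridge model, the grid, and the data are all assumptions for illustration, not how the tool does it.

```python
from statistics import mean


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for y ~ w * x (one feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mae(ys, preds):
    """Mean absolute error between targets and predictions."""
    return mean(abs(y - p) for y, p in zip(ys, preds))


def tune_regularization(train, valid, grid):
    """Fit one model per regularization strength on the training split and
    score each on the validation split.

    Returns (best_lam, scores), where scores maps each strength to its
    validation MAE and best_lam is the strength with the lowest score.
    """
    x_tr, y_tr = train
    x_va, y_va = valid
    scores = {}
    for lam in grid:
        w = fit_ridge_1d(x_tr, y_tr, lam)
        scores[lam] = mae(y_va, [w * x for x in x_va])
    best_lam = min(scores, key=scores.get)
    return best_lam, scores


# Toy data, purely for illustration.
train = ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
valid = ([4.0, 5.0], [8.0, 10.1])
best_lam, scores = tune_regularization(train, valid, grid=[0.0, 0.1, 1.0, 10.0])
print(best_lam, scores)
```

The key point is that the choice is made on data the model was not fitted on, so a setting that merely memorizes the training set does not win.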
> Model weights
You can already download the model weights (download icon next to the model name). Or do you mean feature importances? That's a planned feature, but it's not straightforward to implement in a generic way, so it might take a month or two before it ships.
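For context, one model-agnostic way to get feature importances is permutation importance: shuffle one input column at a time and measure how much the error gets worse. The sketch below assumes a generic `predict(row)` function and uses a toy model; it is one possible approach, not necessarily the one that will ship.

```python
import random
from statistics import mean


def mae(ys, preds):
    """Mean absolute error between targets and predictions."""
    return mean(abs(y - p) for y, p in zip(ys, preds))


def permutation_importance(predict, rows, ys, n_repeats=10, seed=0):
    """Average increase in MAE when each feature column is shuffled.

    A larger value means the model relies more on that feature.
    `predict` is any function mapping one row (a list of features)
    to a prediction, so this works regardless of model type.
    """
    rng = random.Random(seed)
    base = mae(ys, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            col = [r[j] for r in rows]
            rng.shuffle(col)
            # Rebuild the rows with only column j replaced by shuffled values.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, col)]
            increases.append(mae(ys, [predict(r) for r in shuffled]) - base)
        importances.append(mean(increases))
    return importances


def toy_model(row):
    """Hypothetical model that uses only the first feature."""
    return 3.0 * row[0]


rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
ys = [toy_model(r) for r in rows]
imps = permutation_importance(toy_model, rows, ys)
print(imps)
```

Since the toy model ignores the second feature, shuffling it changes nothing and its importance comes out as zero, while the first feature gets a positive score.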