by gmisra 3624 days ago
Maybe the prevalent mental model regarding models is wrong? The real elephant in the room is the idea that "all models" are similar, and therefore occupy the same space regarding interpretability, forecasting, and other uses. Models are designed to solve specific problems, usually within specific scopes. Confusion, misinterpretation, and error occur when consumers of models don't understand (or simply ignore) the limitations of these models.

Whenever I sit down to build a new model, or work with a new-to-me model, I start with questions like:

- What questions was this model built to answer? Be careful when asking other questions, and know the limitations of those answers.

- If it's been in use for a while, how effective has it been? How is that efficacy measured? A highly accurate decomposition of historical behavior may rely on a lot of after-the-fact knowledge that has no predictive power, so be careful interpreting cross-use results.

- What are the ranges and distributions of the input variables? What are their boundary conditions? This is not always applicable to all variables, especially unstructured data, but when it is applicable, it is usually straightforward.

- What are the ranges of the model outputs? Do they have any boundary conditions? Do boundary effects impact the situations in which the model can be applied? For example, when working with a binned variable, often bins 0 and n include less homogeneous data than the rest of the bins.

- How is model accuracy calculated? This is usually far less objective than model constructors are willing to admit to. How does your error vary along different dimensions? Which dimensional error do you look at more closely, and why?
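
To make a few of these checks concrete, here is a rough Python sketch, not a definitive recipe. It records the input ranges seen in training, flags inputs that fall outside them, and breaks mean absolute error down by bin so the edge bins can be compared with the interior ones. All names here (`training_ranges`, `out_of_range`, `bin_index`, `error_by_bin`) are hypothetical, and mean absolute error is only one assumed choice of accuracy measure.

```python
from collections import defaultdict
from statistics import mean


def training_ranges(rows):
    """Record the (min, max) seen for each numeric input in the training rows."""
    ranges = {}
    for row in rows:
        for name, value in row.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def out_of_range(row, ranges):
    """Return the names of inputs that fall outside the recorded training range."""
    return [
        name
        for name, value in row.items()
        if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])
    ]


def bin_index(value, edges):
    """Map a value to a bin given sorted edges.

    Bin 0 holds everything below edges[0] and bin len(edges) holds everything
    at or above edges[-1], so the two edge bins are open-ended catch-alls.
    """
    return sum(value >= edge for edge in edges)


def error_by_bin(values, actual, predicted, edges):
    """Mean absolute error per bin, keyed by bin index."""
    errors = defaultdict(list)
    for v, a, p in zip(values, actual, predicted):
        errors[bin_index(v, edges)].append(abs(a - p))
    return {b: mean(errs) for b, errs in sorted(errors.items())}


# Illustrative data only; the field names and numbers are made up.
train = [{"age": 20, "income": 30}, {"age": 60, "income": 90}]
ranges = training_ranges(train)
flagged = out_of_range({"age": 75, "income": 50}, ranges)
per_bin = error_by_bin([5, 15, 25, 100], [1, 2, 3, 4], [1, 2, 3, 14], [10, 20, 30])
```

Looking at error per bin rather than one overall number is a deliberate choice here: a single aggregate can hide the fact that the open-ended edge bins behave very differently from the rest.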

In my experience, the best way to reason about models is to work with a lot of different models, and to be honest about their flaws. That learning generally happens more efficiently, and more broadly, in the real world than in the classroom. With the recent rise in popularity of applied models, we have lots of inexperienced modelers out there, so these growing pains are to be expected.

1 comment

It seems like you're talking about human-created models (e.g. a manually-constructed decision tree), whereas the paper's primary concern is machine-created models (e.g. the weights derived mechanically by a neural network).

So the problem would be: how do you get the "modelers" (i.e. the machines) to be "honest about the flaws" of the models that they generated?