by nyrikki 624 days ago
Perhaps looking into the issues with uncongeniality and multiple imputation may help, although I haven't looked at MI for a long time, so consider my reply an attempt to be helpful rather than authoritative.

Another related intuition for a probable foot gun is learning linearly inseparable functions like XOR, which requires MLPs.
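
A minimal NumPy sketch of the XOR point, not taken from any reference implementation: the best least-squares linear fit to XOR predicts the same value for every input, while a tiny two-unit ReLU network separates the classes. The hidden-layer weights are hand-picked for illustration (an assumption, not learned), and the names `relu` and `mlp_pred` are hypothetical.

```python
import numpy as np

# The four XOR inputs and their labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Best linear fit (with a bias column) in the least-squares sense.
A = np.hstack([X, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
linear_pred = A @ coef  # every prediction lands at about 0.5

def relu(z):
    return np.maximum(z, 0.0)

# Hand-picked two-unit hidden layer (illustrative weights, not trained):
#   h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), output = h1 - 2 * h2
s = X.sum(axis=1)
mlp_pred = relu(s) - 2.0 * relu(s - 1.0)

print("linear:", linear_pred)
print("mlp:   ", mlp_pred)
```

The linear model cannot do better than a constant here, which is why one hidden layer is the minimum for this function.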

A single missing value in an XOR situation is far more challenging than participant dropouts causing missing data.

Specifically the problem is counterintuitively non-convex, with multiple possibilities for convergence without information in the corpus to know which may be true.

That is a useful lens in my mind, where I think of the manifold being pushed down in opposite sectors as the kernel trick.

Another potential lens to think about it is that in medical studies the assumption is that there is a smooth and continuous function, while in learning, we are trying to find a smooth continuous function with minimal loss.

We can't assume that the function we need to learn is smooth, but autograd specifically limits what is learnable, and simplicity bias, especially with feed-forward networks, is an additional concern.

One thing that is common for people to conflate is the fact that a differentiable function is probably smooth and continuous.

But the set of continuous functions that are differentiable _anywhere_ is a meagre set.

Like anything in math and logic, the assumptions you can make will influence what methods work.

As ML is existential quantification, and because it is insanely good at finding efficient glitches in the matrix, within the limits of my admittedly limited knowledge, MI would need to be a very targeted solution with a lot of care to avoid set shattering from causing uncongeniality, especially in the unsupervised context.

Hopefully someone else can provide better, more productive insights.

1 comments

Honestly, I think that we're coming at this from very different perspectives.

Single imputation is garbage for accurate inference, as it understates variance, and thus narrows confidence intervals, as P(missing) increases.

MI is a useful method for alleviating this bias (though at the cost of a lot more compute).
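
A rough simulation sketch of that difference, assuming a normal variable with values missing completely at random and estimating only its mean. The single-imputation branch fills gaps with the observed mean. The MI branch draws parameters from a standard noninformative normal posterior and pools with Rubin's rules. The sample size, missingness rate, and number of imputations `M` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200
x = rng.normal(loc=10.0, scale=2.0, size=n)
missing = rng.random(n) < 0.5          # MCAR with P(missing) = 0.5 (assumed)
x_obs = x[~missing]
n_obs = x_obs.size
n_mis = n - n_obs

# Complete-case squared standard error of the mean, for reference.
var_cc = x_obs.var(ddof=1) / n_obs

# Single (mean) imputation: the filled-in values add no spread,
# so the naive squared standard error is too small.
x_single = x.copy()
x_single[missing] = x_obs.mean()
var_single = x_single.var(ddof=1) / n

# Multiple imputation: draw (sigma^2, mu) from the posterior, impute, repeat.
M = 20
estimates = np.empty(M)
within = np.empty(M)
s2 = x_obs.var(ddof=1)
for m in range(M):
    sigma2 = (n_obs - 1) * s2 / rng.chisquare(n_obs - 1)
    mu = rng.normal(x_obs.mean(), np.sqrt(sigma2 / n_obs))
    x_mi = x.copy()
    x_mi[missing] = rng.normal(mu, np.sqrt(sigma2), size=n_mis)
    estimates[m] = x_mi.mean()
    within[m] = x_mi.var(ddof=1) / n

# Rubin's rules: total variance = within + (1 + 1/M) * between.
w_bar = within.mean()
b = estimates.var(ddof=1)
var_mi = w_bar + (1 + 1 / M) * b

print("SE complete case:    ", np.sqrt(var_cc))
print("SE single imputation:", np.sqrt(var_single))
print("SE multiple imput.:  ", np.sqrt(var_mi))
```

The extra compute is the loop over `M` completed datasets; the between-imputation term `b` is what restores the uncertainty that single imputation throws away.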

That's why it gets used, and it's performed extremely well in real world analyses for basically my entire life (and I'm middle-aged now).

> especially in the unsupervised context.

I wouldn't use MI in an unsupervised context (but maybe some people do).