by hntrader 1951 days ago
We must disambiguate between political objectives and the practice of credit risk modelling.

Credit risk, f(X), is an unknown population function that needs to be estimated using observed data X.

If including tallness in X improves our estimate of f(X), then we've got a better model.
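To make "improves our estimate of f(X)" concrete, here is a minimal sketch on simulated data. Everything in it is a hypothetical assumption: the binary `tall` trait, its effect on default risk, and the lookup-table "model" of empirical default rates. It fits the model with and without the trait and compares mean log loss on a held-out set; lower held-out log loss is one common way of saying the estimate of f(X) got better.

```python
import math
import random

def simulate(n, seed):
    """Hypothetical data: (tall, defaulted) pairs where the trait shifts default risk."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tall = rng.random() < 0.3
        p_default = 0.10 if tall else 0.04  # assumed "true" f(X), illustrative only
        rows.append((tall, rng.random() < p_default))
    return rows

def fit(rows, use_trait):
    """Empirical default rate per group; a single pooled group if the trait is excluded."""
    counts = {}
    for tall, defaulted in rows:
        key = tall if use_trait else None
        n, d = counts.get(key, (0, 0))
        counts[key] = (n + 1, d + defaulted)
    # Laplace smoothing keeps every probability strictly inside (0, 1).
    return {k: (d + 1) / (n + 2) for k, (n, d) in counts.items()}

def log_loss(model, rows, use_trait):
    """Mean negative log-likelihood of the observed outcomes under the model."""
    total = 0.0
    for tall, defaulted in rows:
        p = model[tall if use_trait else None]
        total -= math.log(p if defaulted else 1 - p)
    return total / len(rows)

train, test = simulate(50_000, seed=1), simulate(50_000, seed=2)
results = {}
for use_trait in (False, True):
    model = fit(train, use_trait)
    results[use_trait] = log_loss(model, test, use_trait)
    print(f"use_trait={use_trait}: held-out log loss {results[use_trait]:.4f}")
```

The comparison is made on data the model never saw, so the improvement reflects a better estimate of f(X) rather than memorisation of the training sample.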

You've asserted that X should only contain an individual's past actions instead of their inherent traits such as tallness. This may satisfy certain political objectives, and that's fine if we're being upfront about the underlying motivation, but from an ML perspective your prescription doesn't make much sense unless you have some prior knowledge about f(X) telling you that tallness is neither relevant in itself nor acting as a proxy for some other missing feature.
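A minimal simulation of why a feature standing in for another one matters, under purely hypothetical assumptions: a trait `t` drives default risk, and a second feature `z` (an illustrative attribute, not any real credit variable) agrees with `t` 90% of the time. Dropping `t` from X does not stop the model from pricing it, because the default rate still differs sharply across values of `z`.

```python
import random

def simulate(n, agreement=0.9, seed=0):
    """Hypothetical population: trait t drives default; z mirrors t with prob `agreement`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.random() < 0.5
        z = t if rng.random() < agreement else not t
        p_default = 0.10 if t else 0.04  # assumed: only t matters causally
        rows.append((t, z, rng.random() < p_default))
    return rows

def default_rate_by(rows, key):
    """Empirical default rate for each value of the chosen feature."""
    totals = {}
    for row in rows:
        k = key(row)
        n, d = totals.get(k, (0, 0))
        totals[k] = (n + 1, d + row[2])
    return {k: d / n for k, (n, d) in totals.items()}

rows = simulate(200_000)
by_t = default_rate_by(rows, key=lambda r: r[0])
by_z = default_rate_by(rows, key=lambda r: r[1])  # t excluded, z kept
print("by trait t:", by_t)
print("by proxy z:", by_z)
```

With these assumptions the expected rates by `z` are about 0.094 and 0.046, so most of the trait's effect survives its removal. Knowing whether a feature is irrelevant, rather than merely correlated, requires exactly the kind of prior knowledge about f(X) described above.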

Unless you have such domain knowledge, you've little business asserting what the appropriate features are for improving model quality.


Credit risk modelling is a political objective. The two can't be separated; the separation is a figment of someone's imagination. The use and investigation of these models does not happen in a vacuum. The real question is: does better modeling help society? And that is a political question through and through.
I don't agree that this has anything to do with political objectives. It's a question of ethics. My domain knowledge is irrelevant, and the ML modelling aspect is irrelevant. The discussion was specifically about whether it's OK to include inherent traits when determining the creditworthiness of an individual.

If you think it is, that's fine. We might as well just take the same approach to crime, and start locking individuals up or not extending job offers, NOT because they've done a single thing wrong, but simply because they're statistically more likely to.

> We might as well just take the same approach to crime, and start locking individuals up or not extending job offers,

The main difference is that people have the right to a trial and to be presumed innocent until proven guilty. But there is no 'right to credit'. Credit is fundamentally a two-party contract.

Also, there is a shared limit to risk: forcing creditors to take on more risk with some people means they may not take that risk in other cases (not extending credit to someone who would be rated lower-risk under a better-informed decision), or it forces them to raise the cost of credit for everybody.
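The "raise credit cost to everybody" step can be sketched with a deliberately simplified break-even model. Assumptions (not from the thread): a one-period loan, zero recovery on default, a lender funding cost `c`, and two hypothetical borrower groups. A lender that must break even sets 1 + r = (1 + c) / (1 - p). If it is not allowed to tell the groups apart, it has to use the pooled default probability, so the low-risk group's rate rises and the high-risk group's rate falls.

```python
def break_even_rate(p_default, funding_cost):
    """Rate r solving (1 - p)(1 + r) = 1 + c, assuming zero recovery on default."""
    return (1 + funding_cost) / (1 - p_default) - 1

# Hypothetical borrower groups: (share of applicants, default probability).
groups = {"low_risk": (0.7, 0.02), "high_risk": (0.3, 0.08)}
c = 0.03  # assumed funding cost

# Expected repayment is linear in p, so pricing at the mean p breaks even on the pool.
pooled_p = sum(share * p for share, p in groups.values())
pooled_rate = break_even_rate(pooled_p, c)

own_rates = {name: break_even_rate(p, c) for name, (_, p) in groups.items()}
for name, rate in own_rates.items():
    print(f"{name}: risk-based rate {rate:.4f}, pooled rate {pooled_rate:.4f}")
```

With these numbers the pooled rate is about 7.07%, against roughly 5.10% and 11.96% for risk-based pricing of the two groups.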

Making the statistical model for E(claims) worse on purpose by excluding relevant features (e.g. inherent traits) is political. The insurers have no choice, since the ethical views of the majority are foisted upon them through politics. The causal path has its roots in an ethical conversation that has played out in public, but that conversation has been mediated through politics and legislation.

The crime analogy is inappropriate. Insurance is a private voluntary arrangement between two consenting entities. Convictions on the other hand are an involuntary imposition on an unwilling party.

One could make the argument that allowing inherent traits in the pricing of insurance is the less authoritarian and more utilitarian option, since it is less forceful state interference in private business and leads to more accurate claims pricing and less subsidisation of insurance for person A by person B. Your prison sentencing analogy on the other hand implies more force, which is why I don't view it as a valid analogy.
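The cross-subsidy point is simple arithmetic. A minimal sketch, assuming premiums are set at expected claims with no loading (a simplification) and using two hypothetical groups, where A is the higher-risk group: a pooled premium charges everyone the population-average expected claim, and the gap from each group's own expected claim is the per-policy transfer. Across the whole pool the transfers net to zero.

```python
# Hypothetical groups: (number of policyholders, expected annual claims per policy).
groups = {"A": (2_000, 800.0), "B": (8_000, 300.0)}

total_people = sum(n for n, _ in groups.values())
total_expected = sum(n * e for n, e in groups.values())
pooled_premium = total_expected / total_people  # single price when the trait is excluded

transfer = {}
for name, (n, expected) in groups.items():
    # Positive: the group pays more than its own expected claims (it subsidises others).
    transfer[name] = pooled_premium - expected
    print(f"{name}: risk-based premium {expected:.2f}, pooled {pooled_premium:.2f}, "
          f"transfer per policy {transfer[name]:+.2f}")

# Summed over all policyholders, the transfers cancel out.
net = sum(n * transfer[name] for name, (n, _) in groups.items())
print(f"net transfer: {net:.2f}")
```

Here the pooled premium is 400, so each B policyholder pays 100 more than their expected claims and each A policyholder pays 400 less: B subsidises A.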

The ethical argument can go either way depending on the axioms we pick a priori. If we pick Libertarian deontological axioms, then the ethical choice is to allow inherent traits into the model. If we pick racial equity deontological axioms, then we get another conclusion.

> It's a question of ethics.

Which is an important thing that isn’t math.

I also think it is possible that the model learned that information from too small a data sample. Building a dataset with a good, reasonably balanced sample for every such feature is really difficult.

Consider a sample of 10 people out of 1 million with a height of 7m, and suppose that, somehow, 7 of the 10 had a poor ability to repay loans. With such a small sample, would this be a good factor to rely on?
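One way to quantify the small-sample worry is a confidence interval on the observed rate. A minimal sketch using the Wilson score interval (a method chosen here for illustration, not one named in the thread), applied to the hypothetical 7 out of 10 and, for contrast, to the same 70% rate observed over far more people:

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width

small = wilson_interval(7, 10)            # the hypothetical 7 out of 10
large = wilson_interval(70_000, 100_000)  # same observed rate, far more data
print(f"7/10:         {small[0]:.3f} to {small[1]:.3f}")
print(f"70000/100000: {large[0]:.3f} to {large[1]:.3f}")
```

For 7/10 the 95% interval runs from roughly 0.40 to 0.89, so ten observations barely pin down that group's repayment risk, whereas the large sample gives an interval less than a percentage point wide.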

Unless you have causal proof, it's irresponsible for a business to use such factors in modeling outcomes.