Hacker News
by yonkshi 2720 days ago
..could you elaborate? Are you saying ML is dangerous because it’s based on mathematics, and math is too clean to handle real world data?
3 comments

I think it's just that calling it "maths" tends to give a false sense of certainty where none is warranted.

Example: All the really clever math you use to make an encryption algorithm is 100% correct. Then all the really clever math you use to show that it would take until the heat death of the universe to crack your clever encryption is 100% correct. Then the user picks 'password' as the key. How does your crypto stand up to a brute force now? Is that your algorithm's fault? Did your difficulty proof lie to you?

I know key length is a well-understood issue. But as an example of algorithmically "valid" real-world data that can nonetheless torpedo an entire complex system, it's as good as any.
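
To make the gap concrete, here is a minimal Python sketch, not a real attack or a recommended design. Everything in it is an assumption for illustration: the single-round SHA-256 `derive_key`, the tiny `WORDLIST`, and the guess rate of 10^12 per second. It contrasts the key-space arithmetic behind the proof with what happens when the key comes from a guessable password.

```python
import hashlib
from typing import List, Optional, Tuple


def derive_key(password: str) -> bytes:
    # Hypothetical, deliberately naive key derivation: hash the password once.
    return hashlib.sha256(password.encode("utf-8")).digest()


def years_to_exhaust(key_bits: int, guesses_per_second: float) -> float:
    # Time needed to try every key in a key space of 2**key_bits, in years.
    seconds_per_year = 365.25 * 24 * 3600
    return (2 ** key_bits) / guesses_per_second / seconds_per_year


def dictionary_attack(target_key: bytes, wordlist: List[str]) -> Tuple[Optional[str], int]:
    # Try each candidate password in order; return the match (or None)
    # together with the number of guesses made.
    for attempts, candidate in enumerate(wordlist, start=1):
        if derive_key(candidate) == target_key:
            return candidate, attempts
    return None, len(wordlist)


# Assumed toy wordlist; real lists of common passwords are far larger.
WORDLIST = ["123456", "qwerty", "letmein", "password", "dragon"]

if __name__ == "__main__":
    # The "proof" side: exhausting a 128-bit key space at an assumed 1e12 guesses/s.
    print(f"full key space: {years_to_exhaust(128, 1e12):.3e} years")
    # The "real world" side: the same key, but derived from a common password.
    found, attempts = dictionary_attack(derive_key("password"), WORDLIST)
    print(f"dictionary attack: found {found!r} after {attempts} guesses")
```

Both numbers are "correct". The proof bounds a search over uniformly random keys, while the attacker searches the much smaller set of keys that people actually choose.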

Machine learning is literally mathematics, or more specifically, applied statistics. However, human stupidity can never be ruled out of the equation. Not calling something mathematics while it simply is mathematics is obfuscating the issue.
I think OP means that idealizing ML experiments using contrived data can distort the picture. Real world, ecologically valid results can only be discovered as they emerge when the algorithms are deployed in production. ML algorithms sometimes cook up solutions that can surprise or even disturb their creators.

I'm not sure I exactly agree with this premise. If you read about the principles of chaos engineering (https://principlesofchaos.org/), it's possible to simulate real-world events in testing. And if there's a rigorous mathematical backbone to ML, as there clearly is, some determinations about its limitations should be universal for all cases, even if the emergent results in production are unpredictable and could range over intractably many possible outcomes.

I'd elaborate by saying that focusing ML on mathematics risks creating technology that isn't useful, at great cost. I think that software engineering followed a similar arc: the mathematical appeal and authority of formal methods was very great, and in the absence of wide and deep experience of the reality of system development there was a huge focus on trying to use formal methods as a fundamental part of system development, with very marginal payoff in terms of current practice. I think that the impact of this can be seen in the way that software development is essentially an artisanal practice now. AI and ML risk the same sort of diversionary excursion, where mathematical fundamentals are the focus of the field and the real world is demoted/externalised. We do not understand the transformations between data and a deployed application; they are semantic, organisational, system dependent and very human. We can't characterise the type of mistakes that classifiers will make, or why those mistakes are, or aren't significant in an application. We can't engineer a machine learning system in the sense that we can't evaluate or certify that it'll work reliably or consistently.

Right now ML and AI are like airships in the 1920s: if and when something goes wrong and lots of people die (or are blinded), the community isn't even in a position to properly investigate what's happened. Before we get to focusing on the equivalent of hydrodynamics we need to move to an organisation and practice of engineering discipline - that's what the aircraft people did, and that's why the windows in jetliners aren't square, and that's why you can fly off on holiday.

If AI and ML don't do this, and instead everyone spends their hours and days doing maths that isn't absolutely at the core of the real issues of application, then watch as confidence and trust evaporate and be ready to wait 20 years to see any value arise.

But - maths that achieves results like those in compiler design and optimisation, I'll buy that for $1!

"We can't characterise the type of mistakes that classifiers will make, or why those mistakes are, or aren't significant in an application. We can't engineer a machine learning system in the sense that we can't evaluate or certify that it'll work reliably or consistently."

These statements are false. It sounds like you're extrapolating from what you've read in a few blog posts and assuming that's how the entire industry operates. You don't read headlines about people digging through the data and error logs on a daily basis because it's not headline-worthy, but that doesn't mean it's not being done.
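
As a rough illustration of what that kind of routine error analysis can look like, here is a minimal Python sketch. The labels and logged predictions are made up, and the helper names `confusion_counts` and `top_errors` are hypothetical. It tallies which true class gets mistaken for which predicted class and ranks the most frequent confusions.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    # Count every (true label, predicted label) pair seen in the logs.
    return Counter(zip(y_true, y_pred))


def top_errors(counts, n=3):
    # Keep only the misclassifications and rank them by frequency
    # (ties broken alphabetically so the ordering is deterministic).
    errors = {pair: c for pair, c in counts.items() if pair[0] != pair[1]}
    return sorted(errors.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Made-up logged predictions, for illustration only.
Y_TRUE = ["cat", "cat", "dog", "dog", "dog", "bird", "bird", "cat"]
Y_PRED = ["cat", "dog", "dog", "cat", "cat", "bird", "cat", "cat"]

if __name__ == "__main__":
    for (true_label, predicted), count in top_errors(confusion_counts(Y_TRUE, Y_PRED)):
        print(f"{true_label} misclassified as {predicted}: {count}")
```

Whether a given confusion matters is still an application-level judgement. A tally like this only shows where to look first.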