by ocdtrekkie 1948 days ago
I also believe any algorithm that isn't human-readable should be banned. If it can't be understood, nobody can validate that it isn't racist, sexist, or slanted towards encouraging violence and harm.

The fact that technology companies have been grossly negligent and irresponsible isn't a reason to not regulate them: It's proof regulation needs to be much, much stronger.

8 comments

This is an incredibly naive perspective. I guess you want to ban search engines, self driving cars, automated filtering of lewd and abusive content (why do you think FB isn’t full of porn? It’s not a hand engineered algorithm), automatic speech recognition for the hearing impaired, and a vast swath of important technology I didn’t list. I don’t think you really understand the implications of what you’re asking for. Sorry - black boxes are here to stay. And they are immeasurably useful. I could spend hours listing important and crucial technologies that you want banned because you are scared of racism.
Search engines already worked before ML, and neither automated filtering nor self-driving cars actually work in reality.
I agree with you but I am still scared of racism.
> I agree with you but I am still scared of racism.

My suspicion is that the concern with machine learning over racism is rooted in two things. The first is just the general modern trend of accusing anything you don't like of being racist, because everybody hates racism and wants to fight it. And the second is the fear on the part of people who make a living fighting racism that machine learning might actually put them out of a job.

Because machine learning is basically a paperclip optimizer. You tell it to maximize a thing, it maximizes the thing and minimizes everything else. Racism isn't paperclips, so the paperclip optimizer will optimize for smashing it in favor of making more paperclips. And then they're out of business.

Because when you look at the criticism of this stuff, it generally looks like this. ~12% of the population is black, only ~5% of the selected applicants are black, the algorithm is accused of racism.

But nothing is that simple, because all kinds of things like income and education level and so on correlate with race, so you have to take all of those things into account before you can tell what's going on. And taking into account all of the available data is how machine learning works.

Which isn't to say that you couldn't make an algorithm racist. Tell it to optimize for applicants with a particular skin color and it does. But then your problem isn't with the algorithm, it's with the jackasses who asked for that.

What to optimize for is a much more general and difficult question. (Hint: Not paperclips.)

> My suspicion is that the concern with machine learning over racism is rooted in two things. The first is just the general modern trend of accusing anything you don't like of being racist, because everybody hates racism and wants to fight it. And the second is the fear on the part of people who make a living fighting racism that machine learning might actually put them out of a job.

I don't get how you go from this statement to then explaining exactly how racism is embedded in algorithms: by using the biased data we have in the real world...

It isn't the data that's biased. If you're hiring a computer scientist and disproportionately few black people have a degree in computer science, the data is not lying about who the qualified applicants are and the algorithm can't change that.

To fix that you have to cause more black high school students to go to college and study computer science and then wait two generations until their proportionality in the installed base of qualified computer scientists reaches parity. There is no magic wand that makes it happen overnight.

But concentrating on the places where it can't be solved instead of the places where it can will make it take even longer.

No, the racism is a real issue, though a lot of it is caused by limited training data. Having an image recognition algorithm identify Africans and South Asians as gorillas doesn't happen because the designers intended it, but because their training data had only light-skinned human faces and dark-skinned primates. But the effect is racist even though this wasn't the intent.

Likewise, if the system is trained to duplicate human decision-making (like who gets loans), interesting things can happen: if the decision-makers unconsciously favored whites over blacks, the algorithm could wind up weighing skin color or stereotypically Black or Latino names negatively, meaning that the final model is explicitly racist, just because there is a correlation in the training data. That doesn't mean we shouldn't use deep learning, it means that it's not responsible to just fit the training data and ship without testing for such problems.

> Having an image recognition algorithm identify Africans and South Asians as gorillas doesn't happen because the designers intended it, but because their training data had only light-skinned human faces and dark-skinned primates. But the effect is racist even though this wasn't the intent.

This isn't racism at all. It's just bad PR because humans take the implication that calling black people monkeys is calling them stupid, since that's the implication you would draw if a person did that.

An algorithm doing that is just recognizing that humans and gorillas are both primates:

http://www.aquilaarts.com/bushmonkey.html

And then it's a bug, in the same way that recognizing a black balloon as a balloon but a white balloon as a light bulb is a bug. It has nothing to do with race at all. The algorithm isn't racist against white balloons. The solution is a general increase in the amount of training data, which is what you want in all cases regardless.

> if the decision-makers unconsciously favored whites over blacks, the algorithm could wind up weighing skin color or stereotypically Black or Latino names negatively, meaning that the final model is explicitly racist, just because there is a correlation in the training data.

Except that this is exactly the thing that a paperclip optimizer will smash to bits because it interferes with the goal of making more paperclips.

I’m not an expert in this, but I think racists call black people apes, not just because they think they are stupid, but because they think they are sub-human.

Blacks don’t reach the intelligence and blah to be human. I think that’s what racists drive at when they call someone a monkey, and that’s why it’s so offensive.

It would also make your theoretical AI racist, as it identified blacks as not human.

Honestly, at the end of the day that is what is so difficult about much of this. It’s mostly subjective.

I absolutely want to ban self-driving cars that behave in ways no human can explain or understand! The mere idea that anyone would think that should be legal is borderline insane.

All you are doing here is convincing me that tech companies are just runaway trains with nobody at the controls!

> I absolutely want to ban self-driving cars that behave in ways no human can explain or understand!

Can you explain or understand the algorithms humans use to drive cars?

Screw that.

Explain to me step by step how you walk.

Humans are held responsible if they cause harm to others. If a driver hits a pedestrian on purpose he is charged with murder. Who do you charge if a self-driving car behaves in this way?
Who do you charge when the brakes don't work on your car? When the airbags don't activate? When your rain sensor doesn't work?

Believe it or not, your car is not that primitive when compared to a self-driving one in terms of the number of things it does autonomously.

Isn't it the case that car manufacturers will have to issue a recall on defective models?
The humans that designed the car? To be clear, computers don't intentionally do anything. If an engineer deliberately programs a car to hit pedestrians for no reason, they would be charged with murder. If the car hits a pedestrian as a result of an engineering mistake, the company would be liable for damages, and if particularly egregious, engineers might face manslaughter charges.
To be clear, that was my point. You can't punish the computer and good luck finding the one to punish for an accident ten years later. But maybe if it's free software...
Well, you blame the black box algorithm that nobody can predict or understand, and you just call it an "accident".
Who would you blame if the algorithm could be printed on paper?
only if caught
What about all the other examples he listed? What about cancer detection? Or viral spread prediction? Drug discovery or medical imaging diagnosis? Physics research?

Machine learning is very widely used in the sciences and extremely beneficial to humanity in uncountably many ways and assuredly countless more to come. Of course technologies can be used for evil but so can nearly everything that exists. I believe your proposal comes from a desire to help or better the world, but to ban all non-human-readable algorithms is frankly ridiculous and demonstrates a naive understanding of the issue. It sounds a lot like the calls by the U.S. Congress to ban encryption.

Here is what I think:

- In medicine: your doctor should be responsible for your diagnosis, and the drug company is responsible for defective drugs, except when they get away with it by lobbying and hiring good lawyers.

- In physics: I'm not sure if it's as big of a problem as in social networks. But consider this case: if you cannot reproduce the result of an experiment because an ML model is cryptic, that would lead to a huge credibility issue in science.

At best, you may be able to justify black boxes providing secondary indicators: Maybe using AI to study cancer detection might lead you to a new solid discovery, but "we use AI to determine if you have cancer" should never be the mission, as it fails to generate useful information about how it is detected.
> fails to generate useful information about how it is detected

Patients don’t care how cancer is detected. Patients care if the diagnosis is correct.

Continue this line of thinking, would you want all algorithms banned? Might as well shut everything down :shrug:
We can't even explain all physical phenomena, so good luck with banning anything that depends on the gravity of earth to function, because we don't know what gravity is.
But gravitational laws have stayed unchanged for millennia, haven't they? If I toss an apple, it falls down. If I throw it fast enough, it goes into orbit.
> I also believe any algorithm that isn't human-readable should be banned. If it can't be understood, nobody can validate that it isn't racist, sexist, or slanted towards encouraging violence and harm.

I'm not sure a human-readable algorithm exists for ranking all the web pages in the world based on natural language input. In fact, I'm pretty sure such an algorithm does not, and potentially cannot, exist given the absolute failure of all approaches towards NLP that weren't based on absolute masses of text data and complex models.

Are you willing to make Google 10% as effective to achieve your goal of a human-readable algorithm?

You don't need any NLP to rank webpages (in fact, the entire innovation of Google was figuring out a way to rank pages while completely ignoring their content). PageRank works fundamentally by treating the web as a graph and prioritising results based on their connections; that is to say, it ranks based on popularity and is agnostic about the content of the actual page.

This generally has worked well. On the other hand, actually attempting to manipulate search results based on automated handling of content is what has given us countless censorship debates, or simply failures where even uncontroversial content is removed or downranked because it violated some sort of strange rule by having a 'bad word' in it. On Facebook recently, clothing ads for disabled people were banned[1], because it turns out the ML system only cared about the wheelchair, not the person in it.

It's actually fairly straight-forward to build recommender systems on transparent, graph-based algorithms and it gives you the added advantage of not discriminating in strange ways.
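
To make the graph idea above concrete, here is a minimal sketch of PageRank-style power iteration. It is not Google's production algorithm; the `pagerank` function, the 0.85 damping factor, and the four-page `toy_web` are all assumptions chosen for illustration. The point is only that the ranking is computed from who links to whom and never looks at page text.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages from link structure alone.

    `links` maps each page to the list of pages it links to.
    Page content is never inspected.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
```

In this toy graph the most linked-to page ends up on top, and you can trace exactly which links put it there, which is the transparency argument being made.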

[1]https://www.nytimes.com/2021/02/11/style/disabled-fashion-fa...

You've just skipped over the early days of Google where they relied primarily on PageRank and bad actors manipulated it to death.

It's trivial to generate webs of fake, inter-related content and use that specifically to feed incoming links to valuable pages. Or to comment-spam websites so aggressively it ruins them. Or all of the secret deals between high-ranking sites to feed links even though the sites weren't related. There are countless examples of black-hat techniques to break PageRank.
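
A rough sketch of the link-farm trick described above, using the same kind of simplified PageRank iteration (not Google's real ranking). The page names, the 20 farm pages, and the damping factor are made up for illustration. Two pages start with identical link profiles, then one of them buys a farm of fake pages that all point at it.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration over a link graph."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iterations):
        new_rank = dict.fromkeys(pages, (1.0 - damping) / n)
        for page in pages:
            # Pages without outlinks spread their rank over every page.
            targets = links.get(page) or pages
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical honest web: "rival" and "spam" each get one link from "hub".
honest_web = {
    "hub": ["rival", "spam"],
    "rival": ["hub"],
    "spam": ["hub"],
}

# Same web plus a farm of 20 fake pages whose only job is to link to "spam".
farmed_web = dict(honest_web)
for i in range(20):
    farmed_web[f"farm{i}"] = ["spam"]

before = pagerank(honest_web)
after = pagerank(farmed_web)
```

Before the farm exists the two pages are tied; afterwards `after["spam"]` is clearly ahead of `after["rival"]` even though nothing about either page's content changed.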

I am sorry but you simply can't build a sustainable search engine without deeply understanding the user intent and the meaning behind the indexed pages.

>There are countless examples of black-hat techniques to break PageRank

There are also countless adversarial examples to trick ML algorithms. In fact, this is in many ways worse because of the 'idiot savant' character of ML systems, which are almost always oblivious to context and can be tricked in ways that aren't apparent from the design of the system.

In contrast to systems that are legible or even formally verifiable, ML systems are entirely unable to provide any guarantees. When someone breaks PageRank, at least it's apparent how they broke it. When an ML system mistakes a turtle with a fractal pattern on its shell for a gun, nobody knows how to fix the system in any reliable way, other than to feed it more data and pray.

Pagerank worked fine when it was invented. It's a very elegant algorithm. But in a perfect illustration of Goodhart's law, it fell apart once people realized that they could game it to increase their traffic. Google has been in a constant arms race against unscrupulous SEO practices ever since.
>Google has been in a constant arms race against unscrupulous SEO practices ever since.

One company controls 80% of what is found on the internet. They set rules, restrictions, penalties that are not public. They do not pass any sort of regulatory muster. They rip and tear through businesses standing in their way. They crush out a person's online existence through never explained reasons. They use every advantage they can to tweak a human's emotions, drive and needs to feed more and more advertisements.

You suggest those trying to use every advantage they can to rank higher are unscrupulous?

Google's fight to keep search results crisp ended soon after they began selling advertising. Google long ago quit innovating search to be better for people, they've made it better for advertisers.

what is the weather today, Google?

I agree that you don't need NLP to rank webpages (though it certainly helps), but you do need it to parse the kinds of queries given to search engines these days. The days of logical OR and NOT are long gone I'm afraid.

> It's actually fairly straight-forward to build recommender systems on transparent, graph-based algorithms and it gives you the added advantage of not discriminating in strange ways.

I think other commenters have addressed the PageRank issue, but I'd be super interested in papers doing the work you note above.

> Are you willing to make Google 10% as effective to achieve your goal of a human-readable algorithm?

Absolutely. If it can't be done responsibly and ethically, perhaps it should not be done.

what % of people do you think would be willing to stop using search engines because they are unethical?
To me, their response didn't seem to indicate that it should be directly decided by people. This is a consumer protection matter and, to stretch an analogy, like a list of ingredients on a consumable. Here we have these black boxes, and no list of ingredients, yet they drive and shape our world. A person can't even decide directly if they wanted to.
If you look at the actual data, you will find that black box models are in fact responsible for preventing the majority of abusive content including hate speech and porn on social media platforms. Ban these models and you’d find your favorite social media platform is more abusive. Most of the racism and sexism you are concerned about comes from other humans.
Do you apply the same standard to people?

Tell me, how did your brain come up with what you wrote? How do I validate that it isn't racist, sexist, or slanted towards encouraging violence and harm?

By asking them. You can't just ask an algorithm, it must be designed to show its own work. Credibility is another problem...
Why can’t you just test the algorithm? It’s not conclusive, but it’s also not worthless.
Seems to me that’s a viable answer. How can we test an algorithm like Google’s ranking though? We can’t feed it consistent data like in a software test. It relies on too much information, and what we know about it indicates we can’t extract it out to test against it—except for results in the real world.

Not to mention Facebook’s are even more difficult. Tangentially related, remember when you could use “View As” on your profile page to see what your profile looked like to others? It doesn’t work anymore, only works for Public and Yourself; you can no longer choose the person to view as.

It’d be great to test these algorithms. We can’t. They need to be designed and instrumented so this is possible.

lol. sorry, but that reminds me of a skit by an Australian comedian:

male guest: "now first of all, let me just start by saying I'm not racist..."

female guest: "pfft..."

host: "ah see you made a noise there, but a lot of people accuse him of being a racist, so I think it's very helpful to know that he actually isn't one..."

Right, like I said, credibility is a different problem. But at this point, we don't even get a lie from them, we get nothing. At least a lie can be checked and examined. There's nothing available at all currently.
We actually have a reasonable way to test for human biases in AI: perturb the input a bit and see how the AI responds, e.g., change the name or the gender, and use the result to measure whether the AI is fair. Whether all AI can be subjected to such tests is a different question; e.g., how would you detect a human bias in a page ranking algorithm? But where it matters, you can test them, and we do test them.
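
A minimal sketch of that kind of perturbation test, assuming you can call the model as a function. Everything here is hypothetical: `Applicant`, `biased_model`, `fair_model`, and `perturbation_gap` are illustrative stand-ins, and the "models" are hand-written toys rather than trained systems. The idea is to change only one sensitive field and measure how far the score moves.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Applicant:
    name: str
    gender: str
    years_experience: int


def biased_model(applicant):
    """Hypothetical black box that (wrongly) penalizes one gender."""
    score = 10.0 * applicant.years_experience
    if applicant.gender == "female":
        score -= 5.0
    return score


def fair_model(applicant):
    """Hypothetical black box that only looks at experience."""
    return 10.0 * applicant.years_experience


def perturbation_gap(model, applicant, field, alternatives):
    """Largest score change seen when only `field` is swapped."""
    baseline = model(applicant)
    gaps = [
        abs(model(replace(applicant, **{field: value})) - baseline)
        for value in alternatives
    ]
    return max(gaps)


applicant = Applicant(name="Alex", gender="male", years_experience=4)
genders = ["female", "male"]
biased_gap = perturbation_gap(biased_model, applicant, "gender", genders)
fair_gap = perturbation_gap(fair_model, applicant, "gender", genders)
```

A nonzero gap on a field that should not matter is a red flag; a zero gap on one applicant proves little, so in practice you would run this over many inputs and several fields.
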
Yes. True and fair. But how can we test the page rank algorithm "ourselves"? Who is "we" in the "we do test them"? Is the public even asking for 3rd party examinations by transparent/public organizations (or at least, publicly funded)? Seems like we only get to "test" against the live system, and third party examination seems relatively impossible. It seems like something with such far reaching and invasive results should be more accessible, at the very least.
Very few people have the ability to influence the success or failure of every business on the planet. Those that do are heavily scrutinized for racist or sexist behavior. (Sometimes they also don't get convicted anyways, but that's another matter.)
> Very few people have the ability to influence the success or failure of every business on the planet.

In other words the solution to this should be antitrust enforcement and decentralization of power.

> any algorithm that isn't human-readable should be banned

There's an existing term for people with this view:

https://en.wikipedia.org/wiki/Luddite

You refer to the activists who successfully protected their quality of life by refusing to let someone else use technology to ruin it.

An apt comparison.

I'm sorry I have to tell you this, but they were not successful.
The Luddites obtained numerous concessions and retired comfortably. Not clear how that’s unsuccessful.
>If it can't be understood, nobody can validate that it isn't racist, sexist, or slanted towards encouraging violence and harm.

This is quite a bizarre claim as there is famously an entire category of problems that are hard to solve but easy to verify: P vs NP

Yeah, they can give you the architecture drawn as a nice mind map, list the hyper-parameters, but that's like knowing the algorithm of the compiler, it doesn't help detect a bad program. The question is what the model is learning, not how. What are the inputs and what is it learning to output.
Explainable models do not preclude the systemic problems you highlight. Plenty of systems before the advent of non-explanatory ML models had those defects. One option is to define test and validation sets and encourage third-party validation, somewhat like how accreditation works in other contexts.