by gkmcd 1948 days ago
A key part of this legislation is the requirement for tech companies to provide selected news organisations with advance notice about changes to ranking algorithms. This has been generally overlooked in the reporting and discussion, but I believe it is actually the most important part of the legislation. It will give the selected news organisations an enormous advantage over other companies not included and protect them from new competitors, basically entrenching the current media landscape for the foreseeable future.

Given the current Australian government's cosy relationship with a particular media company that currently dominates the media landscape here, I don't think it is a coincidence.

7 comments

It's even broader than that. The law applies to any "alterations to the ways in which a service distributes content". The law never actually defines what this means, but it gives a bunch of examples that go beyond ranking. For example, anything that affects a particular "class of content", such as deciding whether or not to make all videos auto-play, is an alteration.

Basically, this law would prevent Facebook from deploying just about any non-trivial change to its product without first doing a detailed analysis of how it would affect the Australian news business, in order to determine whether a notification is required.

See sections 52D and 52W of the bill: https://parlinfo.aph.gov.au/parlInfo/download/legislation/bi...

Thanks for highlighting this. The initial reporting and FB’s own blog posts do not make this clear. Even running experiments at scale could be problematic with the way this is written (interpretation as a non-lawyer).
So it's just like GDPR?

Guidelines intentionally kept vague so that some bureaucrat can slap a huge fine and collect the rent?

I wonder if that rent-seeking attitude will accelerate or curb the current brain drain Australia faces.

Google has been busy making major deals with news organisations. Facebook has chosen the opposite track. The problem Facebook has is that the government in Australia is wildly popular because of the pandemic. They can do what they want right now and it won't lose them any votes. Ministers are already telling Facebook to leave the country completely!
I would not call the federal government “wildly popular”. Most people are (rightly, IMO) laying our pandemic success at the feet of state politicians and realising that the federal level had been ineffective at best.
Fellow Aussie here - this is simply incorrect.

Australia does have competitive federalism, and so many of the localised decisions have been from the states, but the major decisions for seeding - closing borders, acquiring vaccines, etc, along with fiscal backstopping - are federal.

The federal government (under ScoMo) has never polled so well as it has during covid, and for good reason.

(NB: I am no blind Coalition supporter, but they have made decisions which are very popular, and no amount of directing attention to the states would absolve them from blame if we had a situation more like Europe or the US.)

But Australia is an island. It's much easier to be effective against covid.

Not really sure why the politicians get the slack. Which measure was highly effective that could also be taken in a country with neighbors?

Ps. I'm not pro Facebook. Just curious

Facebook and Google have not taken opposite tracks; Google hasn't made deals to pay for linking, a major problem with the legislation. Google's deals are around news content, but FB lacks comparable products to negotiate deals around, leaving them with only a refusal to pay for linking.
Yup, and given that we have largely managed the pandemic extremely well as a society with a competent government that has a publicly funded health care system, I think it's unlikely that Facebook preventing us from reading the news via Facebook is much of a concern.

We were a large consumer of news before Facebook. We will be a large consumer of news after Facebook.

That is nothing like the GDPR.
That’s not a fair characterisation of GDPR, nor one borne out by examples.

And 2020 put a pause to any Australian brain drain and, given how well we’ve handled the pandemic, we are likely to see significant increases in net migration.

I actually suspect that new news carousel product they launched might be related. If they push news articles onto the carousel, they can apply a different algorithm to the carousel.

Honestly, I think that makes sense and it doesn't immediately strike me as a negative for either side. These are articles coming from trusted sources. There's no need to apply the anti-spam parts of the algorithm. News agencies get a more stable algorithm, Google gets to keep their secret sauce.

There is still an advantage to the incumbents. Those carousels are usually in prime real estate. Google would hold the keys to who is in the carousel though, so they could expand it without legislative changes. I like the flexibility, though I don't love handing Google the keys to more kingdoms.

The only problem I have with this requirement is that it requires giving only news organizations access to this information. Your concerns can adequately be addressed by ensuring everyone has even access to that information.

Google and Facebook's algorithms should be required to be publicly disclosed. As a society, we should demand that we are able to see the algorithms that every web property lives and dies based on, that lives are built and destroyed by.

These algorithms are not human readable code. They are massively complex interconnected systems of many black box ML models. I don't understand what clarity people think releasing the "algorithms" will bring. In fact, describing ranking as a single algorithm is pretty misleading.
As you say, explaining the intricacies of the algorithm is a fool's errand. I guess it is more reasonable if you turn it around: these changes have drastic impact on businesses, so there is a duty to behave responsibly in administering them.

If Google really has no idea what the impact of a change will be then it is fairly irresponsible to make that change given the real world harm it can cause. But I suspect in general it does have at least a reasonable idea what the effect of changes will be - that is why it is making them.

So the more reasonable version of this is that they need to submit human interpretable descriptions of the effect of changes based on reasonable evidence and validation of their models.

Monitoring search engine and social network ranking and filtering updates should be more efficient than complaining about biased parrots (language models). This is a tip to certain ethics researchers who are raising scandals about search bias, but not in the right place - go in the field, check the fucking feeds, leave your abstract ethical tower and measure the reality.
I'm sorry, but this post sounds pretty abstract itself. What exactly do you propose they should do?
Instead of trying to argue the gender bias in "doctor - man + woman == nurse" (abstract ethical argument) they should check the search results for bias (concrete, measured effect).
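For anyone who hasn't seen the "doctor - man + woman" arithmetic written out, here is a rough sketch of what that analogy test does. The vectors are made up by hand (three toy dimensions chosen so the pattern appears); they are not taken from any trained model, and `analogy` is a hypothetical helper, not a library function.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(vectors, a, b, c):
    """Return the word closest to vectors[a] - vectors[b] + vectors[c]."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # The three input words are excluded, as analogy tests usually do.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], query))

# ASSUMPTION: hand-made toy vectors, axes loosely (medical, gender, other).
toy_vectors = {
    "man":     [0.0,  1.0, 0.0],
    "woman":   [0.0, -1.0, 0.0],
    "doctor":  [1.0,  0.3, 0.0],
    "nurse":   [1.0, -0.8, 0.0],
    "surgeon": [1.0,  0.5, 0.0],
    "teacher": [0.0, -0.2, 1.0],
}

result = analogy(toy_vectors, "doctor", "man", "woman")
print(result)
```

With these hand-built vectors the nearest remaining word is "nurse", which only shows the mechanics of the test; it says nothing about what any real model or any real search result does.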
In many cases they (Google) don't know the impact of changes until they try deploying the changes, and there's ML in the picture, not just algorithms. As I understand it, they often run tests that expose the change to a limited subset of users first.
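For what it's worth, "expose the change to a limited subset of users" is commonly done with deterministic hash bucketing. A minimal sketch, assuming string user IDs; the function name, the experiment name and the 5% exposure are all made up for illustration and are not anything Google has described:

```python
import hashlib

def in_experiment(user_id: str, experiment: str, exposure: float) -> bool:
    """Deterministically place roughly `exposure` of users in the experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an integer bucket in the range [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < exposure * 2**64

# Hypothetical user population and experiment name.
users = [f"user{i}" for i in range(10_000)]
exposed = [u for u in users if in_experiment(u, "ranking-change-demo", 0.05)]
share = len(exposed) / len(users)
print(f"{share:.1%} of users see the change")
```

The same user always lands in the same bucket for a given experiment, so the impact can be measured on the exposed group before a wider rollout.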
Yes, but they don't just do random stuff. They make changes with the intention to adjust the experience in certain ways, so making those intentions public is important.
If it’s ML that is doing all the work to display articles, then ML has a long way to go.
No, it's ML tasked with "user engagement" doing the work. Not ML in general.
I also believe any algorithm that isn't human-readable should be banned. If it can't be understood, nobody can validate that it isn't racist, sexist, or slanted towards encouraging violence and harm.

The fact that technology companies have been grossly negligent and irresponsible isn't a reason to not regulate them: It's proof regulation needs to be much, much stronger.

This is an incredibly naive perspective. I guess you want to ban search engines, self driving cars, automated filtering of lewd and abusive content (why do you think FB isn’t full of porn? It’s not a hand engineered algorithm), automatic speech recognition for the hearing impaired, and a vast swath of important technology I didn’t list. I don’t think you really understand the implications of what you’re asking for. Sorry - black boxes are here to stay. And they are immeasurably useful. I could spend hours listing important and crucial technologies that you want banned because you are scared of racism.
Search engines already worked before ML, neither automated filtering nor self-driving cars actually work in reality.
I agree with you but I am still scared of racism.
> I agree with you but I am still scared of racism.

My suspicion is that the concern with machine learning over racism is rooted in two things. The first is just the general modern trend of accusing anything you don't like of being racist, because everybody hates racism and wants to fight it. And the second is the fear on the part of people who make a living fighting racism that machine learning might actually put them out of a job.

Because machine learning is basically a paperclip optimizer. You tell it to maximize a thing, it maximizes the thing and minimizes everything else. Racism isn't paperclips, so the paperclip optimizer will optimize for smashing it in favor of making more paperclips. And then they're out of business.

Because when you look at the criticism of this stuff, it generally looks like this. ~12% of the population is black, only ~5% of the selected applicants are black, the algorithm is accused of racism.

But nothing is that simple, because all kinds of things like income and education level and so on correlate with race, so you have to take all of those things into account before you can tell what's going on. And taking into account all of the available data is how machine learning works.

Which isn't to say that you couldn't make an algorithm racist. Tell it to optimize for applicants with a particular skin color and it does. But then your problem isn't with the algorithm, it's with the jackasses who asked for that.

What to optimize for is a much more general and difficult question. (Hint: Not paperclips.)

I absolutely want to ban self-driving cars that behave in ways no human can explain or understand! The mere idea that anyone would think that should be legal is borderline insane.

All you are doing here is convincing me that tech companies are just runaway trains with nobody at the controls!

> I absolutely want to ban self-driving cars that behave in ways no human can explain or understand!

Can you explain or understand the algorithms humans use to drive cars?

What about all the other examples he listed. What about cancer detection? Or viral spread prediction? Drug discovery or medical imaging diagnosis? Physics research?

Machine learning is very widely used in the sciences and extremely beneficial to humanity in uncountably many ways and assuredly countless more to come. Of course technologies can be used for evil but so can nearly everything that exists. I believe your proposal comes from a desire to help or better the world, but to ban all non-human-readable algorithms is frankly ridiculous and demonstrates a naive understanding of the issue. It sounds a lot like the calls by the U.S. Congress to ban encryption.

Continue this line of thinking, would you want all algorithms banned? Might as well shut everything down :shrug:
We can't even explain all physical phenomena, so good luck with banning anything that depends on the gravity of earth to function, because we don't know what gravity is.
> I also believe any algorithm that isn't human-readable should be banned. If it can't be understood, nobody can validate that it isn't racist, sexist, or slanted towards encouraging violence and harm.

I'm not sure a human-readable algorithm exists for ranking all the web pages in the world based on natural language input. In fact, I'm pretty sure such an algorithm does not, and potentially cannot, exist given the absolute failure of all approaches towards NLP that weren't based on absolute masses of text data and complex models.

Are you willing to make Google 10% as effective to achieve your goal of a human-readable algorithm?

You don't need any NLP to rank webpages (in fact, the entire innovation of Google was that they figured out a way to rank pages while completely ignoring the language on them). PageRank works fundamentally by treating the web as a graph and prioritising results based on their connections; that is to say, it ranks based on popularity and is agnostic about the content of the actual page.
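To make the graph idea concrete, here is a minimal sketch of PageRank as plain power iteration. The damping factor of 0.85 is the value from the original paper; the four-page web and the function itself are toy assumptions, not how any production search engine is implemented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical toy web: three pages link to "hub", which links back to "a".
toy_web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["a"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Nothing in the computation looks at what the pages say; the ordering comes entirely from who links to whom.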

This generally has worked well. On the other hand, actually attempting to manipulate search results based on automated handling of content is what has given us countless censorship debates, or simply failures where even uncontroversial content is removed or downranked because it violated some sort of strange rule, such as containing a 'bad word'. On Facebook recently, clothing ads for disabled people were banned[1] because, it turns out, the ML system only cared about the wheelchair, not the person in it.

It's actually fairly straight-forward to build recommender systems on transparent, graph-based algorithms and it gives you the added advantage of not discriminating in strange ways.

[1]https://www.nytimes.com/2021/02/11/style/disabled-fashion-fa...

You've just skipped over the early days of Google where they relied primarily on PageRank and bad actors manipulated it to death.

It's trivial to generate webs of fake, inter-related content and use that specifically to feed incoming links to valuable pages. Or to comment-spam websites so aggressively it ruins them. Or all of the secret deals between high-ranking sites to feed links even though the sites weren't related. There are countless examples of black-hat techniques to break PageRank.
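A rough sketch of the link-farm trick on a toy graph, using a bare-bones link-only ranking in the PageRank style. The page names and the farm size of 20 are invented for illustration, and real search engines layer many defences on top of anything this simple.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration; assumes every page has at least one outgoing link."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            for t in targets:
                new_rank[t] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Honest toy web: "quality" is linked from three sites, nobody links to "spam".
honest = {
    "quality": ["site1", "site2", "site3"],
    "site1": ["quality"],
    "site2": ["quality"],
    "site3": ["quality"],
    "spam": ["quality"],
}

# Same web plus a farm of 20 fake pages that link only to "spam".
farmed = dict(honest)
for i in range(20):
    farmed[f"fake{i}"] = ["spam"]
farmed["spam"] = ["fake0"]  # keep the rank circulating inside the farm

before = pagerank(honest)
after = pagerank(farmed)
print(max(before, key=before.get), max(after, key=after.get))
```

On the honest graph "quality" ranks first and "spam" last; once the fake pages exist, "spam" ranks first, without a single legitimate site having linked to it.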

I am sorry but you simply can't build a sustainable search engine without deeply understanding the user intent and the meaning behind the indexed pages.

PageRank worked fine when it was invented. It's a very elegant algorithm. But in a perfect illustration of Goodhart's law, it fell apart once people realized that they could game it to increase their traffic. Google has been in a constant arms race against unscrupulous SEO practices ever since.
what is the weather today, Google?

I agree that you don't need NLP to rank webpages (though it certainly helps), but you do need it to parse the kinds of queries given to search engines these days. The days of logical OR and NOT are long gone I'm afraid.

> It's actually fairly straight-forward to build recommender systems on transparent, graph-based algorithms and it gives you the added advantage of not discriminating in strange ways.

I think other commenters have addressed the PageRank issue, but I'd be super interested in papers doing the work you note above.

> Are you willing to make Google 10% as effective to achieve your goal of a human-readable algorithm?

Absolutely. If it can't be done responsibly and ethically, perhaps it should not be done.

what % of people do you think would be willing to stop using search engines because they are unethical?
If you look at the actual data, you will find that black box models are in fact responsible for preventing the majority of abusive content including hate speech and porn on social media platforms. Ban these models and you’d find your favorite social media platform is more abusive. Most of the racism and sexism you are concerned about comes from other humans.
Do you apply the same standard to people?

Tell me, how did your brain come up with what you wrote? How do I validate that it isn't racist, sexist, or slanted towards encouraging violence and harm?

By asking them. You can't just ask an algorithm, it must be designed to show its own work. Credibility is another problem...
Why can’t you just test the algorithm? It’s not conclusive, but it’s also not worthless.
lol. sorry, but that reminds me of a skit by an Australian comedian:

male guest: "now first of all, let me just start by saying I'm not racist..."

female guest: "pfft..."

host: "ah see you made a noise there, but a lot of people accuse him of being a racist, so I think it's very helpful to know that he actually isn't one..."

Very few people have the ability to influence the success or failure of every business on the planet. Those that do are heavily scrutinized for racist or sexist behavior. (Sometimes they also don't get convicted anyways, but that's another matter.)
> Very few people have the ability to influence the success or failure of every business on the planet.

In other words the solution to this should be antitrust enforcement and decentralization of power.

> any algorithm that isn't human-readable should be banned

There's an existing term for people with this view:

https://en.wikipedia.org/wiki/Luddite

You refer to the activists who successfully protected their quality of life by refusing to let someone else use technology to ruin it.

An apt comparison.

I'm sorry I have to tell you this, but they were not successful.
>If it can't be understood, nobody can validate that it isn't racist, sexist, or slanted towards encouraging violence and harm.

This is quite a bizarre claim, as there is famously an entire category of problems that are hard to solve but easy to verify (the subject of the P vs NP question).
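A small sketch of that solve/verify asymmetry, using subset sum as the example problem (my choice, not something from the thread). Checking a proposed answer takes one pass over it, while the naive search may try every subset. Whether auditing an ML model is "easy to verify" in the same sense is the analogy being argued here, not something this code demonstrates.

```python
from itertools import combinations

def verify(numbers, target, certificate):
    """Check a proposed answer: cheap, one pass over the certificate."""
    remaining = list(numbers)
    for x in certificate:
        if x not in remaining:
            return False
        remaining.remove(x)  # each number may be used only once
    return len(certificate) > 0 and sum(certificate) == target

def solve(numbers, target):
    """Find an answer by brute force: tries up to 2**n - 1 subsets."""
    for size in range(1, len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return list(subset)
    return None

numbers = [3, 34, 4, 12, 5, 2]
found = solve(numbers, 9)
print(found, verify(numbers, 9, found))
```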

Yeah, they can give you the architecture drawn as a nice mind map, list the hyper-parameters, but that's like knowing the algorithm of the compiler, it doesn't help detect a bad program. The question is what the model is learning, not how. What are the inputs and what is it learning to output.
Explainable models do not preclude the systemic problems you highlight. Plenty of systems before the advent of non-explanatory ML models had those defects. One option is to define test and validation sets and encourage 3P validation, somewhat like how accreditation works in other contexts.
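As a sketch of what third-party validation against a fixed test set could look like: the auditor treats the model as a black box, runs it over labelled examples, and compares outcome rates per group. The model, the groups and the data are all hypothetical placeholders, and a gap in rates is a prompt for further investigation rather than proof of anything on its own.

```python
from collections import defaultdict

def positive_rates(model, examples):
    """Return the share of positive decisions per group for a black-box model."""
    totals, positives = defaultdict(int), defaultdict(int)
    for features, group in examples:
        totals[group] += 1
        if model(features):
            positives[group] += 1
    return {g: positives[g] / totals[g] for g in totals}

def disparity_ratio(rates):
    """Lowest group rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0

# Hypothetical stand-in for an opaque model: the auditor only calls it.
def opaque_model(features):
    return features["score"] >= 50

# Made-up validation set of (features, group) pairs.
validation_set = [
    ({"score": 70}, "group_a"), ({"score": 40}, "group_a"),
    ({"score": 55}, "group_a"), ({"score": 90}, "group_a"),
    ({"score": 30}, "group_b"), ({"score": 60}, "group_b"),
    ({"score": 45}, "group_b"), ({"score": 20}, "group_b"),
]

rates = positive_rates(opaque_model, validation_set)
ratio = disparity_ratio(rates)
print(rates, round(ratio, 2))
```

The auditor never needs to read the model's internals, only to call it, which is the point of validating on agreed test sets.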
Publicly disclosing the algorithms would drastically increase the pace of gaming them and result in a pay-to-play system where the fanciest SEO wins.

Google and Facebook partially rely on obscurity to keep fighting the spam battle. IMO we don't have the technology yet to have fully open ranking algorithms that are not quickly broken.

Come to think of it, it's similar to crypto around WW2.

This isn't as true as it once was.

Google's best asset for ranking is their user data. Even if you had the exact algorithm, you couldn't game it without massive amounts of user traffic. (At least not for popular searches.)

No. Their best asset is their deep understanding of what makes a page "good" and the intent behind a search query.

You could get rid of all their user data and it would still be a great search engine.

Delegating the war against spam to Big Tech, rather than having users pick it up, doesn't seem right. Giving Big Tech such power to relieve ourselves of a mild annoyance is destructive. This is understood in other aspects of life. Hence we have local governments, which are inefficient and inconvenience people greatly, yet it is found that selling off all our problems is counterproductive. It ends with monopolies. The answer to this isn't to charge tech companies for the privilege of dictating our lives; rather, it's greater accountability on the part of big tech and more responsibility on the part of consumers. The only cure for Google domination is for the transfer of information online to become more democratised.
This excuse has been used to protect Google and Facebook for decades, but considering disinformation campaigns, civil unrest, and outright genocide have been the cost... I think the price of using obscurity to prevent SEO tactics is way too high.
The root cause isn't algorithms, it's a lack of accountability (both of companies and of users). The problem with 8chan wasn't some inadvertently harmful AI, it's that the site and its users damaged the world for several years without facing consequences.
I'm curious how this advance update thing is supposed to work. What does disclosing those details look like, actually?

The reason I'm asking is that as these things grow in complexity, it's quite possible that even if you join the team that works on these systems it will probably take you a pretty long time to understand how they really work. Their actual behaviour is likely to still be mysterious a lot of the time because they're driven by data.

Is a high-level description in English OK? Do we need to see pseudocode? The source code? Do they have to open source it? What parts, if it's tied to internal frameworks? If there is ML, do they have to disclose all their sauce there? The trained network / weights? The training data, if the alg alone is useless without a data set?

Any human-initiated change to search algorithms is presumably human-understandable. Someone writes a rule to downrank some terms or traits of a website, they presumably document it somewhere.

That documentation will need to be shared, and the implementation of the rule change will need to be delayed until the disclosure window has passed.

Human understandable, yes, but the details of particular changes might only make sense to humans familiar with the system.

But yeah, the product manager view / documentation of intent sounds generally reasonable.

I do wonder how useful that would be to the news orgs in practice.

Honestly, first and foremost, I expect a firehose of documentation, if Google isn't lying about making dozens of changes to its algorithms every day. News companies might need a full-time guy (or team) just to sit there and read through them all.

But on the other hand, a bunch of journalists will have a ton of never-before-seen information about how the world's most powerful companies affect every other company on the planet. That alone is going to be worth some major exclusives.

Also, by the mere nature of being forced to share it, Google and Facebook will have to clean up their acts, they'll have to assume any change they make that could open them up to legal scrutiny will be found.

You underestimate the complexity here by orders of magnitude. You also overestimate the usefulness to news companies. You underestimate the harm that bad actors can do.

The search algorithm tells you the order of search results for a particular set of terms. Except that as input you need to feed it a graph of the entire indexed internet, which is re-indexed periodically as the content on the index changes. How does knowing that benefit news companies? What, exactly, would your hypothetical full-time guy/team, equipped with that index at huge cost, tell their company that would justify the time and expense? That they should write interesting content that lots of people consume?

Second, the general approach has been published and is well documented [1], as are its susceptibilities to attack [2]. So there's your algorithm, what does it tell you?

Third, general SEO isn't the problem, it's coordinated attacks that can poison all search results / ads markets if enough detail is known. Google invests [3] heavily to address these areas [4].

Finally, you underestimate how much of a firehose you'd have to drink from. It describes all of the internet.

[1] http://infolab.stanford.edu/~backrub/google.html

[2] https://en.wikipedia.org/wiki/PageRank#Manipulating_PageRank

[3] https://www.quora.com/What-does-the-Counter-Abuse-Technology...

[4] https://www.blog.google/around-the-globe/google-europe/meet-...

I mean, if not tied to particular news organisations, this could actually be a sensible requirement.
No, I've lived in the SEO world my entire life and I guarantee you that if the ranking changes were published, spam would be so bad as to make search engines or social media totally unusable.
Then publish the broad criteria behind the ranking and have the technical details be audited by the government behind closed doors.
State-mandated news ranking controlled by the ruling party in secret! I'm sure nothing can go wrong with that.
While I share your hesitation here, I think two points are important to keep in mind:

1. There is quite a difference between compulsory auditing (what the post you reply to refers to) and the government directly controlling industry.

2. In other industries this is quite commonplace and hasn't led to government takeover of industries (banking comes to mind. In their regulatory implementation of the Basel III accords developed in response to the 2008 financial crisis, both the UK and EU mandate government audits to ensure compliance with stress-testing and leverage requirements; the US is also a signatory to these accords, but I am less familiar with their implementation into US law).

I'm not personally a huge fan of this approach, but I don't find the argument that government oversight is a slippery slope to totalitarianism that persuasive. In my opinion, a much stronger critique of mandatory government audits is that they are often not that effective at preventing the negative outcomes they set out to prevent but still massively increase the legal complexity of operating in (or entering) a given industry without falling afoul of the law.

Nice try, Facebook.
I wonder why Josh Frydenberg didn’t mention this at his presser today. Contrast this to him repeatedly saying “digital giants must pay traditional news”... I was waiting for him to slip once and say “traditional news must pay my boss Murdoch”
> a particular media company that currently dominates the media landscape here

What is this company, out of curiosity? My guess is ABC, but I don't know.

No, it's News Corp. They have a 55%ish(?) readership share for print media in Australia and are very cosy with the incumbent government. Afaik it's one of the most concentrated markets in the world. (Behind Egypt and China I believe).

I'm not sure the exact online share.

It's only a cosy relationship with the incumbent government if said government does what it's told.

Kevin Rudd and Malcolm Turnbull are examples of what happens when you try and dictate terms with News Corp and they turn on you with negative press.

https://www.theguardian.com/media/2020/nov/18/kevin-rudd-and...

As others have stated, when looking at events in Australia, you can consider the currently-governing Liberal party (economically liberal, socially conservative) and News Corp as effectively the same organisation.

On the other hand, the Liberal party is very hostile to the public service in general and the ABC in particular.

They should be hostile to the ABC, because their journalistic standards are terrible. I've reported 3 factual inaccuracies to them and all of them took more than a month to retract. In one case, the inaccuracy affected the entire premise of the article.

ABC's standards are that it's okay to lie as long as you retract it a month later in a tiny 10pt foot note.

On the other hand, I've personally reported a similar article inaccuracy to a News Corp writer and he replied in 10 minutes, issuing a retraction.

Similarly, I reported an article inaccuracy in a Fairfax website and they retracted in less than 2 days. No reply but as long as it's corrected I don't mind.

SBS is even worse, they actually have zero accountability for online operations.

You clearly don’t watch Media Watch.

What are these supposed inaccuracies??

I've had retractions printed in nearly every Australian news website, and among the 3 times I've contacted the ABC about inaccurate reporting the average response time is 1.5 months. 100% of my reports resulted in retractions (extremely delayed ones).

This is not an organisation that cares about journalistic integrity. In fact they actively eschew ethics while their private sector counterparts reply in 1/50th the time or less.

Also, you're using an ABC-produced show as evidence that the ABC isn't ethically compromised? "We investigated ourselves and found we did nothing wrong"?

Before you accuse me of being a shill, remember I've had retractions printed in News Corp outlets, too.

Murdoch’s News Corp.