by fwdpropaganda 3001 days ago
> Sadly "equal pay" means whatever you want it to mean.

Exactly this. At a fundamental level every individual has an infinite number of "dimensions" (in ML parlance) associated with him, and "equal pay" people or groups will try and convince you that expected pay should be the same over a certain subset of dimensions while ignoring all the others. Invariably they will pick the ones most beneficial to that particular person or influence group. They will give you moral arguments, while forgetting to mention that, as in any mildly complex problem, data bias and confounding variables are of paramount importance.

So "equal pay" means whatever you want it to mean because, try as you may, you will never have a model with the full infinite set of dimensions that the real world has. Having to pick, you pick the ones you want, a choice that others will attack.


You cannot dismiss rigorous statistical analysis by arguing it can never encompass the full dimensions of the data. Of course it can't. The map is not the territory; it is a useful way to find our way around it. Ignoring the map is perilous, if not arrogant, even though it is merely a flawed representation of the real truth.

You might argue that a specific study or meta-analysis contains a bias or misinterpretation, but only if you've actually examined their methodology, data, and reasoning. You cannot argue that all studies of complex topics are invalid simply because their topics are complex.

> You cannot dismiss rigorous statistical analysis by arguing it can never encompass the full dimensions of the data.

This simply means it's not rigorous. See Omitted-variable bias - from [1]: The bias results in the model attributing the effect of the missing variables to the estimated effects of the included variables. For example, including gender but not education or hours worked will result in attributing pay differences to gender, but including all relevant variables shows that gender is irrelevant.

[1] https://en.wikipedia.org/wiki/Omitted-variable_bias
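
To make the mechanism concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the numbers are made up, the helper names are mine, and the true group effect is set to zero by construction, so it demonstrates how omitted-variable bias works rather than anything about real pay data. It compares a regression of pay on group alone with one that also includes hours (computed via the Frisch-Waugh-Lovell residual trick to stay within the standard library).

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Part of y that a straight-line fit on x does not explain."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulate(n=20000, seed=0):
    """Synthetic workers: pay depends on hours only (made-up numbers)."""
    rng = random.Random(seed)
    group, hours, pay = [], [], []
    for _ in range(n):
        g = rng.randint(0, 1)
        h = rng.gauss(40 - 4 * g, 3)        # group 1 works fewer hours on average
        p = 10 + 1.5 * h + rng.gauss(0, 5)  # no direct group term: true effect is 0
        group.append(g)
        hours.append(h)
        pay.append(p)
    return group, hours, pay


group, hours, pay = simulate()

# Model 1: pay ~ group (hours omitted). The slope absorbs the hours effect.
_, naive_gap = fit_line(group, pay)

# Model 2: pay ~ group + hours. By Frisch-Waugh-Lovell, the group coefficient
# equals the slope between the two sets of residuals after removing hours.
_, adjusted_gap = fit_line(residuals(hours, group), residuals(hours, pay))

print(f"gap with hours omitted:  {naive_gap:.2f}")
print(f"gap with hours included: {adjusted_gap:.2f}")
```

The same setup cuts the other way too: give `p` a nonzero group term and the adjusted estimate will recover it, so a gap that survives adjustment is not explained away by this argument.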

You entirely missed the point. We can never include every single relevant variable to perfectly explain the observation. It's impossible.

This doesn't mean statistics is useless.

This is the meaning of the phrase "the map is not the territory". All models are flawed, but some are useful.

No, statistics isn't useless, but its usefulness cuts both ways: if you can add one or two relevant variables and almost entirely remove the observation, then statistics tells you that the observation was only there due to omitted-variable bias.
Sure, that's fine. That's part of using statistics to the best of our ability.
I think there's some middle ground between saying that the analysis is rigorous and saying it's useless, no?
Indeed. But that's where all the hard work is -- trying to determine how rigorous something is, knowing it certainly isn't perfectly rigorous.
>You cannot argue that all studies of complex topics are invalid simply because their topics are complex.

If you take a random sample of studies you can make a statistical analysis. You don't need to examine every cow to make an argument that there are no pink cows, but you do need to do a random sample. And that's if you only want to meet the highest standards of evidence. Much lower standards can be far easier to meet.
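
As a rough sketch of what a random sample buys you: if n randomly sampled cows turn up zero pink ones, the largest pink-cow rate still consistent with that result has a closed form, obtained by solving (1 - p)^n = 1 - confidence for p. The function name is mine, and it assumes independent draws from a truly random sample.

```python
def max_rate_given_zero_hits(n, confidence=0.95):
    """Upper bound on the true rate when n independent random draws show zero hits.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return 1 - (1 - confidence) ** (1 / n)


for n in (30, 300, 3000):
    bound = max_rate_given_zero_hits(n)
    print(f"{n} cows sampled, none pink: rate is below {bound:.4%} at 95% confidence")
```

The bound shrinks roughly like 3/n, so it never reaches "there are no pink cows", only "they are rarer than this".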

> If you take a random sample of studies you can make a statistical analysis. You don't need to examine every cow to make an argument that there are no pink cows, but you do need to do a random sample. And that's if you only want to meet the highest standards of evidence. Much lower standards can be far easier to meet.

Your comparison of this problem with pink cows shows that you haven't given it two seconds' thought. Estimating the number of pink cows in the world is a very simple problem. Determining the pay gap is a very very complex problem that starts with defining what the question really is and associated fights between different interest groups which might prefer one or another definition, then goes on to the (social, privacy, and rights) problem of obtaining the data, then moves on to the data analysis itself, which is just hellish if you want to have any semblance of rigour, and finally the policy takeaways from the analysis, which hinge crucially on how you defined the question initially.

Stop adding to the noise please.

>Estimating the number of pink cows in the world is a very simple problem.

If by 'estimating' you mean a scientific study that tries to answer the question, then it isn't simple at all. First we need a rigorous definition of pink cows. If I dye my cow pink, does that count? What if other people don't agree with my definition? A pig whose skin is pink is considered pink, so should I only rely on hair color? And what counts as pink? Are we only going with stereotypical hot pink? There is a red cow, but it is a really brownish red. Would a brownish pink be enough to qualify as a pink cow?

So once we've solved all those problems, we need to come up with a methodology, and it likely won't be the same everywhere. We could make the problem a lot simpler by reducing our search space to, say, only cows on ranches in the state of Montana. But doing a global sampling isn't easy.

>associated fights between different interest groups which might prefer one or another definition

To my knowledge (and with no peer-reviewed research to back up my view), there are no groups who have a political stake in what counts as a pink cow. So for that reason it is simpler because there aren't political complications.

But you seem to be confusing something. You appear to be talking about studying wage gap. I was talking about studying studies of wage gaps.

So for my plan, it would work like this:

Taking all the studies of wage gap in the last n years, pick x at random. For each of these, determine whether it accounts for some factor that impacts pay regardless of gender (say height of employee). You can then compute what percentage of studies took this factor into account.

Then you repeat this with a few other factors, each time repicking the studies investigated. From those percentages, you can determine how often your selection of factors is taken into account, and from that you might be able to make the argument that the data is biased enough to not be usable.
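
A minimal sketch of that plan, under loud assumptions: the catalogue of studies below is entirely made up (random sets of controls), the factor names and function names are hypothetical, and the Wilson score interval is my choice for putting error bars on the percentage, not part of the plan itself.

```python
import math
import random


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def audit_factor(studies, factor, x, rng):
    """Draw x studies at random and report the share that control for `factor`."""
    sample = rng.sample(studies, x)
    hits = sum(factor in study["controls"] for study in sample)
    return hits / x, wilson_interval(hits, x)


# Hypothetical catalogue: 400 fake studies, each controlling for a random
# subset of factors. A real audit would code these by reading each paper.
rng = random.Random(42)
FACTORS = ["hours", "education", "height"]
studies = [
    {"id": i, "controls": {f for f in FACTORS if rng.random() < 0.5}}
    for i in range(400)
]

# Re-pick the sample for every factor, as described above.
for factor in FACTORS:
    share, (low, high) = audit_factor(studies, factor, x=60, rng=rng)
    print(f"{factor}: {share:.0%} of sampled studies (95% interval {low:.0%} to {high:.0%})")
```

The hard part is not in the code: deciding whether a given paper "accounts for" a factor is a judgment call made when building the catalogue.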

Isn't that why we have a 'replication crisis' in fields that 'can never encompass the full dimensions of the data'?

Doesn't the replication crisis prove 'that all studies of complex topics are invalid simply because their topics are complex'?

Science as a whole has developed knowing that it is impossible to encompass the full dimensions of the data; the goal is to find the best explanation given the available evidence.

The replication crisis is a result of the misuse or misunderstanding of the statistics, and the current nature of journals.

A heuristic in which you refuse to undertake any action without complete information of perfect reliability is always biased towards the status quo. Heck, it's straight from the CIA Simple Sabotage Field Manual. So in the guise of "first needing to understand the complexities of the problem", you are rationalizing away the preponderance of evidence which shows that, yep, any way you cut it, there's a gender wage gap.
> A heuristic in which you refuse to undertake any action without complete information of perfect reliability is always biased towards the status quo.

I agree with that phrase, and I acknowledge that it's a problem, but you're jumping to conclusions about what I was trying to say. I wasn't trying to say "we don't have complete information, so we should do nothing." Read on:

> you are rationalizing away the preponderance of evidence which shows that, yep, any way you cut it, there's a gender wage gap.

No, that is precisely my point. It is NOT true that any way you cut it there's a gender gap. If you let me cut it how I want, I can have the gap be anything I want by carefully (as an example) picking which of the omitted variables I adjust for sampling bias and which ones I don't[0]. That is what I was trying to say.

[0] And let's not talk about confounding variables; that problem is at least an order of magnitude harder even than sampling/population bias.

In case anyone needs a PDF of the CIA Simple Sabotage Field Manual

https://www.cia.gov/news-information/featured-story-archive/...

> Exactly this. At a fundamental level every individual has an infinite number of "dimensions" (in ML parlance) associated with him, and "equal pay" people or groups will try and convince you that expected pay should be the same over a certain subset of dimensions while ignoring all the others.

Surely the right response to a study which challenges your existing worldview would be "Hmm, that's interesting - I wonder what is driving that?" rather than "The equal pay people or groups will always try to convince you..."

It seems that GP did that, and learned that these studies ignore inconvenient factors.
What we should mandate is transparency. We all think we are expert negotiators but we are all idiots. We will all be better off if all salary and all compensation information is public and easily accessible. Sadly, a lot of people think they have something to lose and will never support it.
It would be an interesting experiment. Has it been done before? That could lead to some unpleasant things:

    * A lot of unavoidable angst as people of less worth to the business are proven to be paid less in no uncertain terms.
    * More internal strife as people jockey for identifiable rank within the organization based upon their salaries.  "Why is Sue paid $10k more than I am?  Sue wasn't at her desk all week last week while I was here busting my butt."
    * Eventually, many managers and organizations would just sidestep the battle by paying everyone the same thing based upon easy-to-identify metrics like seniority.  As a result, the people with more value to the business will find jobs at companies that pay them according to a better measure of their bottom-line worth.  With no one left but the lowest-common-denominator employees, the company flounders and fails.
> Eventually, many managers and organizations would just sidestep the battle by paying everyone the same thing based upon easy-to-identify metrics like seniority. As a result, the people with more value to the business will find jobs at companies that pay them according to a better measure of their bottom-line worth. With no one left but the lowest-common-denominator employees, the company flounders and fails.

In most fields, there are already companies which pay wildly different amounts for the same jobs, so the people contributing more in those similar roles are already highly incentivized to leave for higher-paying pastures.

In certain Nordic countries (Sweden at least, I believe), all tax data is public. By extension, everybody's income is public as well. It doesn't seem to be an issue.
A quick Google turned up some refutation of your statement:

https://news.ycombinator.com/item?id=9907147

Looking further, it appears that Sweden has decreased the amount of transparency by requiring that people make specific requests that notify the taxpayer of the request.

http://www.businessinsider.com/sweden-salaries-freely-availa...

Apparently, there were issues. I'd like to see this experiment run for a longer period of time in more culturally diverse environments, though.

I don't want my salary to be transparent. There is something called privacy. My salary is a private matter. I really don't believe I'm any kind of expert in negotiation.
His point is that your need for privacy is preventing group wins. You want to avoid a bit of shame or envy but by making this decision we, as employees, lose a lot of our leverage.

You wouldn’t have to negotiate a better salary if it were obvious that you are underpaid.

I think a "fair" distribution of pay for software engineers would be more unequal than it is currently. (This is the logical financial conclusion of believing in the 3x, if not 10x, engineer, which I do.)

I have people who work on my teams who are absolutely fantastic and, while already well-paid, probably should make more. I have other people on my team, with the same title, same education, same on-paper responsibilities, same city, same years of experience, who might be below the median pay and are still overpaid based on my estimation of their contributions relative to their peers.

You can't look only at a spreadsheet and determine that it's "obvious that you are underpaid", IMO.

It would also prevent raises.
> I really don't believe I'm any kind of expert in negotiation.

That means that transparency works _for_ you, not against you.

But that doesn't mean I'm ready to give up my privacy.
Why? What benefit does that entail? Is it greater than the benefits of transparency?
Yes. My privacy is not a trivial matter.

Think of it like this. Will you be willing to post your entire web browsing history to a publicly available archive regularly? It will help prevent a lot of illegal activity if everyone agreed to do that.

But the bar -- as established by the 77% number -- is that a difference in pay is sexism.
> Exactly this. At a fundamental level every individual has an infinite number of "dimensions" (in ML parlance) associated with him

And of course, this applies not just to people but to most complex entities or ideas. The problem is, when dealing with humans, even intelligent ones, good luck getting them to accept this approach when it interferes with their political/emotional/fiscal beliefs or desires. For reference, see recent discussions here on topics like trade tariffs.

We can use Bayesian inference here to update the probability of equal pay as more dimensions are added.
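
A minimal sketch of what that update looks like, with heavy caveats: the hypothesis is "pay is equal once the relevant dimensions are controlled for", the prior and all likelihoods are invented for illustration, and treating each new dimension as independent evidence is itself an assumption.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after one piece of evidence (Bayes' rule)."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))


# Made-up evidence: (description, P(evidence | hypothesis), P(evidence | not hypothesis)).
evidence = [
    ("gap shrinks after controlling for hours", 0.7, 0.4),
    ("gap shrinks after controlling for occupation", 0.6, 0.5),
    ("residual gap remains after both controls", 0.3, 0.6),
]

belief = 0.5  # made-up starting prior for "pay is equal"
for description, p_if_true, p_if_false in evidence:
    belief = bayes_update(belief, p_if_true, p_if_false)
    print(f"{description}: P(equal pay) is now {belief:.2f}")
```

The result is only as good as the likelihoods fed in, which is where the disagreement in this thread actually lives.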