by throwvid19 2176 days ago
The number of comments in here that ask for additional/richer data and are being flagged is supremely concerning.

I understand that this is a hot issue with very polarized sides, but in what kinds of circumstances is having more data bad, except to intentionally support bias?

Please understand that my question is only this: why, in the context of accumulating data, is trying to obtain more/better data NOT a good thing?

I do not wish to debate what is already being said in the many comments, so if it helps to change the context of the data in question in order to discuss this, that sounds like a good idea to me.

5 comments

I think context is good and we should be wary of videos that have been edited to remove critical information.

However, there is such a large corpus of evidence against the police in these protests that I have to wonder if the people asking for more context are doing so in good faith, or rather to argue for the innocence of the police. Having been personally on the receiving end of police violence during these protests, it bothers me that anyone could look at this wealth of videos and see anything other than a clear pattern of institutional violence being wielded against those who are in opposition to just such violence.

One thing comes to mind (note: I do not support this view, but I am trying to put myself in the opposing party's shoes):

- Listing videos of police brutality during protests without also listing videos of protesters brutally attacking the police perhaps creates a dissonance for the other side?

I think there have been a few incidents where cops were attacked, but they are few and far between. Still, listing those would help clear the accusation of hypocrisy.

Furthermore, I personally think we should separate police brutality videos from normal civic life (before the protests began), which serve as evidence of systemic violence, from videos of the enraged, emotionally charged protests in which neither side was willing to concede. I categorize them as different.

I really doubt there's more than a handful of instances in which protesters use violence against the police in a context that's not self-defense.
The constant request for more and more "context" is textbook Sealioning [1]. When faced with overwhelming video evidence of misconduct, the only viable way to deny it is to endlessly seek out some kind of magical exculpatory "missing context". That's pretty much the only way misconduct-deniers can sow doubt at this point. It's a bad-faith argument and disheartening to see here on HN.

1: https://en.wikipedia.org/wiki/Sealioning

Unless it's true. Look at any court case for example. If you only heard one side you'd think that side was obviously right.
Your argument is the essence of "sea-lioning" itself. There is indisputably a tremendously large dataset of unassailable evidence (video, cross-referenced personal accounts, audio), the kind of thing we consider as meeting the "beyond a reasonable doubt" standard in a court of law. And, yet, here you are, saying "maybe we don't have all the facts." At what point are you simply wrong? Never?
While it's true that there is tremendous data of many events of police overreach--corroborated by many sources--certainly not all events in this data set are corroborated. How many? We can't know without enriched data.

Given that this is a resource for events of police brutality, it should be no surprise that contributors are likely to be biased toward reporting events favorable to the assertion of brutality. The general premise asked, "what wrong thing was done to you/your people?" which is likely to result in emotionally biased responses. If it were a dataset of reports of interactions between two different ant colonies, for example, the general bias would likely be significantly lower.

If the goal is to understand the relationship between a population and their police, then obtaining data from both sides would be ideal. Of course, that's not what this data set is, which is why some people are raising concerns about bias. As a data set, its use is limited to supporting one aggrieved side. This is not much different from training ML models on, say, only Caucasian faces: it may work if the intent is to recognize or generate Caucasian faces, but it is by no means general purpose. As such, it seems reasonable to question the fitness and intent of this data set.

No data set is perfect, and we'll never have "all the facts." But I don't think upholding inherently-prone-to-bias data as "good enough" is a reasonable response to questions about its bias. We cannot achieve perfection, but that doesn't justify denial of bias in the data.

On the matter of "sea-lioning", I've never heard the term before, and I'm not sure if I've ever been exposed to this type of trolling because to recognize it would require me to be able to read minds. However, I understand the forum guidelines charge us to assume the best interpretation of any comment, so I am disinclined to assume that people here asking for more data are trolling. The essence of claiming "sea-lioning" appears, at least at face value, to be an alternative to saying "I don't have to explain myself to you" while maintaining the illusion of taking the high road.

People have brought up the very reasonable concern that the data is extremely prone to bias, and those people are being silenced. It seems this is because it's not a popular idea to challenge the aggrieved party, not because the data is somehow unbiased and they're asking for something unreasonable. This seems unusual for an otherwise truth-seeking community.

Perhaps I am wrong, and I recognize everyone has their own bias and not everyone always acts in good faith. I have just come to expect more from this community than what I've seen in these comments. It seems good faith is not assumed in many cases among these many conversations.

If people want more data, why don't they go get it themselves? Then they could supply it to the rest of us here and tell us what they learned.

The people who built this GitHub repo voluntarily spent their own time and effort to do so. The people here on HN who think the repo is incomplete can do exactly the same thing if they want to--put in some effort--and thereby address the concerns that they themselves raised.

I generally object to comments here that demand more info, more citations, or complain there might be something missing. How about: do your own work.

"Self-starters teaching themselves what they need to know" is an idea that finds powerful agreement here on HN when it comes to developing software. Somehow, though, on other topics, there sometimes appears a group of commenters who seem more inclined to sit back, complain, and demand answers from everyone else.

I will say that it has occurred to me that demands for more data might not always be in good faith. It has occurred to me that such open-ended questioning might be a convenient way to undermine conclusions that contradict personal beliefs--while avoiding direct conflict over the substance.

> ...in the context of accumulating data, is trying to obtain more/better data NOT a good thing?

Depends if ALL data is collected or ONLY data that supports certain viewpoints.

Here's a thought I've had recently: it might be worthwhile to have a page, perhaps linked to from the topic's comment page, that shows all flagged comments for said topic. People would not be able to reply to these comments, but everyone would be able to read them. I'm not sure if it has any merit or not, but it's just an idea.
You can see them if you turn on showdead, BTW.
Oh, thanks very much!
You're welcome, glad to help!