Hacker News
by tptacek 2376 days ago
Open source projects are particularly tricky for Wikipedia. There are tens of thousands of them. Their owners are often passionate. They compete with each other, so there's incentive to write hard-to-adjudicate competing claims. Many have commercial backing, which further warps incentives. The projects themselves are highly technical; many, like Arrow, are software development tools and components. There are few authoritative sources that reliably track open source projects. Keeping up involves directly following bug trackers and message boards and then synthesizing a narrative, which is the definition of "original research", forbidden in the encyclopedia.

It's likely that Arrow does deserve a WP article. But Arrow's sponsors misunderstand more about Wikipedia than Wikipedia does about Arrow. Writing a defensible article about their project will require work; in particular, they're going to need to spend the time tracking down authoritative sources for why Arrow is notable, and those claims will probably need to be something more persuasive than "hundreds of companies use it"; hundreds of companies use all sorts of things that aren't, and shouldn't be, featured in their own encyclopedia articles.

I understand the impulse behind "this project is important; it should have a Wikipedia article". But when you take a step back and accept what Wikipedia actually is, rather than what you think it should be, you're left with the question: do we really need to feature this particular piece of software in its own encyclopedia article? 20 years from now, will people still be getting value from it? Whatever value that might be, will it outweigh the 20 years of other people's volunteer efforts to maintain the article, keeping it free of vandalism and ensuring that it doesn't surreptitiously turn into a promotion piece for some company or another?

The answers might be "yes". But I don't see much evidence that this piece considered the questions.

Lots of things that don't seem deserving have in-depth Wikipedia coverage. Many of those things probably really don't belong in an encyclopedia! But there are two sides to this problem: the merit of the topic, and the cost, in volunteer time, of including it. A marginal topic can be defensible if it's easy to reliably cover it. A seemingly important technical topic might not be if the only way to say anything interesting about it is to write original research directly into its article.

Late edit

A useful tip for getting your open source project covered in its own Wikipedia article: don't have the Chief Marketing Officer of the company that owns the project write the article.


This is a great comment; I'll just add one other thing, which is something I've mentioned before in arguments about Wikipedia: Wikipedia's goal is verifiability, NOT truth. "Truth" is explicitly a non-goal of the Wikipedia project. For any given subject, Wikipedia is not meant to provide the truth about that subject, it's meant to be a summary and distillation of the existing reliable sources about it. If there are none, that's neither Wikipedia's fault nor its problem.

You can take issue with this goal, but that's how it works, and it's also how encyclopedias have always worked.

>and it's also how encyclopedias have always worked.

Well... Hopefully verifiability and truth have some correlation. Otherwise I'd argue that verifiability isn't worth much. What is different from traditional encyclopedias is that they did make determinations about what was important (which is at least akin to notability) and would allocate articles and pages as appropriate. From today's perspective we might dispute the judgments of importance but they were there.

The Wikipedia meta article this is drawn from does a better job of answering this concern than any of us can.

https://en.wikipedia.org/wiki/Wikipedia:Verifiability,_not_t...

Argue with it if you must, but let's try not to make the thread tediously recapitulate it.

> Hopefully verifiability and truth have some correlation.

Not as much as you would hope.

I have two sisters with Wikipedia articles. Let's pick https://en.wikipedia.org/wiki/Jennifer_Tilly for one of them. It claims that her mother was Irish and Finnish, and goes on to list how many siblings she has. Those statements are verifiable but false. You can find an article written by reporters that said those things.

She isn't Irish, her step-father (my father) was. She also has 2 more brothers than are listed in that article. That is true, but not verifiable. Nor will they ever be verifiable. And therefore Wikipedia will never be corrected.

The problem here is that the Gell-Mann Amnesia Effect (see https://www.goodreads.com/quotes/65213-briefly-stated-the-ge... for an explanation) guarantees that there will be lots of verifiable statements that aren't so. Wikipedia builds a coherent view of a subject on that sand, and it is very hard to find what it is mistaken about. But it is riddled with errors that will never get fixed because they were wrong in a verifiable primary source.

And information not captured in a verifiable primary source will never make it in. For example her grandfather was the T in https://www.cmtengr.com/. Good luck verifying that one!

>And information not captured in a verifiable primary source will never make it in

In theory. In general? I was just looking at an article where I have a lot of personal knowledge.

It is mostly True, as far as my first-hand knowledge can tell. And that's leaving aside a couple of the random personal insertions that are definitely True, if out of all proportion to the rest of the article.

But there's one section in particular that goes into even more detail than I knew even as someone fairly in the depths of this particular thing. (But it's very plausible and consistent with what I do know.) It's certainly not something that's ever been written about publicly AFAIK and the actual references in the article are minimal.

Which comes back to the point that notability, verifiability, etc. are nice theories (and may even make sense in the abstract), but there's a huge amount of inconsistency depending upon whether someone has taken notice of an article or not. (And, in at least some cases, I'm happy with people not looking too hard.)

Which inconsistency is, of course, what you'd expect from an all-volunteer project.
Sure. I'm also not sure that the fact that Wikipedia's rules often fall through the cracks is entirely a bad thing. You end up with some unverified information. You also end up with maybe somewhat unreliable information that would never have been verifiable. Even if I can't fully endorse this sort of informal breaking of the rules, I'm not really opposed to it either.
Wikipedia says "her mother was of Irish and Finnish ancestry."

Jennifer Tilly is your sister, but her mother's step father is your father?

Her grandfather is her brother's father?

Are you seriously confused by my carelessness with pronouns?

Jennifer and I are siblings. Our mother's mother was Finnish. Our mother's father (the Tilly in CMT) was a complicated mix. Jennifer's father was Chinese. My father was Irish.

She was born Chan, I was born Ward, our names were changed to our mother's maiden name after her divorce from my father.

All clear?

The "Gell-Mann Amnesia Effect" being the banal fact that reporters are sometimes wrong about things?

Have you tried leaving a comment on the Talk page of the article saying that you're Jennifer Tilly's sister, linking to something about you (you're obviously bona fide), and asking for a correction? WP has special reliability rules (WP:BLP) for "Biographies Of Living Persons".

It doesn't look like CMT has a Wikipedia article at all. Should it?

> The "Gell-Mann Amnesia Effect" being the banal fact that reporters are sometimes wrong about things?

Sometimes?

I've yet to read a feature article written by a reporter on a subject that I know well which didn't have multiple mistakes.

> Have you tried leaving a comment on the Talk page of the article saying that you're Jennifer Tilly's sister, linking to something about you (you're obviously bona fide), and asking for a correction? WP has special reliability rules (WP:BLP) for "Biographies Of Living Persons".

Actually I am one of the brothers that Wikipedia does not know about.

Back in the 2007-2008 period I decided to make some obvious corrections. They got rejected. I left some comments in talk. A couple of my comments are still there on Jennifer's talk page.

If you want to try to fix the page, you could use http://www.officialmegtilly.com/blog/megs_made_up_muffins/ and http://www.officialmegtilly.com/blog/hell_in_a_hand_basket/ as evidence that Meg has at least one brother that Wikipedia doesn't know about. Good luck getting it changed.

As for CMT, you tell me. It is a civil engineering company that has existed for decades and has a significant presence in multiple states. But there isn't much about them online other than the company website. Which, by definition, is not considered reliable.

I have it in for the "Gell-Mann Amnesia effect" (is there even evidence that Gell-Mann believed in it?), but your point is well taken: Wikipedia's rules do heavily privilege journalism, and journalism is merely the first draft of history, not the camera-ready final.

It's possible that Wikipedia has carefully balanced this; if they didn't privilege reporting, a lot fewer articles would get written, about a lot of things people actually do want to look up in the encyclopedia. Reliance on journalism means they'll routinely get some bad facts, but there's a bound on how bad things will be that there wouldn't be if they just got rid of WP:RS altogether.

It's much more likely that nobody has carefully thought about this, and it's just a shambolic volunteer project taking advantage of what they have to work with.

My basic take about Wikipedia is that it's hard to argue with the results. However obnoxious their policies are to nerds like us (and I commented upthread about obnoxious experiences I've had working on it --- I no longer contribute!), it's a tremendously successful project, perhaps one of the most successful in the history of the Internet.

It's bad when they have bad facts, more so when those facts pertain to living people, even more so when someone has the correct facts and can't get them accepted, and especially so when that person is a family member of the subject.

It's less bad, to me at least, that an encyclopedia happens to lack a page, for now, on Apache Arrow.

We're basically into the deletionist vs. inclusionist debate, which is at least somewhat orthogonal to what laypeople think of as notability. Is a Pokemon character notable? Not really. But because of the enthusiastic fan base, tons has been written about them.

On the other hand, whether you're talking about open source projects beyond the big names, corporate executives, or just people who are reasonably well known within fairly large communities, there just isn't a lot of independently sourced published material about them, especially in mainstream pubs, which (somewhat both understandably and ironically) Wikipedia tends to prefer. You even have people with tons of hits on Google about whom there still isn't much real information online.

What "debate"? This isn't a live debate. There is a faction of people, some of whom are involved with Wikipedia, that want it to be something other than a tertiary-source encyclopedia, just like there are people who want to be able to write blog posts as Stack Overflow comments. It's true that they will never stop advocating for these changes, but there's no evidence that the projects themselves are going to cave.
Maybe it's not a debate so much as a tension--and it's a real one. Personally, I haven't contributed anything to Wikipedia in years. It's useful, I see its flaws, but I certainly don't care enough to push on it for the most part.
I'm exactly the same way. For instance: I did some writing about macOS security in the macOS articles, way back when, and most of it got struck because I couldn't cite it properly. It was frustrating to write a straightforward statement, like "the macOS Seatbelt sandboxing mechanism uses s-expressions", and have it get struck.

But I came quickly to realize the project was right. Without a reliable secondary source, I was effectively conducting research in the pages of the encyclopedia. What I learned from that was: I shouldn't be writing encyclopedia articles; the technical writing I do tends not to be tertiary.

It's fine – good, in fact – if most people don't write much in Wikipedia. It's its own special thing. You can't argue with its success: it might be the most successful project in the history of the Internet, and a long-term contender for one of the most successful volunteer knowledge projects ever.

The number of wigglypuff fans exceeds the number of Arrow fans by at least 10x. And the article is higher quality.

If a bulldog clip is notable, why wouldn't an open source project that hundreds of companies use be?

https://en.wikipedia.org/wiki/Bulldog_clip

s/bulldog clip/many other random office supply items/

Edit: Swapped to bulldog clip as a better example of a less notable office supply.

This seems like an argument that says that Apache Arrow is as important as the paper clip, which would be an extraordinary claim.

That paper clip article is itself extraordinary. Go look at it again. It delves into the history of the paper clip, covers different designs, has excerpts from paper-clip-making-machine patents, and describes an actual controversy(!) over its invention, all carefully illustrated (illustrating things on Wikipedia is a bitch, by the way, because of IPR rules). People went through a lot of effort to make a good paper clip article.

And Wikipedia considers the paper clip article to be a "C-class article" (C here means approximately what it means in school), and the topic of "low" importance. Just so we're clear on what the bar is here.

Compare that with the author's attempt at an Arrow article:

https://en.wikipedia.org/wiki/Draft:Apache_Arrow

It's a paragraph of promotional material, a brief comparison to other systems, and a citation to a blog post saying "I do not see any reason not to embrace the Arrow standard".

Come on.

I think there probably should be an Arrow article. The authors have found a bunch of reliable sources covering it; they just haven't distilled from them a defensible claim to Arrow's notability. I think it's a matter of putting the work in.

>This seems like an argument that says that Apache Arrow is as important as the paper clip, which would be an extraordinary claim.

I picked the first office supply object that came to mind. There are better examples.

For example, why have the bulldog clip as its own article when you already have binder clip?

https://en.wikipedia.org/wiki/Bulldog_clip

https://en.wikipedia.org/wiki/Binder_clip

I highly suspect that with some actual effort I could find an even less deserving office item.

And you may be right that Arrow needs to do more to be notable and ready for its own page. But setting aside any objective standard and instead looking at the relative standard set by other articles, it does feel like there are some unequal requirements in this regard.

The binder clip article has many of the same merits as the paper clip article. The bulldog clip article is more interesting: it's a "stub" article (its authors are explicit about the fact that it's not a complete article), and still it manages to track down some of its history and cite interesting uses from books – someone had to read those books and fish the bulldog clip cites out of them.

I think it's pretty clear to anyone why bulldog clips are in the encyclopedia, and it is only clear to subject matter experts with strong opinions why Arrow would be.

If your topic requires subject matter expertise in order to recognize its importance, the standards are unequal: you are going to need to do more work to establish its notability, because you cannot reasonably expect the layperson volunteers in the Wikipedia project to do that work for you.

An item which almost every office worker has seen or used is definitely notable enough to get an article. Yet another data format among hundreds, one which has yet to reach a wider audience, could be, but it is not obvious.

I'm sure an order of magnitude more than "hundreds" of companies use paper clips.
> I understand the impulse behind "this project is important; it should have a Wikipedia article". But when you take a step back and accept what Wikipedia actually is, rather than what you think it should be, you're left with the question: do we really need to feature this particular piece of software in its own encyclopedia article? 20 years from now, will people still be getting value from it? Whatever value that might be, will it outweigh the 20 years of other people's volunteer efforts to maintain the article, keeping it free of vandalism and ensuring that it doesn't surreptitiously turn into a promotion piece for some company or another?

I really don't think a 20-year view is a good measure of whether or not an article should exist. Even if something is forgotten in the future, if it has relevance and importance today, then that alone makes the article worth having.

For-profit businesses are particularly tricky for Wikipedia. There are tens of thousands of them. Their owners are often passionate. They compete with each other, so there's incentive to write hard-to-adjudicate competing claims. Many have commercial backing, which further warps incentives.
They are! Spend some time patrolling AfD. They're a huge problem; companies are constantly trying to get themselves into Wikipedia, because Wikipedia is heavily privileged in Google search results. But for-profit companies tend to present clearer cases for WP volunteers: they're either well-covered in reliable sources, in which case they're easy accepts, or they're not, in which case they're easy rejects.

The problem with OSS is that lots of projects probably do merit pages, but it's hard to see which ones.