by TallGuyShort 2379 days ago
Having worked on commercially-resold Apache projects, can't say I argue with Wikipedia a whole lot on this. It seems to me it should be let in, but it's a bit silly to go and call them out like this on a corporate blog, IMO.

Dremio does benefit from Apache Arrow publicity and notoriety, even if they don't profit directly. Having a de-facto standard data format and open-source engines is a selling point for some. That's why Dremio explicitly calls it out on their own website. It also never hurts in the recruiting department. (edit: there's a reason the article was submitted by someone working in marketing & strategy)

>> I’m wondering if Wikipedia can continue to be considered a reliable source of information for technical folks who want to learn more about the vast system of Apache open source software projects.

Sign up for the Olympics, because that's a hell of a leap. You didn't get your page in; it's really not much of a reflection on the rest of Wikipedia. It's an open-source project. It should have its own freely available documentation that fills much the same purpose anyway. If I want to learn about Apache X, I go straight to x.apache.org. They concede that it's not an end-user product anyway, so I'd think their key audience knows how to find an open-source project website. Lower the bar too far the other way, and there are plenty of semi-open-source projects whose marketing departments would be all over using Wikipedia to their own ends - I've seen my own former employer do this for their Apache projects.

5 comments

This was posted by the director of marketing, on their marketing blog... And the Wikipedia article mentions "efficient, effective, optimized" multiple times in the introduction paragraph. Compare that to column-store [1], which the OP's article links to: that article only mentions it once, at the end and in weaker language.

As it stands, the Apache Arrow entry reads like a press release. I would recommend that Justin has a non-marketing copy editor clean it up before pressing the case further.

[1] https://en.wikipedia.org/wiki/Column-oriented_DBMS

> I’m wondering if Wikipedia can continue to be considered a reliable source of information for technical folks who want to learn more about the vast system of Apache open source software projects.

I'm confused why the writer thinks it should be!

The Apache Foundation is a big tent. There's some clearly notable projects in there (like Apache httpd), but there's also a lot of really obscure crap that basically nobody outside ASF cares about (like Apache Creadur or Apache Pony Mail). Expecting Wikipedia to document every Apache project is ridiculous.

Is this particular project notable enough for a Wikipedia article? I don't know a lot about it, so I can't say for sure. But the article drafts that I've seen don't convince me that it is.

> Expecting Wikipedia to document every Apache project is ridiculous.

Wikipedia has a lot of rather obscure entries. Long, long lists that I think are easily less notable than the Apache project.

I'm not saying you're wrong, but the bar for notability is kinda vague. Lists of every episode of a series, every kind of kimchi, etc.

I would have said Wikipedia is a questionable source on "open source software projects", period.

The idea that Apache software projects are too obscure to include, while every single individual episode of Buffy the Vampire Slayer has a detailed article, is pretty typical of the site.

On some occasions I would love to have articles about open source projects that are written in the high quality and non-point-of-view tone that Wikipedia encourages. A project's own API docs are obviously going to be a better source for, well, the API, but a blunt description of what a project is is something Wikipedia can definitely provide, and I don't see why they don't.

I certainly use Wikipedia to understand what various companies are. There are companies whose own websites seem deliberately designed to obfuscate what the company is. I don't see why Wikipedia couldn't provide the same benefit for open source projects.

> Sign up for the Olympics, because that's a hell of a leap.

I agree. Personally, I don't think I've ever used Wikipedia to learn about an OSS project.

> Personally, I don't think I've ever used Wikipedia to learn about an OSS project.

I think I might be part of the silent majority that actually does -- I often use Wikipedia to learn about the origin story of an OSS project. (not random tiny OSS projects, but more established ones)

Project websites don't tell you stuff like the original author, key people, context, adjacent categories of software, the history, the original problem that it was trying to solve, the drama (fights, competitors, disagreements between folks involved), and evolution of the project over time. The Wikipedia article often does.

This type of intelligence is invaluable when evaluating projects/products. If you're not wiki'ing your OSS project, you'd have to google and wade through mailing lists and piece together the story from blogposts, tweets, etc.

Here are some examples:

https://en.wikipedia.org/wiki/Spanner_(database)

https://en.wikipedia.org/wiki/Cockroach_Labs

https://en.wikipedia.org/wiki/QEMU

https://en.wikipedia.org/wiki/SQLite

FYI, there will sometimes be nice comparison tables to make comparing software easier. For example: https://en.wikipedia.org/wiki/Comparison_of_deep-learning_so...

There is a set of policies that Wikipedia is supposed to follow when it comes to deciding if a page should be in or not. Nothing in this set of policies disqualifies a page if it benefits a company. Or even if it was written by employees of that company.

Thus, Wikipedia is violating its own policies. It follows that decisions on whether a page should be created become arbitrary, which opens the door to corruption. Some company pays Wikipedians and gets its page(s) created; others don't and don't get any page(s).

I don't disagree with anything you said, but I mentioned the benefit because the blog explicitly denies any benefit. But the reviewers do call out a conflict of interest. They criticized it because it read like an ad, and I agree. I've seen other Apache projects (looking at Drill) that read like an ad, and it's annoying.