by sb057 2377 days ago
A version which was rejected included the following (non-Apache-affiliated [afaik]) references:

https://www.xenonstack.com/insights/what-is-apache-arrow/

https://link.springer.com/chapter/10.1007%2F978-1-4842-1311-...

https://www.biorxiv.org/content/biorxiv/early/2016/08/23/071...

http://delivery.acm.org/10.1145/3110000/3103003/p138-Maas.pd...

https://www.theregister.co.uk/2016/02/17/apache_arrow_toplev...

https://www.cio.com/article/3034279/big-data-gets-a-new-open...

https://www.infoworld.com/article/3033446/hadoop/apache-arro...

https://sdtimes.com/apache/guest-view-first-release-apache-a...

https://www.infoq.com/news/2016/12/le-dem-apache-arrow/

http://dbmsmusings.blogspot.com/2017/10/apache-arrow-vs-parq...

https://dbmsmusings.blogspot.com/2017/10/apache-arrow-vs-par...


The 1st source is a blog post on a consulting company website.

The 2nd mentions Arrow only in passing, after several pages of coverage of Spark; Arrow is covered only in relation to Spark. It's a reliable source but doesn't clearly establish notability.

The 3rd mentions Arrow hardly at all; it's an implementation detail, mentioned just once, in a paper about something else.

I can't fetch the 4th.

The 5th, a story in The Register, is reliable and probably does go towards notability, though it seems to sort of argue against it (the gist of the article is that it's surprising that Arrow has been made a top-level project at all).

The 6th, in CIO, is a recap of a press release. Trade press PR recaps shouldn't be WP:RS, but WP will often accept them, or would when I was patrolling AfD; it's luck-of-the-draw. The admins who shot down Arrow's page were smart enough not to accept it.

The 7th, in InfoWorld, is promotional as well, but it's at least written in some depth. It's a straightforward notability claim. The Arrow article should draw more clearly from it, in the opening paragraph.

The 8th, in SDTimes, is written by someone affiliated with the project itself; it's citable, but WP probably won't accept it independently as grounds for notability.

Same, in effect, for the 9th, which is just a recap of an interview with the project author.

The 10th and 11th are just blog posts. They're citable if they're not contentious, but they usually won't be acceptable as WP:RS for notability.

Blog posts are prima-facie evidence of notability. Same thing with mentions in published articles. From the book (second link):

"Recognizing that Value Vectors meet the needs of other data processing engines, in February 2016, the Apache Software Foundation announced Apache Arrow as a top-level project, bypassing the standard Incubator process. Committers to the project include developers from other Apache projects such as Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark and Storm.

Apache Arrow enables execution engines like Spark to take advantage of the latest operations included in modern processors, for fast analytical data processing. Columnar layout of data allows for better use of CPU caches by placing all data relevant to a column operation in as compact of a format as possible. ...

Apache Arrow software is available under the Apache License v2.0.

Dremio, a startup led by Jacques Nadeau, chair of the Apache Drill and Apache Arrow Project Management Committees, leads the development."

In the past, this and the other sources would have been more than enough to establish notability. I know that because I have created Wikipedia articles on subjects much less notable than that. The problem for Apache Arrow isn't that it isn't notable enough; it is that people have already tried four times to get it included in Wikipedia, so the Wikipedians voting on new page inclusions are getting suspicious about it.

If you want to sum up something like 10 years of debate and consideration of the role of blogs as sources (it’s much more complicated than “they’re not allowed”) by saying, in effect, “you’re all wrong”, well, you do you.

I'm merely saying that you are wrong. Blogs are not always reliable sources in the Wikipedia world, but they can absolutely be used as evidence for notability.

Not routinely, and not most blogs, as you can clearly see from the admin comments on this Arrow post.

Yes, routinely. You can find plenty of articles which had much less support in sources when they were created here: https://en.wikipedia.org/wiki/Category:AfC_submissions_by_da... That Wikipedians rejected the article is a moot point because the argument is that the rules are not applied consistently.

Of the 10 links you list (the dbmsmusings link appears twice), 5 are used to back up the claim that Arrow was “donated to the Apache Software Foundation[7] in 2016, where it has been maintained and extended since.[7][8][9][10][11]”, which doesn’t really seem like it needs that many sources.

Of the other half, one appears to be some sort of marketing blogspam, one is a paper that briefly mentions that they used Arrow, and two I can't access for various reasons. That leaves one blog post that actually discusses Arrow, and the sentence it's used as a reference for in the draft article isn't about Arrow specifically, but the tradeoffs of in-memory vs on-disk storage.

Yes, these links may be independent of the Arrow project, but I'm not convinced that they add anything of substance to the actual content of the article. Mostly it looks like they were added in an attempt to game the number of references.

The blog post should have included these citations; without them I was left wondering what they did to support their claims. It sounds like they probably should have an article, but also that they have some misunderstandings of Wikipedia.