Funny how when Elsevier tries to enforce its copyright, HN acts like they're the devil, but when AI training data is the subject, it turns into the strictest IP-rights warriors I have ever seen.
They get papers for free, papers that were funded by the public or by other entities. Then they charge people an obscene amount of money to let them download the PDF.
This is the dictionary definition of "parasite".
Nobody funds most artists. Buying artwork isn't funding. They produce art from their own funding. Then a company leeches off them and trains models without compensating them.
I am okay with training models on arXiv papers. The authors consented to spread the knowledge publicly.
With such dumb-logic comments, you make it hard to take your point-of-view seriously.
Edit: The copyright of the paper _authors_ isn't being protected. They are being blood-sucked. And, before the Hub, if you couldn't afford a paper and emailed the author asking for a free PDF, most, if not all, would email you one.
Harvard famously said that they couldn't afford Elsevier anymore. [0]
I send my research to Elsevier for free, and surrender all my publishing rights & copyright to Elsevier to be able to publish it.
When/if it's published, I get a paltry "author's copy" in return, and I have to be very diligent when giving copies of it away, otherwise Elsevier might punish me.
In the end, it's a paper which bears my name, but I have none of the rights attached to it, and Elsevier gets literally millions of dollars from each country which licenses its publications.
Their expenses are a mere rounding error compared to what they charge, and they are doing this to protect their income, not my research.
Copyright infringement / ethical issues in AI are something else:
Crawlers reap & providers sell my data without my consent, and I get nothing in return, except models that can poorly imitate my writing/art style, making worthless the work, blood, sweat and tears I shed over these years to create that style.
Both parties earn an exorbitant amount of money from my work, for free, and suck me dry in the process. One at least gives me a paltry PDF file and maybe some recognition; the other threatens my livelihood while raising hype, applauding the degeneration of human achievement and reducing it to a mere set of numbers.
Both are cutting the tree they're living on, though.
I am not familiar with the academic publishing world, but it seems this should be disruptable. Why isn't there some other outfit running a WordPress site, taking submissions and publishing them on much less onerous terms?
It's kind of like saying: 'Shit, why are people paying 6 figures for college degrees in the US when they could just learn most of that for free?'
Because no one gives a shit if you don't have the expensive piece of parchment.
Academic publishing is similar. The impact factor and 'prestige' of the journal matter to your University, your peers, your grants panel, and yourself. However, when you publish in a top-tier journal, this results in one of 2 scenarios: a) the journal charges you a small fee and also pay-walls your work, so your reach is lessened, then actively polices you sharing YOUR work without permission; or b) the journal charges an exorbitant fee (Nature wants $11,000 USD) for open-access publishing that allows wider distribution (but still has stipulations in some cases).
HOWEVER, some editorial boards of big for-profit journals have flipped the table and started their own not-for-profit journals with blackjack and hookers. The big one in my field was the NeuroImage editorial board creating Imaging Neuroscience - with a public letter to the owners (Elsevier if I'm not mistaken) calling out the bullshit publishing fees.
There is a reason no one gives a shit. And it has nothing to do with publishers. If a paper actually contributed meaningfully to a field, everyone would know about it.
Ironically, institutions like Elsevier justify the existence of the numerous hack academics (not scientists) that exist nowadays, most of whom have no leg to stand on complaining about Elsevier's rent seeking when they themselves would be infinitely more useful flipping burgers.
This is already disrupted in AI at the highest stages. arXiv papers are first-class citizens there; people regularly cite blog posts, and even tweets, in their papers. Rather than journals, people take conferences more seriously.
Now, some companies like DeepMind like to publish in Nature for prestige's sake. That's a different thing.
The disruption started even before the AI hype. arXiv is not an AI-focused service anyway (it started with physics IIRC). It's FAIR and Open Science, plus the push from countries like Germany, that forced Elsevier to sign open access submission and publication agreements in the first place.
This happened ~5 years before the AI hype became something, and arXiv was a force even before that.
Uh-- no one is forcing you to send anything to Elsevier. Or any other publisher.
If you don't like the terms, self-publish. The Internet makes it easier than ever.
The reason so many people use Elsevier is because they realize it's a better deal than self-publishing.
Copyright is a right that's given to YOU, the creator. If you don't want to sign it away, find your own way to distribute your work. Copyright will protect YOU from big companies stealing your hard work.
The name of the game is Peer Review, which is a fundamental pillar of science. Your blog or self-published papers are not peer reviewed.
If you found and can sustain an open access journal with a reputable peer review process, you can do a lot of business. As long as you don't get subverted by the big houses, of course.
It's not the AI research which disrupts old publishing houses. It's FAIR and Open Science.
The biggest reason publishing houses still continue is not that they hold PDFs, but that they provide peer review as a service. That is what gives them their prestige and inertia.
The bad thing is that many open access journals keep the bar pretty low, allowing Elsevier, Springer, et al. to amplify their power. Moreover, if you are not aware, every big publishing house loves to allow arXiv submissions, even of intermediate revisions, because they reduce the editorial load on themselves while raising the quality bar.
You can already cite blog posts, etc. in your publications. It's not something frowned upon as long as what you cite is sound.
At the end of the day, FAIR & Open research, institutional, federated and peer-reviewed data warehouses, and (high quality) open access publications will kill these big houses, and they have already bent pretty hard under the open access subscription and submission agreements forced on them by countries.
Disclaimer: My institution also manages these subscriptions for universities country-wide.
It's not funny when you compare "profits over others' knowledge" to "share knowledge without profits"; it doesn't make sense at all. Potatoes and oranges.
[0]: https://www.theguardian.com/science/2012/apr/24/harvard-univ...