by elehack 2337 days ago
There's a lot to critique in publishing and associated costs, but this tweet is unfortunately factually wrong.

From the linked article, ACM's publication costs are $10.9M, not $33.7M.

One of the ACM's major publication initiatives over the last 3-5 years has been an overhaul of their publication templates and publication workflow, to ensure greater consistency in publication formatting, improve accessibility, and archive publications in more future-proof formats. There are also the ongoing costs of creating and indexing metadata (ACM tracks more metadata than arXiv, including resolved citations) and of preservation (ACM buys failsafe perpetual-access services from Portico; arXiv has mirrors at other university libraries).

Should it cost $10.9M? I am not sure. Does it cost a lot more than what arXiv does? Yes.

For a costing exercise: the service ACM buys from Portico is archival and republication. If ACM goes insolvent, Portico flips on their archive and the content remains available. How would you price this service, knowing that when it is actually needed, it's because your customer can no longer pay bills, and you now need to take up their hosting (and all related costs) for approximately forever with no further revenue? I think a network of university libraries would be a more cost-effective way to provide this service, but it's the kind of thing that people working on publication and archival professionally think about, and that factors into the cost of professional archival-level publication.

(I cannot speak to IEEE.)


> their publication templates and publication workflow, to ensure greater consistency in publication formatting, improve accessibility, and archive publications in more future-proof formats

Publication workflow, formatting and accessibility? For every paper I’ve done I just send the ACM a final PDF produced myself from a LaTeX template that hasn’t changed in years. What’s the workflow for taking an already final PDF from authors and uploading it to a file server?

That workflow has changed in the last few years.

- Brand new templates (introduced about 5 years ago, the LaTeX template has had multiple updates per year since then)

- Workflow that makes use of the source (or possibly codes the source embeds in the PDF, but you have to provide LaTeX source to ACM these days)

- Papers now render in both PDF and HTML (and the HTML looks quite good); this started showing up within the last 1-2 years

- Papers are archived in an XML-based format (something called JATS, I do not know the details) to facilitate rendering to PDF, HTML, ePub, and other formats not yet devised

That doesn't seem too impressive. It's essentially a workflow that a few universities could band together and replicate via an open source project relatively easily IMHO.

As an example, Pandoc can already handle 90% of this type of workflow by itself (converting LaTeX to various XML formats). An open source project could be shared among a few universities, or developed by a single body like the ACM and used across dozens of publications and fields. Even two or three full-time people working on this would cost much less than $1M per year.
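To make the Pandoc claim concrete, here is a minimal sketch of a LaTeX-to-JATS conversion driven from Python. The file names are hypothetical, and this is not ACM's pipeline: it only shows the stock pandoc command line (`--from latex --to jats`). Papers that rely on a custom class or heavy macros may well not convert cleanly, which is presumably where the remaining 10% lives.

```python
import shutil
import subprocess
from typing import List


def pandoc_jats_command(tex_path: str, xml_path: str) -> List[str]:
    """Build a pandoc invocation that reads LaTeX and writes standalone JATS XML."""
    return [
        "pandoc",
        "--from", "latex",
        "--to", "jats",
        "--standalone",
        tex_path,
        "--output", xml_path,
    ]


def convert_to_jats(tex_path: str, xml_path: str) -> bool:
    """Run the conversion; return False if pandoc is not installed."""
    if shutil.which("pandoc") is None:
        return False
    # check=True raises CalledProcessError if pandoc reports a failure.
    subprocess.run(pandoc_jats_command(tex_path, xml_path), check=True)
    return True


# Hypothetical file names, for illustration only.
print(" ".join(pandoc_jats_command("paper.tex", "paper.xml")))
```

Calling `convert_to_jats("paper.tex", "paper.xml")` would perform the conversion on a machine with pandoc installed.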

That sounds pretty counterproductive. So now authors, in addition to keeping up on their research, need to keep up on the updates to the ACM's LaTeX stylesheet? And there's every chance that the version that is formatted well with the ACM stylesheet when you initially submit will have formatting bugs six months later because the template got updated? And now you have a whole new toolchain to debug when the HTML version of your paper misaligns your tables? And maybe the HTML version that looks fine today will get mangled in 2028 after you retire and they update the CSS, as has happened with most of the New York Times articles?

It sounds like the ACM has a really different set of priorities than libraries and researchers do, one that values increasing headcount over guaranteeing permanence.

I'm not sure how it works at ACM, but often, it's people retyping the contents of your article into a JATS-XML template and adding additional metadata (authors, date of publication, perhaps who funded it, etc.), which is then used to generate several outputs (e.g. PDF, HTML, but also citation lists, etc.).

> The Journal Article Tag Suite (JATS) is an XML format used to describe scientific literature published online. It is a technical standard developed by the National Information Standards Organization (NISO) and approved by the American National Standards Institute with the code Z39.96-2012.

https://en.wikipedia.org/wiki/Journal_Article_Tag_Suite

> LaTeXML is a free, public domain software, which converts LaTeX documents to XML, HTML, EPUB, JATS and TEI.

https://en.wikipedia.org/wiki/LaTeXML

The wonderful thing about standards is that there are so many of them. And each one has variations.

> people retyping the contents of your article

Wow. Well I can imagine that’s expensive.

Thank you for the correction.

IEEE's $193M is where we should focus our attention, when it comes to this expense line.

I agree. I have no idea what IEEE is doing that costs that much. And while I don't take as hard a line against them as I do against Elsevier, I have never published with them and don't currently have any plans to change that.

I'm not sure how many articles ACM publishes a year [1], but the answer seems to be a few tens of thousands. That's a per-article publishing cost of a few hundred dollars, which does not seem unrealistic to me.

[1] The ACM Digital Library claims 2.8 million items published over 84 years, or about 33,000/year if divided equally over the years (an assumption that is laughably false). Some of that total may be citations for keynotes or posters, which aren't really research papers, but I don't have a good handle on that share.

The 2019 annual report gives some details: 34,000 full-text articles were published in the DL. This will exclude non-archival content like keynotes, posters, etc. if conference organisers provide correct metadata.
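For anyone who wants to check the per-article estimate, here is a rough back-of-the-envelope sketch using only the figures quoted in this thread. It assumes the $10.9M publication cost and the 34,000-article count describe comparable years, which the thread does not confirm.

```python
# Figures quoted in this thread; none are independently verified here.
PUBLICATION_COST_USD = 10_900_000  # ACM publication costs cited in the top comment
DL_TOTAL_ITEMS = 2_800_000         # ACM Digital Library total from footnote [1]
DL_YEARS = 84                      # span of years from footnote [1]
ARTICLES_2019 = 34_000             # full-text articles per the 2019 annual report


def per_article_cost(total_cost: float, articles: float) -> float:
    """Average publication cost per article, in dollars."""
    return total_cost / articles


# Naive long-run average, assuming equal output every year (known to be false).
average_per_year = DL_TOTAL_ITEMS / DL_YEARS

cost_from_average = per_article_cost(PUBLICATION_COST_USD, average_per_year)
cost_from_2019 = per_article_cost(PUBLICATION_COST_USD, ARTICLES_2019)

print(f"Average items per year: {average_per_year:,.0f}")
print(f"Cost per article (long-run average): ${cost_from_average:,.0f}")
print(f"Cost per article (2019 count):       ${cost_from_2019:,.0f}")
```

Both estimates land a little above $300 per article, consistent with "a few hundred dollars".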