by gmcharlt 1031 days ago

It's not the software per se, which is generally fit for purpose but not amazing, but the traditions and economics underpinning how libraries maintain their bibliographic metadata.

Libraries sharing metadata for their catalogs has a long history, dating back to at least 1902 when the Library of Congress started selling catalog cards for use by other libraries. In the 1960s, the Library of Congress embarked on various projects to computerize their catalog, leading to the creation of the MARC format as a common metadata format for exchanging bibliographic records. (And there is a straight line between how card catalogs were put together and much of how library metadata is conceptualized, although that's been (slowly) changing.)

One problem is that bibliographic metadata from the Library of Congress is mostly generated in-house, and LoC does not catalog everything; not even close. In the late 1960s, OCLC, the organization behind Worldcat, was started to operate a union catalog. The idea is that libraries could download bibliographic records needed for their own catalogs ("copy cataloging") and contribute new records for the unique stuff they cataloged ("original cataloging"). Under the aegis of OCLC as a non-profit organization, it was a pretty good deal for libraries, and over time led to additional services such as brokering interlibrary loan requests. After all, since Worldcat had a good idea of the holdings of libraries in North America (and over time, a good chunk of Europe and other areas), it was straightforward to set up an exchange for ILL requests.

Tie this with a general trend over the past couple of decades of libraries decreasing the funding and staffing for maintaining their local catalogs, and the need for sharing in the creation and maintenance of library metadata has only grown more important.

However, OCLC has had a long history of trying to control access to and use of the metadata in WorldCat, to the point of earning a general perception in many library quarters of trying to monopolize it. To give a taste, Aaron Swartz tangled with them back in the day. [1] One irony, among many, is that the majority of metadata in Worldcat has its origins in the efforts by publicly-funded libraries and as such shouldn't have been enclosed in the first place. OCLC also has a focus on growing itself, to the point where it does far more than run Worldcat. Its various ventures have earned it a reputation for charging high prices to libraries, to the point where it can be too expensive for smaller libraries to participate in Worldcat. (Fortunately for them, there are various alternative ways of getting MARC records for free or very cheap, but nobody has a database more comprehensive than Worldcat.)

That said, OCLC does do quite a bit itself to improve the overall quality of Worldcat and to try to push libraries past the 1960s-era MARC format. But one of the ironies of the scraping is that it's not going to be immediately helpful to the libraries who are unable to afford to participate in Worldcat. This is because the scrape didn't (and quite possibly never could have) capture the data in MARC format, which is what most library catalog software uses. While MARC records could be cross-walked from the JSON, they will undoubtedly omit some data elements found in the original MARC.
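To make the cross-walking point concrete, here is a minimal sketch of what mapping a scraped JSON record onto MARC fields might look like. The JSON key names (`title`, `author`, and so on) are assumptions made up for illustration, not the actual schema of the scrape; the tags are ordinary MARC 21 fields. It also ignores indicators, repeatable fields and the leader, so treat it as a rough illustration rather than a working converter.

```python
from __future__ import annotations

from typing import Any

# Hypothetical JSON key -> (MARC tag, subfield code).
# The key names are assumptions for illustration; the tags are MARC 21 fields.
CROSSWALK = {
    "title": ("245", "a"),      # title statement
    "author": ("100", "a"),     # main entry, personal name
    "isbn": ("020", "a"),       # ISBN
    "publisher": ("264", "b"),  # publication statement: publisher name
    "year": ("264", "c"),       # publication statement: date
}


def crosswalk(record: dict[str, Any]) -> tuple[dict[str, dict[str, str]], list[str]]:
    """Map a flat JSON record to {tag: {subfield: value}}.

    Returns the mapped fields plus the list of JSON keys that had no mapping.
    """
    fields: dict[str, dict[str, str]] = {}
    unmapped: list[str] = []
    for key, value in record.items():
        if key not in CROSSWALK:
            unmapped.append(key)
            continue
        tag, code = CROSSWALK[key]
        fields.setdefault(tag, {})[code] = str(value)
    return fields, unmapped


sample = {"title": "Moby Dick", "author": "Melville, Herman", "year": 1851, "cover_url": "x"}
marc_fields, leftover = crosswalk(sample)
print(marc_fields, leftover)
```

The loss runs in both directions: JSON keys with no mapping are left over, and anything the original MARC record carried that never made it into the JSON cannot be rebuilt by a mapping like this.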

[1] http://www.aaronsw.com/weblog/oclcreply


wayathr0w 1031 days ago

If you liked the comment-length analysis of OCLC & want more, there's a whole essay on the subject. [1]

>But one of the ironies of the scraping is that it's not going to be immediately helpful to the libraries who are unable to afford to participate in Worldcat. This is because the scrape didn't (and quite possibly never could have) capture the data in MARC format, which is what most library catalog software uses. While MARC records could be cross-walked from the JSON, they will undoubtedly omit some data elements found in the original MARC.

While it would have been ideal to get all the data in MARC & as many other formats as possible, I wonder how true this is worldwide - many libraries don't use MARC or have a digital catalog at all. Maybe there are some ways the data could be processed that make it easier to integrate into such places, but of course local needs/desires will vary widely.

[1] https://core.ac.uk/download/pdf/11883899.pdf - it was also published in this book: https://archive.org/details/radicalcatalogin0000unse


gmcharlt 1031 days ago

> While it would have been ideal to get all the data in MARC & as many other formats as possible, I wonder how true this is worldwide - many libraries don't use MARC or have a digital catalog at all. Maybe there are some ways the data could be processed that make it easier to integrate into such places, but of course local needs/desires will vary widely.

Indeed, MARC is not universal (and for that matter, it wouldn't surprise me if at this point the majority of records in Worldcat were _not_ derived from MARC sources), and there are certainly non-MARC library catalog platforms out there. That said, as the growth of Koha shows, for better or worse MARC has become close to a global baseline for a lot of libraries.


ahi 1031 days ago

Worse, definitely worse.


greenie_beans 1031 days ago

> (Fortunately for them, there are various alternative ways of getting MARC records for free or very cheap, but nobody has a database more comprehensive than Worldcat.)

what are some of these sources? isbndb and open library? proquest?


gmcharlt 1030 days ago

Many libraries [1], including the likes of the Library of Congress and the National Library of Australia, make their catalogs' MARC records freely available via a library-specific protocol called Z39.50. The Library of Congress makes their catalog metadata available in other ways [2][3] and the Internet Archive has a collection of MARC records as well [4]. There are also a couple commercial services that provide them, and publishers, particularly of digital collections used by libraries, will sometimes supply MARC records (though they tend to be low-quality).

[1] https://irspy.indexdata.com/

[2] https://www.loc.gov/cds/products/marcDist.php

[3] https://id.loc.gov/

[4] https://archive.org/details/ol_data
