Hacker News
by dmje 1242 days ago
I run a little agency in the UK that works with museums to help them with digital. A large part of this is getting collections online.

Some years ago we commissioned a developer to make CultureObject[0], a free and open source WordPress plugin to make it easier to ingest collections data for display on the web. At the heart it's a glorified data importer, and many people just use the CSV mode to sync and import collections data.

It requires some dev effort - we've built an add-on which makes this easier but there's no denying that search, faceting and display need knowledge of WordPress development.

Three years ago we launched The Museum Platform[1], which is a more SaaS-based model - we take away the need for dev skills and ask clients to just send us a CSV and any related media, and we do the hard work. It's WordPress again, but a modified version where we also facilitate storytelling and narrative around the ingested collections.

The interesting thing about this journey is that the requirement to "get a collection online" is apparently and theoretically easy. But the reality is it gets hard quite quickly as the need for search / filtering appears, and it gets harder still as scale comes into it. 1000 records is fine. 100,000 gets quite a bit harder.

There are also many subtleties - particularly with museum collections. "Location" of a record could be where it was collected, or where it is now, or where it's on display. Relational stuff is hard, as are taxonomies and authority terms. It's hard to generalise and it's hard to scale.
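To make the "location" ambiguity concrete, here is a minimal Python sketch of a record that keeps one place per role instead of a single "location" field. It is only an illustration: `LocationRole`, `CollectionRecord` and the sample values are made up for this example and are not how CultureObject or The Museum Platform model their data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class LocationRole(Enum):
    """The three meanings of "location" mentioned above (names are hypothetical)."""
    COLLECTED = "collected"   # where the object was collected
    CURRENT = "current"       # where the object is now
    DISPLAYED = "displayed"   # where the object is on display


@dataclass
class CollectionRecord:
    identifier: str
    title: str
    # One place name per role, rather than a single ambiguous "location" column.
    locations: Dict[LocationRole, str] = field(default_factory=dict)

    def location(self, role: LocationRole) -> Optional[str]:
        """Return the place recorded for a role, or None if it was never recorded."""
        return self.locations.get(role)


# Made-up sample record: collected in one place, held in another, not on display.
record = CollectionRecord(
    identifier="OBJ-1",
    title="Sample object",
    locations={
        LocationRole.COLLECTED: "Field site A",
        LocationRole.CURRENT: "Store room 2",
    },
)
```

A flat CSV with a single "location" column cannot say which of these it means, which is one reason the import step tends to need per-collection decisions.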

[0] https://cultureobject.co.uk/ [1] https://themuseumplatform.com/

5 comments

I see you decided on WordPress. If you were going to use a CMS, I think Drupal 7 would have been a good choice. Drupal has the concept of entities and views. An entity, as the name suggests, is essentially a table, and you can add all sorts of different fields to it, from simple text and number fields to images and fields that look up other entities, thus creating relationships between entities. Views is another construct that lets you choose how to display the entities: as a list or as a table are two possible views. Most of this can be done in Drupal 7 without writing code. I say Drupal 7 because you mentioned WordPress. Drupal 8 and above is more of a developer framework and requires knowledge of Composer. Backdrop [0] is a fork of Drupal 7.

[0] https://backdropcms.org/

WordPress has custom post types, taxonomies and metafields so is very capable of dealing with complex relationships if you need it to. What's challenging is going from simple columnar data such as CSV to something complex and relational.

We chose WordPress because of its ubiquity and power - plus it's insanely easy to host and to use as a non-technical editor, which (last time I looked) can't be said of Drupal.

This is how ExpressionEngine is structured as well, except they're called channels and templates. I really enjoyed working with EE, although coding is definitely required - you basically have to build your site from the ground up. No themes included.

That being said, I found it much, much easier to develop than WordPress.

This is a really cool niche... and I love the idea of it being more generally applicable or extensible to the kinds of private collections of objects that the writer is describing. (I really like what the article seems to be arguing for).

It seems like the data storage / search / filtering aspects of your software would be really fun and interesting to develop flexible solutions to. The WordPress aspects probably wouldn't be so fun to maintain, but it's always pick-your-poison when it comes to CMSs unless you develop your own in-house.

That being said, a collection CMS doesn't necessarily need to have all the plugins and doodads that a WordPress site does. It could be something bare-bones and extensible that was written to be more tightly coupled to a layer that interpreted the underlying data structure. Just toying with the idea, maybe even something that flattened the data views of the collection into static webpages for deployment so that at least some of the indexing could be handled by naming conventions and directory structure without recourse to database searches.
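Roughly what I have in mind, as a hedged Python sketch using only the standard library. The column names (`id`, `title`, `description`, `category`), the `PAGE` template and the `build_site` function are all assumptions for illustration. Each CSV row becomes one HTML file, and the folder it lands in doubles as a crude facet index.

```python
import csv
import html
import io
import re
from pathlib import Path
from string import Template
from typing import List

# Hypothetical page template; a real kit would let users design this.
PAGE = Template("<html><body><h1>$title</h1><p>$description</p></body></html>")


def slugify(value: str) -> str:
    """Lowercase a value and reduce it to a filesystem-safe slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
    return slug or "unknown"


def build_site(csv_text: str, out_dir: Path, facet: str = "category") -> List[Path]:
    """Write one HTML page per CSV row, grouped into one folder per facet value."""
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The directory name acts as the index: listing a folder is the "filter".
        folder = out_dir / slugify(row.get(facet) or "")
        folder.mkdir(parents=True, exist_ok=True)
        page = folder / f"{slugify(row['id'])}.html"
        page.write_text(
            PAGE.substitute(
                title=html.escape(row["title"]),
                description=html.escape(row["description"]),
            ),
            encoding="utf-8",
        )
        written.append(page)
    return written
```

Calling `build_site(csv_text, Path("site"))` would give paths like `site/ceramics/a1.html`. That handles browsing by a single facet only; combining facets or free-text search would still need a prebuilt index of some kind.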

The world could definitely use an open source kit along these lines, with a GUI backend that would let non-developers build their own table structure and search parameters, draw up some page layouts, and just generate a searchable site that collated CSV records with images.

Some of this actually reminds me of what HyperCard could do... it allowed some really interesting experiments with user-classified data. Like this, from 1989: https://core.ac.uk/download/pdf/225955134.pdf

Relational stuff is hard, as you say, but in a structure built around a collection it seems like you could come up with a DSL that defined which columns needed to relate to other tables (any column with repeating data, for instance), suggest making that column "normalized", and automatically generate a linked table.
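As a rough Python sketch of that suggestion step (not a DSL, just the detection and splitting part): flag any column whose values repeat a lot, then split it into a lookup table plus rows that reference it by id. The function names, the `max_distinct_ratio` threshold and the sample data are all hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple


def suggest_columns(rows: List[Dict[str, str]], max_distinct_ratio: float = 0.5) -> List[str]:
    """Return columns whose share of distinct non-empty values is at or below the threshold."""
    candidates = []
    if not rows:
        return candidates
    for column in rows[0]:
        values = [row[column] for row in rows if row[column]]
        if values and len(set(values)) / len(values) <= max_distinct_ratio:
            candidates.append(column)
    return candidates


def normalize(rows: List[Dict[str, str]], column: str) -> Tuple[Dict[str, int], List[dict]]:
    """Split one column into a value -> id lookup and rows that carry '<column>_id' instead."""
    lookup: Dict[str, int] = {}
    linked = []
    for row in rows:
        # Reuse the existing id for a known value, otherwise assign the next one.
        key = lookup.setdefault(row[column], len(lookup) + 1)
        new_row = {k: v for k, v in row.items() if k != column}
        new_row[f"{column}_id"] = key
        linked.append(new_row)
    return lookup, linked


# Made-up sample data for illustration only.
SAMPLE = "id,title,maker\n1,Vase,Maker A\n2,Plate,Maker A\n3,Bowl,Maker B\n4,Jug,Maker A\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
suggested = suggest_columns(rows)
makers, linked_rows = normalize(rows, "maker")
```

Here only `maker` is suggested, because `id` and `title` never repeat. A real tool would need a human to confirm each suggestion, since a low distinct ratio alone does not prove a column is an authority term.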

That's a nice idea - point a script at a CSV file and generate a bunch of flat files for each item using some kind of simple templating language. I might take this back to the guys at the Platform and see if we can do a POC for the clients who have zero budget but want to get going with something straightforward... Thanks for the thinking :-)

Thanks. I was inspired by this article and by your WP implementations of a similar notion. I can't stand WP but I've written a couple of CMSs from scratch. As I wrote the idea, it struck me this concept might be something I'd enjoy building as my next side project. If you do decide to create something along those lines as an open platform, maybe I could contribute.

Is this not LibreOffice, more or less?

I don't understand. How would you use LibreOffice to build a collection of object photos and descriptions, with custom descriptors and normalized data references, that could be deployed as a searchable / filterable website?

Key-value store in LibreOffice Calc, with backup storage somewhere for file objects and file references in the sheet. Deploy as CSV.

What I'm describing, and what the article suggests, isn't about a UI for creating key-value stores that can be saved as CSVs. Any database UI wrapper can do that. The concept is about ingesting CSVs, normalizing the data, and turning them into websites. The parent I was responding to built a system (or two systems, apparently) for doing this as SaaS based around WordPress.

I'm curious about what you make of Omeka, and whether you think it relates to OP's point. It's quite common in the digital humanities, but I've never seen it used outside that context.

I really like Omeka. It's a very cool project and we did look into it early on. Really though we chose WP because it's ubiquitous - no lock-in, very powerful, easy for editors to use. With the right nudging it does all the things Omeka does.

This is really interesting. The problem you're describing sounds similar to what I wrote in another comment about why this kind of thing is hard to generalize: https://news.ycombinator.com/item?id=34564394

Maybe I'm being naive, but 100K records doesn't sound hard to search either. Maybe at 5 or 10M it starts getting ugly/expensive.