I'm wondering whether the 1900 spike is just a flaw in the data model. Does he explain why so many items are labeled as 1900?
Could it be that, for items with an unknown date in the large collection, the date field contains only "'00", which then gets interpreted as 1900? Or did someone decide that the default value for an item's creation date is 1900, and not all items have a date set?
>Such anomalies can stand for series of items in the collection, they can be traces of curatorial decisions in cataloguing, they could be mistakes in dating, etc.
I'd call those "curatorial decisions". Makes sense here in the art world, but large datasets which defaulted unknown dates to January 1st bug the hell out of me.
Edit: I think the mystery is solved: Looking at https://raw.githubusercontent.com/MuseumofModernArt/collecti... you will find a lot of data fields like "c. 1900". For example the data field for this item says "c. 1900": http://www.moma.org/collection/works/60868?locale=en
I first assumed "c. 1900" means "created 1900", but it's more likely "circa 1900". Also, this particular piece was created in 1910: http://www.davidrumsey.com/amica/amico866157-125545.html
So I think the collection contains a lot of items with an unknown creation date that are labeled "circa 1900".
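If anyone wants to check this against the data, here is a rough sketch of how one might separate "circa" dates from exact ones before counting per year. The sample strings, the regexes, and the names `classify` and `count_by_year` are my own assumptions for illustration; I have not verified which date formats actually appear in the MoMA file beyond the "c. 1900" example above.

```python
import re
from collections import Counter

# Hypothetical sample of free-text date strings (NOT taken from the real dataset).
SAMPLE_DATES = ["c. 1900", "1910", "c. 1900", "1896", "circa 1900", "", "c. 1925"]

# Assumed spellings of "circa"; the real data may use others.
CIRCA = re.compile(r"^\s*(c\.|ca\.|circa)\s*(\d{4})", re.IGNORECASE)
EXACT = re.compile(r"^\s*(\d{4})\s*$")

def classify(date_text):
    """Return (year, is_circa), or (None, False) if no year can be read."""
    m = CIRCA.match(date_text)
    if m:
        return int(m.group(2)), True
    m = EXACT.match(date_text)
    if m:
        return int(m.group(1)), False
    return None, False

def count_by_year(dates):
    """Count exact and circa dates per year in two separate Counters."""
    exact, circa = Counter(), Counter()
    for text in dates:
        year, is_circa = classify(text)
        if year is None:
            continue  # unparseable or empty date field
        (circa if is_circa else exact)[year] += 1
    return exact, circa

exact, circa = count_by_year(SAMPLE_DATES)
print("exact:", dict(exact))
print("circa:", dict(circa))
```

If the 1900 spike mostly sits in the circa counter rather than the exact one, that would support the "circa 1900 as a catch-all" explanation.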