|
Here's what's supposed to happen: A student on campus wants to do a research project analyzing a large number of journal articles, or even just the metadata around a large number of journal articles. They approach their librarian, the librarian approaches the journal vendor (JSTOR or someone else), and everyone works together to find a way to get the student their data. Maybe the vendor hands over a special dataset. Maybe they give the student back-end access to their database. Maybe they allow the student to run a Python script against their website, but only in the middle of the night so as not to slow down service for other users.

Here's what sometimes happens instead: The student writes and runs a clever script. The vendor notices that their servers have slowed due to automated script activity on their webpages, shuts down access from that IP address, and lets the school know. University IT staff and librarians drop what they're doing and try to track down the party responsible. Once the student has been identified, the nice librarian has to have a talk with them about what's permitted under the university's license agreement with the publisher, and together they go to the data vendor to ask forgiveness and permission. They usually get it.

Here's what Aaron did: He walked onto the MIT campus, set up a script not to analyze metadata but to actually download large numbers of documents, and when his IP was blocked, he used traditional hacking as well as Johnny Long-style "no-tech hacking" to get around it.
http://video.google.com/videoplay?docid=-2160824376898701015

Publishers, rightly or wrongly, assume that someone systematically downloading entire journal runs intends to set up a shadow database to give away their content for free. The feds seem to agree in this case. The thing that gets me is that JSTOR was more than reasonable here. They didn't immediately shut down the whole campus (as some overly aggressive publishers do), but started with the IP addresses involved. When the downloading didn't stop, they had to cut off access completely. Aaron, who isn't even a student at MIT, managed to kick the whole campus off JSTOR for weeks, and as soon as they restored access, he went at it again. If I were a librarian at MIT, I'd want the book thrown at him. |