by fiddlerwoaroof 887 days ago
It’s at least 20ish years ago: I remember an old sysadmin talking about managing petabytes before 2003

It must be much more than 20ish years. Some 2400 ft reels in the 60s stored a few megabytes, so you only need a few hundred thousand of those to reach a terabyte. https://en.wikipedia.org/wiki/IBM_7330

> a single 2400-foot tape could store the equivalent of some 50,000 punched cards (about 4,000,000 six-bit bytes).

In 1964, with the introduction of System/360, you go an order of magnitude higher: https://www.core77.com/posts/108573/A-Storage-Cabinet-Based-...

> It could store a maximum of 45MB on 2,400 feet

At this point you only need a few tens of thousands of reels in existence to reach a terabyte. So I strongly suspect the "terabyte point" was some time in the 1960s.
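
A quick back-of-envelope check of those reel counts, as a rough sketch only. It assumes a "terabyte" means 10**12 eight-bit bytes and that "45MB" means 45 * 10**6 bytes; neither is stated in the quotes, and `reels_needed` is just a helper name made up for this example.

```python
# Assumptions (not from the quoted sources): decimal terabyte, decimal megabyte.
TERABYTE = 10**12  # in 8-bit bytes


def reels_needed(target_bytes: int, bytes_per_reel: int) -> int:
    """Smallest whole number of reels whose combined capacity reaches the target."""
    return -(-target_bytes // bytes_per_reel)  # integer ceiling division


# IBM 7330 quote: about 4,000,000 six-bit bytes per 2400-foot reel.
ibm_7330_bytes = 4_000_000 * 6 // 8  # 3,000,000 eight-bit bytes

# System/360-era quote: at most 45MB per 2,400-foot reel.
s360_bytes = 45 * 10**6

print(reels_needed(TERABYTE, ibm_7330_bytes))  # 333334
print(reels_needed(TERABYTE, s360_bytes))      # 22223
```

So that is roughly a third of a million of the early reels versus about 22 thousand of the later ones, which lines up with the estimates above.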

Those numbers seem reasonable in that context. I first started using BitTorrent around that time as well, and it wasn't uncommon to see many users long-term seeding multiple hundreds of gigabytes of Linux ISOs alone.

Here’s another usage scenario with data usage numbers I found a while back.

> A 2004 paper published in ACM Transactions on Programming Languages and Systems shows how Hancock code can sift calling card records, long distance calls, IP addresses and internet traffic dumps, and even track the physical movements of mobile phone customers as their signal moves from cell site to cell site.

> With Hancock, "analysts could store sufficiently precise information to enable new applications previously thought to be infeasible," the program authors wrote. AT&T uses Hancock code to sift 9 GB of telephone traffic data a night, according to the paper.

https://web.archive.org/web/20200309221602/https://www.wired...

Yeah, at the other end of the scale, it sounds like Apple is now managing exabytes: https://read.engineerscodex.com/p/how-apple-built-icloud-to-...

This is pretty mind-boggling to me.

I archived Hancock here over a decade ago (I stumbled upon it via HN at the time, if I’m not mistaken): https://github.com/mqudsi/hancock

That’s pretty cool. I remember someone on that repo from a while back and was surprised to see their name pop up again. Thanks for archiving this!

Corinna Cortes et al. wrote the paper(s) on Hancock and also the Communities of Interest paper referenced in the Wired article I linked to. She’s apparently a pretty big deal and went on to work at Google after her prestigious work at AT&T.

Hancock: A Language for Extracting Signatures from Data

https://scholar.google.com/citations?view_op=view_citation&h...

Hancock: A Language for Analyzing Transactional Data Streams

https://scholar.google.com/citations?view_op=view_citation&h...

Communities of Interest

https://scholar.google.com/citations?view_op=view_citation&h...

I raised this on Retrocomputing Stack Exchange, and https://retrocomputing.stackexchange.com/a/28322/3722 notes that a TiB of digital data was likely reached in the 1930s with punch cards.
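
For a sense of scale, here is a rough sketch of how many cards a TiB implies. The per-card capacity is an assumption derived from the Wikipedia quote earlier in the thread (50,000 cards for about 4,000,000 six-bit bytes, i.e. 80 six-bit characters per card); the linked answer may well count differently, for example if cards are punched in binary.

```python
TIB = 2**40  # one tebibyte, in 8-bit bytes

# Assumed capacity, derived from the tape/card equivalence quoted upthread.
CHARS_PER_CARD = 4_000_000 // 50_000   # 80 six-bit characters per card
BYTES_PER_CARD = CHARS_PER_CARD * 6 // 8  # 480 bits = 60 eight-bit bytes

# Integer ceiling division: whole cards needed to hold one TiB.
cards_for_tib = -(-TIB // BYTES_PER_CARD)

print(f"{cards_for_tib:,} cards")  # 18,325,193,797 cards
```

Under that assumption it takes on the order of 18 billion cards.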