by airstrike
2497 days ago
I love this. It reminded me of a time some 15+ years ago when I did back-of-the-envelope math like yours, but on people using scripts to add colored characters to every IRC message, and on how much meaningless data that generated... it's astonishing how quickly you can get to the PB scale.
So let's assume that Amazon's CIA/NSA op datacenter is receiving these 1 PB-a-day messages.
Let's assume they use a tool to remove certain very common words, plus an algorithm that can put the removed words back later if they want to restore the messages.
Text compresses really well.
And if you have a replacement dictionary, even better.
The most commonly used words:
https://en.m.wikipedia.org/wiki/Most_common_words_in_English
So you have a simple key-value store...
And compress that text even further.
So what can we get a PB down to with that approach?
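A minimal Python sketch of the "replacement dictionary, then compress" idea. It is only an illustration under stated assumptions: the ten-word table is a hypothetical stand-in for the Wikipedia list, the ASCII control characters used as codes are assumed never to appear in real messages, and `substitute`, `restore`, `pack` and `unpack` are made-up names.

```python
import zlib

# Hypothetical substitution table: a few very common English words
# (a fuller version might use the whole most-common-words list).
COMMON_WORDS = ["the", "be", "to", "of", "and", "that", "have", "with", "this", "from"]

# Assumption: ASCII control characters (1 byte each in UTF-8) serve as codes.
# Tab, newline and carriage return are skipped because messages may contain them.
CODE_POINTS = [c for c in range(1, 32) if c not in (9, 10, 13)]
ENCODE = {w: chr(c) for w, c in zip(COMMON_WORDS, CODE_POINTS)}
DECODE = {code: w for w, code in ENCODE.items()}


def substitute(text: str) -> str:
    """Swap whole space-separated common words for their 1-byte codes."""
    if any(ch in DECODE for ch in text):
        # A literal code character in the input would make restore() ambiguous.
        raise ValueError("text contains a reserved code character")
    return " ".join(ENCODE.get(tok, tok) for tok in text.split(" "))


def restore(coded: str) -> str:
    """Undo substitute(): swap each code back for the word it stands for."""
    return " ".join(DECODE.get(tok, tok) for tok in coded.split(" "))


def pack(text: str) -> bytes:
    """Dictionary substitution followed by general-purpose zlib compression."""
    return zlib.compress(substitute(text).encode("utf-8"), 9)


def unpack(blob: bytes) -> str:
    """Decompress, then restore the substituted words."""
    return restore(zlib.decompress(blob).decode("utf-8"))
```

Usage: `unpack(pack(msg)) == msg` for any message free of the reserved characters. Splitting on single spaces keeps the round trip lossless, so capitalized words or words with punctuation attached ("The", "the,") are simply left alone. DEFLATE already exploits repeated words on its own, so the substitution step probably adds less than you'd hope. zlib can also be primed with common words directly through the `zdict` argument of `zlib.compressobj` / `zlib.decompressobj`.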
---
Hey, has anyone noticed that Palantir has stopped its employees from wearing their swag on BART, and has disappeared from Reddit and HN?
Oh, and all the NLP competitors went silent?
Yeah, the deep-state data processing market is booming.