| HN Mirror



	by rndn 4032 days ago
	There should be a contest: Who can find the most implausible data storage medium? (Rated according to various criteria such as ingenuity, reliability, max. data read/write rates, latency, storage size, costs…)

18 comments

elwell 4032 days ago

Convert data to binary. Use Amazon Mechanical Turk API to create tasks for people to remember the index of each bit (the value of the task would be $0.01 for binary 0 and $0.02 for binary 1). And, for reading memory, a new task to input the index they remembered and the value they were paid.
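A minimal sketch of the payment encoding described above, assuming one task per bit and prices expressed in whole cents. The function names are made up for illustration and no Mechanical Turk API is called; the dictionary simply stands in for what the workers would remember.

```python
def bits_of(data: bytes) -> list[int]:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def payments_in_cents(data: bytes) -> dict[int, int]:
    """Map each bit index to its task price: 1 cent for a 0 bit, 2 cents for a 1 bit."""
    return {index: 1 + bit for index, bit in enumerate(bits_of(data))}


def decode(payments: dict[int, int]) -> bytes:
    """Rebuild the bytes from the remembered (index, payment) pairs."""
    bits = [payments[i] - 1 for i in range(len(payments))]
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


tasks = payments_in_cents(b"hi")
total_cost = sum(tasks.values())  # storage cost in cents, before any redundancy
```

Storing the two bytes of `b"hi"` takes 16 tasks, so the cost per byte is somewhere between 8 and 16 cents depending on how many 1 bits it contains.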

pimlottc 4032 days ago

You'd have to factor in a ton of redundancy to account for the human bits who just got bored and wandered off.

Anyway, people would probably just start saving the bits on their computers after the first job or two. Which would be an amusing result, being just a convoluted interface to a remote hard drive, but it's conceptually less interesting than actually using distributed human memory as a digital storage medium...

xanderjanz 4032 days ago

you could structure it such that the longer they sit there remembering the data, the more they get paid. When they want to leave, they enter what they remember and get paid.

oconnore 4032 days ago

http://bash.org/?98

fatratchet 4032 days ago

To get reliable and free storage, photo hosting is usually the easiest way. Flickr offers 1TB, picasa/g+ offers unlimited storage with some hidden quotas. Everything that allows lossless photos lets you store arbitrary data. Depending on how careful you wanna be you can store hundreds of GBs per account.

Email attachments used to be a great way a while ago but nowadays using multiple gdrive/dropbox/onedrive accounts is much easier.

They are easy to create in large numbers (especially if your ISP has dynamic IPs) and as long as you're even a little bit careful, nearly impossible to ban. Add some redundancy across different services to that and a $2 VPS that gives you tons of upload bandwidth and you've got yourself as many TBs of free, fast and reliable online storage as you want.

I spent so much time as a teenager with no money and some python skills coding storage solutions like that. I'd say it was to store movies and tv shows for myself but in retrospect I mostly did it because it was so much fun to develop.

userbinator 4032 days ago

Video hosting (i.e. YouTube) is another potential repository for massive amounts of data.

Combine that with the fact that data which is encrypted looks practically like static, and you could potentially overlay it on top of an existing video of something mundane.

You'd need to use strong ECC to get past the lossy encoding, but as things like QR codes show, that is not so hard.
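To illustrate the idea of error correction surviving a noisy channel, here is a minimal sketch using a repetition code with majority voting. This is an assumption made for brevity: QR codes use Reed-Solomon codes, which are far more efficient, and nothing here models what a video encoder actually does to the pixels.

```python
def encode(bits: list[int], repeat: int = 5) -> list[int]:
    """Repeat every bit `repeat` times."""
    return [bit for bit in bits for _ in range(repeat)]


def decode(received: list[int], repeat: int = 5) -> list[int]:
    """Recover each bit by majority vote over its group of `repeat` copies."""
    return [
        1 if 2 * sum(received[start:start + repeat]) > repeat else 0
        for start in range(0, len(received), repeat)
    ]


message = [1, 0, 1, 1, 0]
noisy = encode(message)
for position in (0, 7, 13):  # simulate the lossy encoder flipping a few bits
    noisy[position] ^= 1
recovered = decode(noisy)
```

With five copies per bit, up to two flips per group are corrected, at the price of storing five times as much data.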

The audio channel is also usable...

r-w 4032 days ago

Trying to get the greatest entropy possible through arbitrary-strength JPEG compression would be an interesting problem to solve.

darkstar999 4032 days ago

The new Google Photos storage is lossy unless you pay for it. That doesn't rule out using the images in a different way though.

huckyaus 4032 days ago

I thought it was only lossy if the originals you uploaded were >16MP. I tested uploading some <16MP images and redownloading them, and they didn't seem to have undergone any lossy conversion.

elinchrome 4032 days ago

Did you compare a hash of the file, or did they just look the same?

conductr 4031 days ago

I've done similar with images for the fun of it. The simplest solution that I recall finding was to base64 the file/data, then turn to hex, then use those hex data to create pixels in RGB. I would line them up top-left to bottom-right.

Probably not the most efficient but easy and fast and the resulting images would look... interesting. For large files, the decoding would be difficult mostly just due to reading the image of so many pixels into memory. So, that's when I began fixing the image size to a smaller size and having multiple images that I would later convert to 60fps video. I could then use ffmpeg to convert images to frames and frames back to images.

I had no practical use for this, but it was a fun project on a rainy afternoon.
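A rough sketch of the bytes-to-pixels idea, with two assumptions that differ from the description above: it skips the base64 and hex steps and packs three raw bytes into each RGB pixel, and it prepends a 4-byte length so the padding can be dropped on decode. It only builds the list of pixels; writing them to an actual image file would need an imaging library, which is left out here.

```python
import struct


def bytes_to_pixels(data: bytes) -> list[tuple[int, ...]]:
    """Pack data into (R, G, B) tuples, three bytes per pixel."""
    payload = struct.pack(">I", len(data)) + data  # length header, big-endian
    payload += b"\x00" * (-len(payload) % 3)       # pad to a whole pixel
    return [tuple(payload[i:i + 3]) for i in range(0, len(payload), 3)]


def pixels_to_bytes(pixels: list[tuple[int, ...]]) -> bytes:
    """Flatten the pixels and strip the length header and padding."""
    payload = bytes(channel for pixel in pixels for channel in pixel)
    (length,) = struct.unpack(">I", payload[:4])
    return payload[4:4 + length]


def to_rows(pixels: list[tuple[int, ...]], width: int) -> list[list[tuple[int, ...]]]:
    """Lay the pixels out top-left to bottom-right in rows of `width`."""
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


pixels = bytes_to_pixels(b"hello world")
```

Fixing `width` and the number of rows per image gives the fixed-size frames mentioned above, one chunk of the file per frame.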

namwen 4032 days ago

Yeah, I wrote something that stores data to Flickr last summer: https://github.com/namwen/hoardr . I kind of had a reason but it was more for the enjoyment of getting it to work.

adrian_blx 4031 days ago

There is also hyperglobalmegastore https://github.com/adrian-bl/hyperglobalmegastore All data is encrypted and you can even mount your flickr 'drive' using fuse.

vivab0rg 4031 days ago

This project needs more stars!

rogeryu 4031 days ago

Until this gets so popular that Google or Flickr start to analyse photos and decide either to delete those photos and videos, or to convert them, destroying your data. Then, years later, you need your backup and ....

mayli 4032 days ago

I did the same thing for google photos, but just for testing purposes. https://photos.google.com/album/AF1QipOjZrywipm-SSH9jVNsKVF5...

rakoo 4031 days ago

The front-end is already there : https://tahoe-lafs.org/trac/tahoe-lafs. Backends are currently being developed (https://github.com/mk-fg/tahoe-lafs-public-clouds), and there will even be a public offering from the very same guys (https://leastauthority.com/)

empyrical 4032 days ago

My personal favourite:

https://github.com/philipl/pifs

Hortinstein 4032 days ago

It's like the dust theory in Permutation City...

", he became convinced of something he came to call the Dust Theory, which holds that there is no difference, even in principle, between physics and mathematics, and that all mathematically possible structures exist, among them our physics and therefore our spacetime. These structures are being computed, in the manner of a program on a universal Turing machine, using something Durham refers to as "dust" which is a generic, vague term describing anything which can be interpreted to represent information; and therefore, that the only thing that matters is that a mathematical structure be self-consistent and, as such, computable. As long as a mathematical structure is possibly computable, then it is being computed on some dust, though it does not matter what dust actually is, only that there be a possible interpretation where such a computation is taking place somehow. The dust theory implies, as such, that all possible universes exist and are equally real, emerging spontaneously from their own mathematical self-consistency."

Great book!

ngoldbaum 4032 days ago

Also massive spoilers for the end of Contact. The novel, not the movie.

tacone 4032 days ago

That just segfaulted my brain. Everything we may ever write in the future is already there, you just need the address.

zedadex 4032 days ago

> Copyright infringement? It's just a few digits of π! They were always there!

You really have to admire that creativity

Lawtonfogle 4031 days ago

While I'm not sure every number is in pi (see my other comment to grandparent), there is a similar really weird feeling I get when I consider all digital data is really just numbers. That means there is a number, that when turned into a .avi (or format of your choice), shows anything you can imagine. Imagine yourself talking with Plato. There is a number that produces a 1080p video of you doing just that. Actually, there are a lot of numbers that do that, as every little difference in the setting would be a different number.

There is a number that produces a high def photo of when you married your high school sweetheart, even if you never actually married her. There is one of you being awarded the Nobel prize. If there is a proof that P = NP, or that it doesn't, or even a proof that it can't be proven either way, then there is a number that would be the PDF version of that document.
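The "every file is a number" point can be made concrete with a small sketch. The helper names are made up for illustration; the extra leading `0x01` byte is an assumption added so that files starting with zero bytes survive the round trip.

```python
def file_to_number(data: bytes) -> int:
    """Interpret a file's bytes as one big integer."""
    return int.from_bytes(b"\x01" + data, "big")  # sentinel keeps leading zero bytes


def number_to_file(number: int) -> bytes:
    """Turn the integer back into the original bytes."""
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the sentinel byte


number = file_to_number(b"A")  # 0x0141, i.e. 321
```

The numbers get large quickly: a 1 MB file corresponds to an integer with roughly 2.4 million decimal digits.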

jordigh 4032 days ago

The problem is that the address is typically larger than the actual data you want to store.
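A small sketch of why the address tends not to be shorter than the data. It generates decimal digits of π with an unbounded spigot algorithm and looks up where a digit string first appears. The helper names and the search limit are assumptions for illustration, and this is not how pifs itself is implemented. If the digits behave like random digits (which has not been proven), a k-digit string is expected to first show up around position 10^k, so writing down the position takes about k digits too.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) using a spigot algorithm."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, k, n, l = (
                10 * q, 10 * (r - n * t), t, k,
                (10 * (3 * q + r)) // t - 10 * n, l,
            )
        else:
            q, r, t, k, n, l = (
                q * k, (2 * q + r) * l, t * l, k + 1,
                (q * (7 * k + 2) + r * l) // (t * l), l + 2,
            )


def pi_decimals(count: int) -> str:
    """Return the first `count` digits after the decimal point."""
    return "".join(str(d) for d in islice(pi_digits(), 1, count + 1))


def first_index(target: str, limit: int = 1000) -> int:
    """Zero-based position of `target` in the first `limit` decimals, or -1 if absent."""
    return pi_decimals(limit).find(target)
```

For example, `first_index("26", 50)` returns 5, so a two-digit payload gets a one-digit address here, but longer payloads are usually not that lucky.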

mafuyu 4032 days ago

Luckily, I know of a scheme to compress the address 100%! ;)

tacone 4032 days ago

Oh no! I sketched up a script to gzip the chunks, hashsum them, and then find out how many collisions there are before the real occurrence starting from an approximate address in the PI digits chain, so that I could have: ($address*1e12)$hash$collisioncount

The resulting string is 10% of the size of the gzipped string, at the expense of CPU. But when I read you achieved 100% compression I just deleted the script and went out to get a beer. :-(((

bryogenic 4032 days ago

Up next, PiCoin: proof of work is finding the index of the goal data in pi.

Lawtonfogle 4031 days ago

Is this actually proven? Pi is irrational, but is it proven to be random (or normal)?

http://www.askamathematician.com/2009/11/since-pi-is-infinit...

Also, assuming that it is, if 'start at position X and read Y bits from pi' produced an illegal image (top secret document, abuse images, etc), what would be the legality of trading such information?

dbarlett 4032 days ago

Packet juggling http://lcamtuf.coredump.cx/juggling_with_packets.txt

andrewstuart2 4032 days ago

Packet juggling over RFC1149-compliant networks.

https://www.ietf.org/rfc/rfc1149.txt

daveloyall 4032 days ago

When you say that, I picture this.

https://duckduckgo.com/?q=starlings+swarm&iax=1&ia=images

hhm 4032 days ago

Steganography is always interesting for data storage. It is pretty easy to hide data into pretty much any medium.

See http://jthuraisamy.github.io/markovTextStego.js/ and https://github.com/hmoraldo/markovTextStego

chrissnell 4032 days ago

Combining steganography with Reddit could be interesting. Random (mildly interesting) photos pushed to imgur and posted to /r/pics by the same user every time.

Zikes 4032 days ago

A steganographic image embedded in a Word document, printed and faxed to a document archive that scans and digitizes it, embeds the scan in a PDF, and emails it back to you.

iblaine 4032 days ago

Pretty sure this happens in Washington DC when bills need to be reviewed by various departments.

Lawtonfogle 4031 days ago

The original image is a picture of a worker's monitor displaying some error message that IT asked for.

I'm not joking either.

Cacti 4032 days ago

During this process you will lose data. A lot of data.

baddox 4032 days ago

Can you make that fully automated from the end user's perspective?

prawn 4032 days ago

Hey, client of mine, you need to pay my invoice. Also, that photo you sent me won't open.

notacoward 4032 days ago

Erasure-coded comments distributed across the huge number of abandoned Wordpress blogs and phpBB forums that are out there. Plenty of storage, pretty readily accessible, low probability that even one fragment will get deleted, and even if one does that's what the erasure coding is for.
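As a sketch of the erasure coding mentioned above, here is the simplest possible scheme: k data fragments plus one XOR parity fragment, which survives the loss of any single fragment. This is an assumption chosen for brevity; a real deployment would more likely use something like Reed-Solomon to tolerate several losses. The function names are made up for illustration.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_fragments(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero-padded) and append one parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)
    for fragment in fragments:
        parity = xor_bytes(parity, fragment)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one fragment (marked None) is missing."""
    missing = [i for i, fragment in enumerate(fragments) if fragment is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    fragments = list(fragments)
    if missing:
        present = [fragment for fragment in fragments if fragment is not None]
        rebuilt = bytes(len(present[0]))
        for fragment in present:
            rebuilt = xor_bytes(rebuilt, fragment)
        fragments[missing[0]] = rebuilt
    return b"".join(fragments[:-1])[:length]


comment = b"erasure coded comment"
stored = make_fragments(comment, 4)  # five fragments, one per abandoned blog
```

Each fragment would be posted as a separate comment; if one blog disappears, the XOR of the remaining four fragments restores it.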

EDIT: also, Wikipedia never deletes anything. Even if your "edits" get reverted, you can still find them via the history page. Hmmm.

Hello71 4032 days ago

no, deleted media is gone forever IIRC.

deleted pages are not visible to people with less than sysop rights (on enwp), and multiple methods are always available to deal with troublesome people, ranging from revision deletion to blocks and eventually ISP contact.

joliv 4032 days ago

Wikipedia is a bit more vigilant with banning than abandoned blogs are :)

cmdrfred 4032 days ago

A single user storing a reasonable amount of data though might get away with it... I know what I'm doing this weekend.

PostOnce 4029 days ago

Abusing one of the most important, non-profit resources on the internet?

Just because we can doesn't mean we ought to.

vidarh 4032 days ago

Usenet messages and mail systems are both good old ideas (I don't know of any actual implementation, but it's certainly been discussed at least back to the early 90's).

For Usenet you could depend on widespread resilient distribution + reasonably long retention periods for a lot of groups (but risked having messages killed by admins if too obvious spam).

For e-mail, anything reflecting your e-mail back can be used to juggle data: Send messages with attachment, refuse to accept the inbound reflected messages for a couple of days to let the other party store the data for you while they retry, then accept the message and instantly send it back out again.

Then there's the old Linus Torvalds quote:

"Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it."

fatratchet 4032 days ago

Usenet is perfect for that since binary newsgroups for piracy have gotten really popular over the last few years. You can basically use it as a reasonably reliable key-value store that lets you store 300kb to 1mb blobs. Add some encryption and parity and you've got yourself nearly unlimited storage, even for free if you use trial accounts from certain providers.

0x0 4032 days ago

Yeah, I remember reading about the e-mail reflection idea in the book "Silence on the wire", authored by "lcamtuf", the guy who's more recently known for writing afl-fuzz.

alfg 4032 days ago

Something not too far off that I made a couple of years ago for fun. Stores small snippets of data in the URL.

https://github.com/alfg/jot with demo.
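A minimal sketch of storing a snippet in a URL, assuming the data is compressed, base64url-encoded and carried in the fragment. The base URL is a placeholder, and this is not necessarily how jot encodes its data.

```python
import base64
import zlib
from urllib.parse import urlsplit

BASE_URL = "https://example.invalid/note"  # hypothetical host, for illustration only


def to_url(text: str) -> str:
    """Compress the text and carry it in the URL fragment."""
    packed = zlib.compress(text.encode("utf-8"))
    token = base64.urlsafe_b64encode(packed).decode("ascii").rstrip("=")
    return f"{BASE_URL}#{token}"


def from_url(url: str) -> str:
    """Read the fragment back out and decompress it."""
    token = urlsplit(url).fragment
    token += "=" * (-len(token) % 4)  # restore the stripped base64 padding
    return zlib.decompress(base64.urlsafe_b64decode(token)).decode("utf-8")


link = to_url("small pieces of immutable data")
```

The fragment is not sent to the server in an HTTP request, so in this variant the "storage" is whoever holds the link.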

rcthompson 4032 days ago

So, with this plus a URL-shortener as a frontend, you're essentially using the URL shortening service as the data storage.

alfg 4032 days ago

Ha, right! Especially since the URLs can get very lengthy depending on the message.

_lce0 4032 days ago

genius!! ready for t.co and bit.ly

perfect for small pieces of immutable data!

SilasX 4032 days ago

How about just a project that implements an S3-style directory system, with a "fill in the blank" for you to implement the storage backend?

That is, for a given storage medium, all you have to do is implement methods for "write key-value pair" and "read value at key", and you get to piggyback off that medium for your storage.
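The "fill in the blank" design could look roughly like the sketch below. All class and method names are hypothetical: a backend only implements `write` and `read`, and a generic layer on top handles splitting files into chunks. The in-memory dictionary backend is a stand-in for whatever odd medium gets plugged in.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """The two methods a storage medium has to provide."""

    @abstractmethod
    def write(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class DictBackend(Backend):
    """In-memory stand-in for a real (and stranger) medium."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def write(self, key: str, value: bytes) -> None:
        self._blobs[key] = value

    def read(self, key: str) -> bytes:
        return self._blobs[key]


class ChunkedStore:
    """Path-based store that splits values into backend-sized chunks."""

    def __init__(self, backend: Backend, chunk_size: int = 4) -> None:
        self.backend = backend
        self.chunk_size = chunk_size

    def put(self, path: str, data: bytes) -> None:
        chunks = [data[i:i + self.chunk_size] for i in range(0, len(data), self.chunk_size)]
        for number, chunk in enumerate(chunks):
            self.backend.write(f"{path}/{number}", chunk)
        self.backend.write(f"{path}/count", str(len(chunks)).encode("ascii"))

    def get(self, path: str) -> bytes:
        count = int(self.backend.read(f"{path}/count").decode("ascii"))
        return b"".join(self.backend.read(f"{path}/{number}") for number in range(count))


store = ChunkedStore(DictBackend())
store.put("movies/notes.txt", b"implausible storage")
```

Swapping `DictBackend` for a class that talks to photo hosting, Usenet or DNS would leave `ChunkedStore` unchanged, which is the point of the proposal.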

ryan-c 4032 days ago

Here's my entry.

https://github.com/ryancdotorg/dnsstore

mmahemoff 4032 days ago

A similar service is here: http://www.cambus.net/interesting-dns-hacks/

The interesting thing about DNS stores is that they save a round trip, so it's not just a weird abuse of the protocol to store content, it's also potentially a performance optimisation.
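A sketch of how data might be chunked for TXT records, under stated assumptions: each chunk is base64 text of at most 255 characters (the length limit of a single character-string in a TXT record), and chunks live under numbered labels in a placeholder zone. The naming scheme is invented for illustration and is not taken from dnsstore; no DNS queries are made.

```python
import base64

MAX_TXT_STRING = 255  # one TXT character-string holds at most 255 bytes


def to_txt_records(data: bytes, zone: str = "store.example.invalid") -> dict[str, str]:
    """Map numbered record names to base64 chunks that each fit one TXT string."""
    encoded = base64.b64encode(data).decode("ascii")
    chunks = [encoded[i:i + MAX_TXT_STRING] for i in range(0, len(encoded), MAX_TXT_STRING)]
    return {f"{number}.{zone}": chunk for number, chunk in enumerate(chunks)}


def from_txt_records(records: dict[str, str]) -> bytes:
    """Reassemble the chunks in numeric label order and decode them."""
    ordered = sorted(records, key=lambda name: int(name.split(".", 1)[0]))
    return base64.b64decode("".join(records[name] for name in ordered))


records = to_txt_records(bytes(range(256)) * 2)  # 512 bytes of sample data
```

Reading the data back would then be one TXT lookup per numbered name, each of which resolvers along the way may cache.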

silverwind 4032 days ago

And almost every provider around the world provides a free 'CDN'. It's win-win!

mmahemoff 4032 days ago

Half-CDN

Half-DNS

I call it the CDNS :)

jackgavigan 4031 days ago

nslookup -type=TXT jackgavigan.com

;-)

an_account_name 4031 days ago

Licensed under the WTFPL! One of my all-time favorites.

antihero 4032 days ago

Connect ethernet cables in a loop, keep sending data back and forth "around" the loop. Data is stored in cables.

I think this was from an old BOFH.
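A back-of-the-envelope sketch of how much a cable loop could hold: the capacity is the bit rate multiplied by the time a signal spends in the cable. The velocity factor of 0.66 is an assumed typical value for copper cable, and the function name is made up for illustration.

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second, in vacuum


def bits_in_flight(length_m: float, bitrate_bps: float, velocity_factor: float = 0.66) -> float:
    """Bits 'stored' in a cable: bit rate times one-way propagation delay."""
    delay_s = length_m / (SPEED_OF_LIGHT * velocity_factor)
    return bitrate_bps * delay_s


capacity = bits_in_flight(length_m=100, bitrate_bps=1e9)
```

With these assumed numbers, 100 m of cable at 1 Gbit/s holds on the order of 500 bits, which is why ideas like pingfs rely on much longer round trips.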

voltagex_ 4032 days ago

http://webcache.googleusercontent.com/search?q=cache:786NsZY... and https://github.com/yarrick/pingfs

OT: Let archive.org save your pages, people!

nosuchthing 4032 days ago

Apollo 11 used Rope Memory on its voyage to the moon. [1] [2]

[1] http://news.bbc.co.uk/2/hi/technology/8148730.stm

[2] http://en.wikipedia.org/wiki/Core_rope_memory

joezydeco 4032 days ago

Mercury delay lines always fascinated me:

http://en.wikipedia.org/wiki/Delay_line_memory#Mercury_delay...

rrrrob 4031 days ago

I'd love to exploit ad networks user profiles for this. I.e., store some bits as "interests", by running a few appropriate google searches or hitting a few web sites, read the bits by seeing what ads you're served. This would probably require a bit of learning and a redundant encoding to make it work, but...

haylem 4031 days ago

I had a few in mind when I was back in uni and hosting and cloud storage prices were still up.

I hadn't thought of reddit, as the abuse would be clearly visible, but back then I had used that Gmail Drive some guy had implemented using emails for storage, and it led me to think a lot of the Google systems had non-obvious "unlimited" storage options.

For instance, I don't know if that's still the case, but Google Calendar surely seemed pretty fit for abuse: while calendar entries were limited in size, you could have as many as you wanted. And calendars can be private, so it's even better.

The problem with such systems will be the integrity of your data, when you start being forced to chunk things up. If they change one thing under your feet, you're a bit screwed. Also you have to detect all the undocumented pitfalls (e.g. forbidden characters in an edit field).

mmahemoff 4032 days ago

Furl stores data in URL shorteners and aptly refers to itself as "parasitic storage". Some precursors are referenced on its homepage.

https://code.google.com/p/furl/

leni536 4031 days ago

I like this one: https://github.com/yarrick/pingfs

baddox 4032 days ago

The Bitcoin blockchain works fine, but is fairly implausible for interesting amounts of data.

Mithaldu 4032 days ago

> most implausible data storage medium

That still works well!

It's easy to make something bizarre and unusable. Have it bizarre and surprisingly usable. :D

Zikes 4032 days ago

Darn, that rules mine out, then.

Fax is about as unusable as you can get...