by pipermerriam 3571 days ago
Any blockchain client for any chain (Bitcoin, Ethereum, Dogecoin, ...) could very easily implement a storage layer for the underlying blockchain data using a protocol like IPFS.

https://ipfs.io/

In this model, individuals would not need to store the entire chain as they could lazily fetch parts of the chain as they are needed. This would allow individuals to store very little of the historical blockchain data if their use case doesn't require them to access that historical data.

There are nuances to this solution. IPFS can be thought of as one giant torrent, and as with torrents, someone must have the part you need for you to be able to fetch it. There is a theoretical failure mode in this model where everyone happens to delete the same part of the blockchain, thinking that they won't need it and that if they do, they'll get it from someone else. In this case, that portion of the chain history would be lost. This failure mode should be simple to mitigate, for two reasons. Many people participating in the network need access to significant portions of the historical data, so they will keep it. And not everyone will use the IPFS-based storage layer, so when a chunk is not available over IPFS, the client can just fetch it by normal means.
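The lazy-fetch-with-fallback idea above can be sketched roughly as follows. This is only an illustration: `LazyBlockStore`, `ipfs_fetch`, and `p2p_fetch` are hypothetical names, and the two fetchers stand in for whatever an IPFS client and the chain's normal peer-to-peer protocol would actually provide.

```python
from typing import Callable, Dict, Optional

# Hypothetical fetcher type: takes a block hash (hex string) and returns the
# raw block bytes, or None if no reachable peer currently has the block.
Fetcher = Callable[[str], Optional[bytes]]


class LazyBlockStore:
    """Keeps only the blocks that were requested; fetches the rest on demand."""

    def __init__(self, ipfs_fetch: Fetcher, p2p_fetch: Fetcher) -> None:
        self._cache: Dict[str, bytes] = {}
        self._ipfs_fetch = ipfs_fetch  # assumed wrapper around an IPFS client
        self._p2p_fetch = p2p_fetch    # assumed wrapper around the normal network

    def get(self, block_hash: str) -> bytes:
        if block_hash in self._cache:
            return self._cache[block_hash]
        # Try the IPFS-style layer first, then fall back to normal means.
        for fetch in (self._ipfs_fetch, self._p2p_fetch):
            data = fetch(block_hash)
            if data is not None:
                self._cache[block_hash] = data
                return data
        raise KeyError(f"block {block_hash} is unavailable from every source")

    def prune(self, block_hash: str) -> None:
        """Drop a locally stored block that this node no longer needs."""
        self._cache.pop(block_hash, None)
```

A real client would also have to verify each fetched block against a hash it already trusts before caching it; that step is left out here.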

4 comments

Don't forget that like BitTorrent, IPFS uses a DHT to route requests. This makes it vulnerable to DHT routing attacks, where an attacker can Sybil the system and censor all of the routes to a block.
How would you ensure the validity of the part of the blockchain you are reading from IPFS?
You may be aware, since you're asking the question, but you really can't without some other information.

If you have block D, and want to fetch block A, you don't actually know if it's the real block A unless you also have blocks B, and C. Block D (which you have) contains the hash of block C, which contains the hash of block B, which contains the hash of block A. You need the hash of block A to be able to verify that it is indeed the real block A.
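A toy sketch of that backward walk, assuming a simplified block that is just a previous-block hash plus a payload (real block formats differ, and `Block` and `verify_ancestors` are made-up names for illustration):

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    prev_hash: str  # hex hash of the preceding block
    payload: bytes  # stand-in for the rest of the block's contents

    def hash(self) -> str:
        return hashlib.sha256(self.prev_hash.encode() + self.payload).hexdigest()


def verify_ancestors(trusted: Block, candidates: List[Block]) -> bool:
    """Check that `candidates` (newest first) link back from `trusted`."""
    expected = trusted.prev_hash
    for block in candidates:
        if block.hash() != expected:
            return False
        expected = block.prev_hash
    return True


a = Block("0" * 64, b"A")
b = Block(a.hash(), b"B")
c = Block(b.hash(), b"C")
d = Block(c.hash(), b"D")
fake_a = Block("0" * 64, b"not A")

print(verify_ancestors(d, [c, b, a]))       # True
print(verify_ancestors(d, [c, b, fake_a]))  # False
print(verify_ancestors(d, [a]))             # False: B and C are missing
```

The last call shows the point being made: holding only D, there is nothing to compare A against until B and C fill the gap.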

Of course, because of things like SSL/TLS, you can be sure nobody tampered with the block on its way from the server to you. With that in mind, ensuring you receive the real block just requires you to trust the server giving you the blocks, which may or may not be worth it depending on how much time/space it saves you. In some ways, the server would become your 'central authority' on the blockchain.

In reality though, I can't really imagine there's much the server could do. Sending you fake blocks may work for a while, but it would fail once the client caches them and asks for the surrounding blocks, because the fake block isn't going to match the hash of the real block. I think it would be a bit hard to pull off for any length of time.

> unless you also have blocks B, and C.

It suffices to have the headers of blocks B and C, which, at only 80 bytes per header, is quite manageable.

Are you sure? The header for C will tell you the hash of B. How will you verify the header given for B if you cannot hash the entire block and compare it to the hash you got in the header of C?
Each block's header contains a hash that is the root of a Merkle tree [0] of the transactions in the block. The Merkle-root hash effectively summarizes all of the block's transactions, which allows the overall block hash to be a hash just of the constituents of the header.
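As a rough sketch of how such a root summarizes the transactions, here is a Bitcoin-style computation: pair up hashes, apply double SHA-256, and duplicate the last entry when a level has an odd count. The function names are illustrative, and the byte-order conventions used when displaying hashes are ignored.

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: List[bytes]) -> bytes:
    """Fold a list of transaction hashes into a single root hash."""
    if not txids:
        raise ValueError("expected at least one transaction hash")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Changing any single transaction hash changes the root, which is why the header can commit to every transaction in the block using one 32-byte field.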

Thus if you have the header, you do indeed have everything you need to produce a hash and verify that it matches the hash referenced in the succeeding block. You do not need the information describing individual transactions.

0. https://en.m.wikipedia.org/wiki/Merkle_tree
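A header-only link check might look roughly like this, assuming Bitcoin's 80-byte header layout (version, previous block hash, Merkle root, time, bits, nonce) and double SHA-256 as the block hash. The helper names are made up for illustration, and the sketch checks only the hash links, not proof of work.

```python
import hashlib
import struct
from typing import List

HEADER_SIZE = 80  # bytes


def header_hash(header: bytes) -> bytes:
    if len(header) != HEADER_SIZE:
        raise ValueError("a header must be exactly 80 bytes")
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


def prev_hash(header: bytes) -> bytes:
    return header[4:36]  # bytes 4..35 hold the previous block's hash


def headers_link(headers: List[bytes]) -> bool:
    """True if each header (oldest first) commits to the hash of the one before."""
    return all(
        prev_hash(later) == header_hash(earlier)
        for earlier, later in zip(headers, headers[1:])
    )


def make_header(prev: bytes, merkle: bytes, nonce: int = 0) -> bytes:
    """Hypothetical helper that packs a header with placeholder field values."""
    return struct.pack("<I32s32sIII", 1, prev, merkle, 0, 0, nonce)


a = make_header(b"\x00" * 32, b"\x11" * 32)
b = make_header(header_hash(a), b"\x22" * 32)
c = make_header(header_hash(b), b"\x33" * 32)
print(headers_link([a, b, c]))  # True
```

No transaction data is touched anywhere in the check; the Merkle root inside each header is the only trace of the block body.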

This assumes you already know Block D is genuine. You can only know this if you have already verified all the previous blocks. You can prune them away afterwards, but you still must download and run computation over them once.
I'm not an expert on blockchains, but this makes me wonder whether something close to skip lists <https://en.wikipedia.org/wiki/Skip_list> couldn't in principle be implemented in blockchains to avoid this difficulty.

Imagine that block n, when produced, gave, in addition to the hash of the previous block n-1, the hashes of blocks n-2, n-4, n-8, ..., n-2^k (for the largest k with 2^k <= n). This implies that the number of hashes stored per block grows logarithmically with the chain height, but it should give you the following: whenever you request and obtain a past block A and you have the current block D, you can use these pointers starting at D to easily request and obtain a sequence of blocks (of length at most about log n) which allows you to authenticate A from D. (Of course the algorithm to reach A is simply to request the earliest block authenticated in the header of the block you currently hold that is not before A, and to repeat from there.)
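To make the hop sequence concrete, here is a small sketch that works on block heights only. It assumes a block at height h references heights h-1, h-2, h-4, and so on; `back_pointers` and `path_to_ancestor` are hypothetical names, and a real scheme would check a hash at every hop.

```python
from typing import List


def back_pointers(height: int) -> List[int]:
    """Heights referenced by a block at `height`: height-1, height-2, height-4, ..."""
    pointers = []
    step = 1
    while step <= height:
        pointers.append(height - step)
        step *= 2
    return pointers


def path_to_ancestor(current: int, target: int) -> List[int]:
    """Heights to request, in order, to authenticate `target` from `current`."""
    if not 0 <= target <= current:
        raise ValueError("target must be between 0 and current")
    path = []
    while current > target:
        # Jump to the earliest referenced block that is not before the target.
        current = min(h for h in back_pointers(current) if h >= target)
        path.append(current)
    return path


print(back_pointers(8))          # [7, 6, 4, 0]
print(path_to_ancestor(100, 3))  # [36, 4, 3]
```

Each hop covers the largest power of two that fits in the remaining distance, so the number of hops is the number of 1 bits in the height difference.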

Bitcoin already has a p2p layer like torrents or IPFS that can quickly download the block chain from many peers. On a sufficiently fast internet connection you get limited by CPU speed verifying signatures.
Is IPFS a workable solution, given that storage and bandwidth still have real costs to participants? It seems to me it's prone to abuse (e.g. botnets storing large amounts of crap on it).