Hacker News new | ask | show | jobs
by kfk 3574 days ago
Can somebody explain me how we deal with the increasing size of a blockchain? I get moore's law etc., but other than that? I mean, it's trillions of transactions we are talking about. Federated servers? We break the chain in some way?
7 comments

Any blockchain client for any chain (Bitcoin, Ethereum, Dogecoin, ...) could very easily implement a storage layer for the underlying blockchain data using a protocol like IPFS.

https://ipfs.io/

In this model, individuals would not need to store the entire chain as they could lazily fetch parts of the chain as they are needed. This would allow individuals to store very little of the historical blockchain data if their use case doesn't require them to access that historical data.

There are nuances to this solution. IPFS can be thought of as one giant torrent, and with torrents, someone must have the part you need for you to be able to fetch it. There is a theoretical failure mode in this model where everyone happens to delete the same part of the blockchain thinking that they won't need it and if they do they'll get it from someone else. In this case, this portion of the chain history would be lost. This failure mode should be simple to mitigate, especially since many people participating in the network need access to significant portions of the historical data, and everyone won't use the IPFS based storage layer so, when a chunk is not available over IPFS, the client can just fetch it over normal means.

Don't forget that like BitTorrent, IPFS uses a DHT to route requests. This makes it vulnerable to DHT routing attacks, where an attacker can Sybil the system and censor all of the routes to a block.
How would you ensure the validity of the part of the blockchain you are reading from IPFS?
You may be aware, since you're asking the question, but you really can't without some other information.

If you have block D, and want to fetch block A, you don't actually know if it's the real block A unless you also have blocks B, and C. Block D (which you have) contains the hash of block C, which contains the hash of block B, which contains the hash of block A. You need the hash of block A to be able to verify that it is indeed the real block A.

Of course, because of things like SSL/TLS, you can be sure nobody tampered with the block on it's way from the server to you. With that in mind, ensuring you receive the real block just requires you to trust the server giving you the blocks, which may or may not be worth it depending on how much time/space it saves you. In some ways, the server would become your 'central authority' on the block-chain.

In reality though, I can't really imagine there's much they could do. Sending you fake blocks may work for a while but would fail if the client caches them and asks for the surrounding blocks (Their fake block isn't going to match the hash of the real block). I think it would be a bit hard to pull off for any length of time.

> unless you also have blocks B, and C.

It suffices to have the headers of blocks B and C. Which, at only 80 bytes per header, is quite manageable.

Are you sure? The header for C will tell you the hash of B. How will you verify the header given for B if you cannot hash the entire block and compare it to the hash you got in the header of C?
Each block's header contains a hash that is the root of a Merkle tree [0] of the transactions in the block. The Merkle-root hash effectively summarizes all of the block's transactions, which allows the overall block hash to be a hash just of the constituents of the header.

Thus if you have the header, you do indeed have everything you need to produce a hash and verify that it matches the hash referenced in the succeeding block. You do not need the information describing individual transactions.

0. https://en.m.wikipedia.org/wiki/Merkle_tree

This assumes you already know Block D is genuine. You can only know this if you have already verified all the previous blocks. You can prune them away afterwards, but you still must download and run computation over them once.
Not an expert about blockchains, but this makes me wonder whether something close to skip lists <https://en.wikipedia.org/wiki/Skip_list> couldn't in principle be implemented in blockchains to avoid this difficulty.

Imagine that block n, when produced, in addition to the hash of the previous block n-1, gave the hash of block n-2, n-4, n-8, ..., n-2^n. This implies that the block size grows linearly, but it should give you the following: whenever you request and obtain a past block A and you have the current block D, you can use these pointers starting at D to easily request and obtain a sequence of blocks (of length log n) which allows you to authenticate A from D. (Of course the algorithm to reach from A is simply to request the first block authenticated in the header of D which is after A.)

Bitcoin already has a p2p layer like torrents or IPFS that can quickly download the block chain from many peers. On a sufficiently fast internet connection you get limited by CPU speed verifying signatures.
Is IPFS a workable solution, since storage and bandwidth still have real costs to participants? It seems to me it's prone to abuse (botnets storing large amounts of crap on it)
The best theory I could found is that everybody just starts trusting an old enough snapshot of the chain, and its end becomes the chain's beginning. Then everybody just throws the older transactions away.

I can imagine this working in a low volume chain. I do really doubt it would ever work on current Bitcoin chain or anything bigger.

It seems reasonable even on a massive chain. Some people do need to keep the whole archived part of the chain, and anyone who wants to verify it can do so, while day-to-day transactions can just compress that part of the chain down to a hash.

You'd need to keep enough of the chain live for all "live" transactions; that would likely have a time bound (days, perhaps). People who run a "full" client that verifies the complete blockchain then just need to have enough storage for the complete transaction volume over that time period. (And people who only need to worry about their own transactions and don't care about verifying other people's transactions can hold onto even less data.)

I'm a blockchain noob, but what about people like me that own some bitcoin, but don't touch them for a decade. I imagine my ownership of said coins would be in that part that is truncated/archived/hashed. So how would I then use them?
Someone more familiar with this than me feel free to correct me here, but this is my basic understanding:

If you think of your bitcoins as following a path from address to address, the pruning process would retain the final node or two on that path but discard any previous nodes. So your 10 year old bitcoins at address X are still the most recent node on that chain and are kept around. However, once they are finally sent to another address the bits recording your current address can be forgotten.

In Bitcoin, these final nodes are called UTXOs (unspent transaction outputs), and indeed if you use chain pruning or a light client, they are all you need. They also need to be searched quickly to verify transactions, so they are usually kept in RAM. Because they cannot be pruned, there is a soft fork being developed called SegWit that changes the accounting so that transactions cost more if they create many unspent outputs, or cost less if they spend more outputs than they create.
Yep. If people are familiar with Redis, it would be much like how Redis periodically rewrites its append-only-file to make it smaller. So you go from a file with full history like

    SET a 2
    SET b 4
    SET a 6
    SET b 10
    SET a 20
to just

    SET a 20
    SET b 10
The snapshot would just retain resulting balances at a certain block height, and no history of how they got there (previous transactions). If people wanted that history it could obviously be archived and accessed separately.
Would be interested in your feedback on Verifiable Maps as implemented at https://www.continusec.com/ which can give verifiable answers to specific questions such as what is the current balance as well as a full history log of all mutations to the map.
This is probably a gap in my knowledge, but I hope this isn't a terribly obvious question.

So let's say I buy a bunch of bitcoins and I just sit on them.

A year or two from now, they're worth a bunch of money, so I use them or sell them. How does everyone with the truncated blockchain know that the bitcoins I'm using aren't being double-spent?

The truncated blockchain would contain at least the latest amounts for any wallets. Truncation loses the old transactions, but cannot exclude any wallets that have any BC in them.
The Bitcoin whitepaper [1] (which is very short, concise, and definitely worth the read) describes how the Merkle trees might be pruned (transactions could be pruned when all outputs are spent) and in fact Bitcoin 0.11 added pruning (and 0.12 added further compatibility for pruned nodes).

Vitalik Buterin (Ethereum lead) also posted a description of how Ethereum will implement State Tree Pruning [2].

These are short/medium-term scaling solutions.

Many people are pushing sidechain [3] / child-chains [4] as longer-term ("Visa" level) solutions.

* Lightning Network [5][6] is what Blockstream and others are pushing for Bitcoin scaling.

* Raiden Network [7] is the Ethereum version, and may come out before Lightning, funnily enough.

For those interested in the topic matter, Vitalik wrote a pretty dense paper on blockchain scaling last year [8].

While sidechains will be an option, Ethereum is also pursuing on-chain scaling as well.

Sharding is scheduled for the upcoming Serenity release [9] (which also switches to Proof of Stake (Casper), another important scaling-related change). Writeup on Serenity PoC2 [10].

In general for scaling atm, processing/network latency/bandwidth for initial sync is more of a bottleneck than raw storage. By that measure, I'm actually a lot more interested in the on-chain tx-processing that Casper/PoS might bring [11].

Of course there's no requirement for a single chain to handle everything. Especially over the past few months, we've seen a few relatively smooth migrations to multiple chains as Bitcoin tx processing has "maxed out" (or more recently as fungibility/privacy issues have loomed). At the same time, Bitcoin volatility has stabilized, so it might be just the natural life cycle for blockchains to "scale" until they can't. It's still early days, so who knows?

[1] https://bitcoin.org/en/bitcoin-paper

[2] https://blog.ethereum.org/2015/06/26/state-tree-pruning/

[3] https://gendal.me/2014/10/26/a-simple-explanation-of-bitcoin...

[4] https://www.ardorplatform.org/

[5] https://lightning.network/

[6] https://lightning.network/lightning-network-paper.pdf

[7] http://raiden.network

[8] https://github.com/vbuterin/scalability_paper/blob/master/sc...

[9] https://github.com/ethereum/EIPs/issues/53

[10] https://blog.ethereum.org/2016/03/05/serenity-poc2/) from March

[11] https://twitter.com/VitalikButerin/status/760185856057638913

(phew, maybe one day HN will have markdown support, maybe one day...)

In this article's case, the "blockchain" isn't duplicated by everyone. It's just duplicated among parties in the need to know.
Most of the chain can be dropped/collapsed. All you need to keep are the final hashes. The white paper discusses this.
> I mean, it's trillions of transactions we are talking about.

It is not even close to that yet. Eventually yes. Right now the bitcoin blockchain is about 80GB.

What real numbers are you using to predict this being a problem?

BTW a 120TB SSD meant for read heavy workloads was announced not long ago. Huge, fast random access and made for reads is exactly what you would want.

Beyond that my own prediction is that there will be many successful crypto-currencies. Most people won't store the blockchain of any of them, but whoever wants to be more involved could pick and choose.

This is a problem Ethereum is currently working on. With the upcoming switch to PoS, methods of sharding the blockchain are being experimented with.