by phire 1628 days ago
The fact that all the data is there is kind of irrelevant if you can't query it.

Why would you want to query it, though?

A full node lets you fully verify the chain's historical states and it lets you interact with the current state. Unless you're running a service that exists solely to allow people to query historical states (like a block explorer service), I don't see why it would be useful to be able to query historical state.

You need an archival node to see a list of all transactions that transfer eth into an address.

A full node can only give you the current balance, and a list of all transactions that directly transfer eth to that address. Any transaction that transfers eth as the side effect of a smart contract is invisible.

I personally see it as a flaw in the design of eth. You shouldn't need the complete history of states just to find all relevant transactions, but you do.

Besides, the argument that regular users shouldn't need to query such information doesn't change the fact that the information is unqueryable in a full node, short of spending 28 days transforming it into an archival node.

I'll give you that. If you need to query a list of all contract transactions that have ever transferred ETH to your address, I believe you would need an archive node to do so, although don't quote me on that.

> Besides, the argument that regular users shouldn't need to query such information doesn't change the fact that the information is unqueryable in a full node, short of spending 28 days transforming it into an archival node.

If you don't need to query the data, then the data doesn't have to be unpacked and indexed for querying. Seems simple to me.

It's kind of misleading to claim the archival data is packed. It's not compressed into some archival format. Instead, the full node contains all the inputs needed to regenerate the data.

To transform into an archival node, a full node has to rewind to the very first block, and replay every single transaction.

Since the EVM is Turing complete, this is roughly equivalent to simulating a computer with years of recorded keyboard and mouse inputs, taking care to record how each input affects the state of the computer.

You can't jump to the middle, you have to replay the whole thing.
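To make the replay point concrete, here is a toy sketch in Python. It is not how any real Ethereum client works: the `State` and `Tx` types, `apply_tx`, and `state_at` are all made up for illustration, and a plain balance transfer stands in for EVM execution. The only thing it shows is that getting the state at block N means applying every transaction in blocks 1 through N, in order.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]          # hypothetical: account name -> balance
Tx = Tuple[str, str, int]       # hypothetical: (sender, recipient, amount)

def apply_tx(state: State, tx: Tx) -> State:
    """Apply one transfer and return the resulting state (stand-in for EVM execution)."""
    sender, recipient, amount = tx
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state

def state_at(genesis: State, blocks: List[List[Tx]], height: int) -> State:
    """Rebuild the state after `height` blocks by replaying everything from genesis."""
    state = dict(genesis)
    for block in blocks[:height]:
        for tx in block:
            state = apply_tx(state, tx)
    return state

genesis = {"alice": 10, "bob": 0}
blocks = [
    [("alice", "bob", 3)],
    [("bob", "carol", 1)],
    [("alice", "carol", 2)],
]

# The state at height 2 is only reachable by first computing the state at height 1.
print(state_at(genesis, blocks, 2))
```

The cost of `state_at` grows with the whole history before the target block, which is the property being complained about.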

I don't think it's misleading to call Git history "packed", and the mechanism for regenerating historical states is similar to Ethereum's (though of course Git's delta function is changeset-only with no Turing-completeness). In fact, Git calls its own delta-storage "Git packfiles".

The EVM is a very simple and rudimentary virtual computer, so replaying the whole thing isn't an impossible task. According to the tweet, it took this guy's computer 28 days to replay 4 years of history.

Git also adds snapshots to the mix, which makes it possible to rapidly jump to fixed points in history and only use deltas for the fine-grained seek. Git also has indexes to find stuff.
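A rough sketch of why snapshots matter, again as a toy and not a description of Git's actual storage format: the state is just an integer, each delta is an integer to add, and `seek` and the `snapshots` dict are hypothetical names. The function starts from the nearest snapshot at or below the target and counts how many deltas it still has to apply.

```python
from typing import Dict, List, Tuple

def seek(deltas: List[int], snapshots: Dict[int, int], height: int) -> Tuple[int, int]:
    """Return (state at `height`, number of deltas applied to get there)."""
    # Start from the latest snapshot that is not past the target height.
    base = max(h for h in snapshots if h <= height)
    state = snapshots[base]
    applied = 0
    for delta in deltas[base:height]:
        state += delta
        applied += 1
    return state, applied

deltas = list(range(1, 1001))  # 1000 toy deltas

# Periodic snapshots every 100 steps, versus only the starting state.
periodic = {h: sum(deltas[:h]) for h in range(0, 1001, 100)}
genesis_only = {0: 0}

print(seek(deltas, periodic, 950))      # applies 50 deltas
print(seek(deltas, genesis_only, 950))  # applies 950 deltas
```

Both calls reach the same state; the difference is only in how much history has to be replayed to get there.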

Git justifies the viability of its "packing scheme" by actually making everyday use of it.

A full eth node has no snapshots or useful indexes into the archival data. It has to apply the deltas linearly from the beginning. Applying the deltas is very slow, very IO bound, seeking all over the disk.

The data might be there, but it's practically useless. A user who discovers they need some archival data is never going to consider waiting weeks for the nearly 7 years of history to be replayed before running their query. Instead they will head over to Etherscan and trust whatever it says.