by phire
1629 days ago
Git also adds snapshots to the mix, which makes it possible to rapidly jump to fixed points in history and only use deltas for the fine-grained seek. Git also has indexes to find stuff. Git justifies the viability of its "packing scheme" by actually making everyday use of it. A full eth node has no snapshots or useful indexes into the archival data. It has to apply the deltas linearly from the beginning. Applying the deltas is very slow and very IO bound, seeking all over the disk. The data might be there, but it's practically useless. A user who discovers they need some archival data is never going to consider waiting weeks for the nearly 7 years of history to be replayed before running their query. Instead they will head over to etherscan and trust whatever it says.
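To make the snapshot-versus-linear-replay difference concrete, here is a rough toy sketch. It is not how Git or any Ethereum client actually stores data: state is just a dict, a "delta" is a dict of key updates (deletions are ignored), and every function name here is made up for illustration.

```python
from bisect import bisect_right


def apply_delta(state, delta):
    """Return a new state with the delta's key updates applied."""
    new_state = dict(state)
    new_state.update(delta)
    return new_state


def replay_from_genesis(deltas, height):
    """Rebuild the state at `height` by applying every delta from the start."""
    state = {}
    steps = 0
    for delta in deltas[:height]:
        state = apply_delta(state, delta)
        steps += 1
    return state, steps


def build_snapshots(deltas, interval):
    """Store a full copy of the state every `interval` deltas (hypothetical scheme)."""
    snapshots = {0: {}}
    state = {}
    for i, delta in enumerate(deltas, start=1):
        state = apply_delta(state, delta)
        if i % interval == 0:
            snapshots[i] = state
    return snapshots


def replay_from_snapshot(deltas, snapshots, height):
    """Jump to the nearest snapshot at or below `height`, then apply the remaining deltas."""
    heights = sorted(snapshots)
    base = heights[bisect_right(heights, height) - 1]
    state = snapshots[base]
    steps = 0
    for delta in deltas[base:height]:
        state = apply_delta(state, delta)
        steps += 1
    return state, steps
```

With 1000 deltas and a snapshot every 100, reconstructing the state at height 950 takes 950 delta applications from genesis but only 50 from the nearest snapshot, and both give the same state. Scale that gap up to years of chain history with disk seeks on every step and you get the "weeks of replay" problem.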
|
> The data might be there, but it's practically useless.
The availability of the packed data is useful, just not to the end user of the node. Having this data widely available on the network means that anyone can spin up an archive node by peering with other full nodes. They don't need to discover and peer with the very limited number of other archive nodes, and the network doesn't need to worry about losing that data permanently if all archive nodes go offline.
> A user who discovers they need some archival data is never going to consider waiting weeks for the nearly 7 years of history to be replayed before running their query. Instead they will head over to etherscan and trust whatever it says.
Call me unprincipled, but I don't think it's an issue that a user who needs data above and beyond what's needed to fully verify the chain and read and write to it is expected to either spin up a more resource-intensive node or retrieve the data from a specialized history service. Statelessness is on the roadmap, so in the long term the historical data that Etherscan and similar services serve up to you will come with a validity proof anyways.