Hacker News
by eesmith 2209 days ago
Python 3.7 added support for hash-based .pyc cache files as an alternative to timestamp-based ones.

https://docs.python.org/3/reference/import.html#pyc-invalida...
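A minimal sketch (assuming Python 3.7+) of what a hash-based pyc actually looks like: it compiles a throwaway file with `py_compile` in `CHECKED_HASH` mode and parses the 16-byte header laid out in PEP 552 (magic number, a flags word, then either mtime and size or an 8-byte source hash). `read_pyc_header` and `demo.py` are hypothetical names for illustration only.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_header(pyc_path):
    """Parse the 16-byte pyc header described in PEP 552 (Python 3.7+)."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]
    (flags,) = struct.unpack("<I", header[4:8])  # little-endian flags word
    if flags & 0b01:  # bit 0 set: hash-based pyc
        return {
            "magic": magic,
            "hash_based": True,
            "check_source": bool(flags & 0b10),  # bit 1: checked vs unchecked
            "source_hash": header[8:16],  # 8-byte hash of the source bytes
        }
    # flags == 0: classic timestamp-based pyc (source mtime, then source size)
    mtime, size = struct.unpack("<II", header[8:16])
    return {"magic": magic, "hash_based": False, "mtime": mtime, "size": size}


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "demo.py")  # hypothetical throwaway module
    with open(src, "w") as f:
        f.write("x = 1\n")
    pyc = py_compile.compile(
        src,
        cfile=os.path.join(d, "demo.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    info = read_pyc_header(pyc)
    # The header should carry exactly the hash importlib computes from the source bytes.
    with open(src, "rb") as f:
        expected = importlib.util.source_hash(f.read())

print(info["hash_based"], info["check_source"], info["source_hash"] == expected)
```

The point of the comparison at the end is that nothing in a hash-based header depends on the file's mtime, only on its bytes.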

Verifying a hash is a bit slower than checking the timestamp, but far faster than parsing and byte-compiling the source file, so I don't think this option is "significantly inert".
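To put rough numbers on "a bit slower" versus "far faster", here is a hedged micro-benchmark sketch. It times three stand-ins: an `os.stat` call (roughly the source-side work for a timestamp check), reading the file plus `importlib.util.source_hash` (roughly the extra work a checked-hash pyc adds), and the built-in `compile()` (what a missing or stale pyc costs). The synthetic module and the function names are made up, and it runs against a local temp directory whose file is probably still in the page cache, so it says nothing about NFS or cold spinning disks.

```python
import functools
import importlib.util
import os
import tempfile
import timeit

# Synthetic ~500-function module, purely for illustration; real modules differ.
SOURCE = "".join(f"def f{i}(a, b):\n    return a * {i} + b\n\n" for i in range(500))
NUMBER = 50  # calls per timing run


def timestamp_check(path):
    # Source-side work for a timestamp pyc: one stat for mtime and size.
    st = os.stat(path)
    return int(st.st_mtime), st.st_size


def hash_check(path):
    # Extra source-side work for a checked-hash pyc: read the whole file and hash it.
    with open(path, "rb") as f:
        return importlib.util.source_hash(f.read())


def recompile(path):
    # Cost when there is no valid pyc: read, parse, and byte-compile the source.
    with open(path, "rb") as f:
        return compile(f.read(), path, "exec")


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "synthetic.py")
    with open(path, "w") as f:
        f.write(SOURCE)
    timings = {}
    for fn in (timestamp_check, hash_check, recompile):
        # Best of 5 runs, reported as seconds per call.
        runs = timeit.repeat(functools.partial(fn, path), number=NUMBER, repeat=5)
        timings[fn.__name__] = min(runs) / NUMBER

for name, secs in timings.items():
    print(f"{name:16s} {secs * 1e6:10.1f} us per call")
```

Treat the output as a relative comparison on one machine, not a general result. All modes still have to read the pyc itself; the sketch only isolates the per-source work that differs between them.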


Managed to miss this, thanks. I'd be interested in hunting out the BPO ticket at some stage to see whether they benchmarked on NFS or spinning rust.

Huh. I hadn't bothered to read the PEP, which is PEP 552, "Deterministic pycs": https://www.python.org/dev/peps/pep-0552/

> The current Python pyc format is the marshaled code object of the module prefixed by a magic number [7], the source timestamp, and the source file size. The presence of a source timestamp means that a pyc is not a deterministic function of the input file’s contents—it also depends on volatile metadata, the mtime of the source. Thus, pycs are a barrier to proper reproducibility.
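The PEP's point about volatile metadata is easy to check directly. This sketch (assuming Python 3.7+ and that the `SOURCE_DATE_EPOCH` environment variable is unset, since it can switch `py_compile` to hash-based output) compiles the same file twice, changing only its mtime with `os.utime`, and compares just the 16-byte headers, which is where the mtime would live. `pyc_header` and `mod.py` are hypothetical names.

```python
import os
import py_compile
import tempfile

Mode = py_compile.PycInvalidationMode


def pyc_header(src, cfile, mode):
    """Compile src into cfile using `mode` and return the 16-byte pyc header."""
    py_compile.compile(src, cfile=cfile, doraise=True, invalidation_mode=mode)
    with open(cfile, "rb") as f:
        return f.read(16)


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "mod.py")  # hypothetical module
    cfile = os.path.join(d, "mod.pyc")
    with open(src, "w") as f:
        f.write("ANSWER = 42\n")

    same_header = {}
    for mode in (Mode.TIMESTAMP, Mode.CHECKED_HASH):
        os.utime(src, (1_000_000_000, 1_000_000_000))  # atime/mtime in Sept 2001
        before = pyc_header(src, cfile, mode)
        os.utime(src, (1_600_000_000, 1_600_000_000))  # same bytes, mtime in Sept 2020
        after = pyc_header(src, cfile, mode)
        same_header[mode.name] = before == after

print(same_header)
```

Expect `TIMESTAMP` to map to `False` (the embedded mtime changed) and `CHECKED_HASH` to map to `True` (the header depends only on the file's contents).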

That is, they were made for a quite different use case than the one you and I were talking about.

I looked at the PEP to see if it gave timing numbers. No luck; that would make a good blog post if I were still blogging. It does say:

> The hash-based pyc format can impose the cost of reading and hashing every source file, which is more expensive than simply checking timestamps. Thus, for now, we expect it to be used mainly by distributors and power use cases.
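For the distributor use case the PEP mentions, the standard `compileall` module (3.7+) takes the same `invalidation_mode` argument, so a whole tree can be pre-compiled in one go. This hedged sketch uses a made-up three-file package and `UNCHECKED_HASH`, which per the import docs linked above means the interpreter trusts an existing pyc without reading or hashing the source (unless run with `--check-hash-based-pycs always`). That sidesteps the per-import cost quoted above, at the price that source edits aren't picked up until the pyc is regenerated. It assumes `PYTHONPYCACHEPREFIX` is unset so the files land in `__pycache__`.

```python
import compileall
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as d:
    pkg = os.path.join(d, "pkg")  # hypothetical three-module package
    os.makedirs(pkg)
    for name in ("__init__.py", "a.py", "b.py"):
        with open(os.path.join(pkg, name), "w") as f:
            f.write(f"NAME = {name!r}\n")

    # Pre-compile the whole tree; UNCHECKED_HASH pycs are trusted at import time.
    ok = compileall.compile_dir(
        pkg,
        quiet=1,
        invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
    )

    cache = os.path.join(pkg, "__pycache__")
    written = sorted(os.listdir(cache))
    flags = []
    for name in written:
        with open(os.path.join(cache, name), "rb") as f:
            header = f.read(16)
        # Flags word: bit 0 = hash-based, bit 1 = check source; unchecked gives 0b01.
        flags.append(int.from_bytes(header[4:8], "little"))

print(ok, written, flags)
```

The command-line equivalent is `python -m compileall --invalidation-mode unchecked-hash <dir>`.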