Hacker News new | ask | show | jobs
by ylow 905 days ago
Author here.

The goal in a way is better reproducibility. The memo hashes the contents of the inputs, so if your cell is deterministic (and you cover all the inputs), the memo should give you the right answers. Of course if you want to rerun everything you can always just delete the memos, but properly used it should be a strict performance improvement.

Of course there are many improvements that can be made (tracking if dependent functions have changed etc) and of course there are inherent limitations. This is very much “work in progress”. But is quite useful right now!

2 comments

That only works for things entirely encapsulated by the value of the cell, which isn't a useful amount of stuff. This will only help with primative toy examples where you are slinging numpy arrays or other easily hashed data with no encapsulation or indetermancy or order independence.

Sorry, if it were possible to just generically cache computations it would be done everywhere for everything. This is just going to help with toy examples.

The only way this leads to "better reproducability" is if it fails to recompute things that it should have recomputed. If the computation was actually deterministic from its inputs the best and literally the only possible thing this can do is exactly what recomputing it would do, but faster. Frankly that you said it is more reproducible is enough evidence for me that you are doing something fundementally broken.

> The goal in a way is better reproducibility. The memo hashes the contents of the inputs, so if your cell is deterministic (and you cover all the inputs), the memo should give you the right answers.

That's an enormously large if, and without any way to detect or manage when that's not the case, it's hard to see how this could aid reproducibility. This adds a new failure mode for reproducibility where an important side-effect is not rerun, altering the results.

Think of this like incremental builds via make etc. You don't necessarily trust the output 100% but it speeds up work so much it's worth it, you can always do a clean build at key points where you want more confidence.
Sure, but that's not really the point of my comment. The author suggested this helps reproducibility in some way, and I'm trying to understand how this is supposed to help, when on its face, it can only hurt.
But compilers are deterministic, no?