
In order,

1. Yes, you can create a persistent storage by passing `db_path` to `Storage()`; the current implementation is just an SQLite file. To run on many machines, you don't really need to re-import from a dataframe: dumping to a dataframe is meant to be an exit point from `mandala`, so that you can do downstream analyses in a format more familiar than `ComputationFrame`. `ComputationFrame`s can be merged via the union (`|`) operator; see https://amakelov.github.io/mandala/blog/01_cf/#tidy-tools-me... for an example. Storages don't support merging yet, but it's certainly possible!

2. Already answered in 1.

3. Nope, but I'd be happy to (though I feel like `mandala` took memoization in a substantially different direction). Are you in a position to make an introduction?

4. This project is currently not optimized for performance, though I've used it in projects spanning millions of memoized calls. The typical use case is to decorate functions that take a long time to compute, so the memoization overhead is amortized. A very quick benchmark on my laptop shows ~6ms per call for in-memory storage and ~9ms for persistent storage, using a simple arithmetic function that otherwise takes ~0 time.
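To get a feel for what "overhead per call" means in point 4, here is a rough sketch of that kind of micro-benchmark. It does not use `mandala` at all: `memoize_sqlite` and `per_call_seconds` are hypothetical helpers written for illustration, the cache is a bare SQLite table keyed by pickled arguments, and the numbers it prints will not match the ~6ms/~9ms figures above.

```python
import functools
import pickle
import sqlite3
import time


def memoize_sqlite(conn):
    """Hypothetical decorator: cache results in an SQLite table keyed by pickled args."""
    conn.execute("CREATE TABLE IF NOT EXISTS cache (key BLOB PRIMARY KEY, value BLOB)")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            key = pickle.dumps((func.__name__, args))
            row = conn.execute(
                "SELECT value FROM cache WHERE key = ?", (key,)
            ).fetchone()
            if row is not None:
                # Cache hit: skip the function call entirely.
                return pickle.loads(row[0])
            result = func(*args)
            conn.execute(
                "INSERT INTO cache (key, value) VALUES (?, ?)",
                (key, pickle.dumps(result)),
            )
            conn.commit()
            return result

        return wrapper

    return decorator


def per_call_seconds(func, inputs):
    """Average wall-clock time per call of func over the given inputs."""
    start = time.perf_counter()
    for x in inputs:
        func(x)
    return (time.perf_counter() - start) / len(inputs)


def add_one(x):
    # A function that takes ~0 time, so any measured cost is cache overhead.
    return x + 1


conn = sqlite3.connect(":memory:")  # pass a file path here for an on-disk cache
cached_add_one = memoize_sqlite(conn)(add_one)

inputs = list(range(1000))
baseline = per_call_seconds(add_one, inputs)
first_pass = per_call_seconds(cached_add_one, inputs)   # all misses: compute + store
second_pass = per_call_seconds(cached_add_one, inputs)  # all hits: lookup only

print(f"plain call:  {baseline * 1e6:.2f} us")
print(f"cache miss:  {first_pass * 1e6:.2f} us")
print(f"cache hit:   {second_pass * 1e6:.2f} us")
```

The point of comparing against a trivial function is that everything above the baseline is bookkeeping; for a function that runs for seconds or minutes, the same overhead becomes negligible.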

5. Great question - currently, the dependency tracer is restricted to user-chosen functions, to avoid tracking the function calls that an imported library makes. You could use a bit of magic (import-time automatic decoration) to track all functions in a file or a directory (not implemented right now). The reasoning is that, for a typical multi-month ML project, you usually have a single conda environment, so you want to ignore library changes. Similarly, system-level settings (e.g. environment variables) are also not tracked. I think a very useful feature would be to at least record the versions of each imported library, so that storages can be ported between environments with some guarantees (or warnings).
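The "record the versions of each imported library" idea from point 5 could look roughly like the sketch below. This is not something `mandala` does today; `snapshot_library_versions` and `diff_versions` are hypothetical names, and the lookup relies only on the standard library's `importlib.metadata`. One known limitation is that it assumes the import name equals the distribution name, which is not always true (e.g. `sklearn` vs `scikit-learn`).

```python
from importlib import metadata


def snapshot_library_versions(module_names):
    """Map each name to its installed distribution version, skipping unknown names."""
    versions = {}
    for name in sorted(module_names):
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not an installed distribution under this name (stdlib, local module, ...).
            continue
    return versions


def diff_versions(recorded, current):
    """Return {name: (recorded_version, current_version)} for every mismatch.

    A version of None means the library is absent on that side.
    """
    names = set(recorded) | set(current)
    return {
        name: (recorded.get(name), current.get(name))
        for name in sorted(names)
        if recorded.get(name) != current.get(name)
    }


# Record versions when the storage is created (library names are just examples) ...
recorded = snapshot_library_versions(["numpy", "pandas"])

# ... and compare when the storage is opened in another environment.
current = snapshot_library_versions(["numpy", "pandas"])
for name, (old, new) in diff_versions(recorded, current).items():
    print(f"warning: {name} changed from {old} to {new}")
```

Storing the `recorded` dictionary next to the cache would be enough to emit warnings on mismatch; turning that into actual guarantees would need a policy for which version changes invalidate which calls.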

6. - If an `@op` call was memoized, the underlying Python function call succeeded, so in this sense it can't be "broken"; it is, however, possible that there was a bug. In this case, you can delete the affected calls and all values that depend on them (if you keep these values, you're left with "zombie" values that don't have a proper computational history). The `ComputationFrame` supports declarative deletion: you build a `ComputationFrame` that captures the calls you want to delete, and call `.delete_calls()` (though there's still no example of this in the tutorial :)). Alternatively, you can change the affected function and mark this as a new version. Then you should be able to delete all calls using the previous version (though this is not supported at the moment).

- How the cache is invalidated is detailed here: https://github.com/amakelov/mandala?tab=readme-ov-file#how-i...
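To make "delete the affected calls and all values that depend on them" from point 6 concrete, here is a small graph sketch. It is not how `ComputationFrame.delete_calls()` is implemented; the `calls` dictionary layout and the `collect_dependents` helper are assumptions made up for this example. Each call lists the value ids it consumed and produced, and the traversal follows outputs forward so that no "zombie" values are left behind.

```python
from collections import deque


def collect_dependents(calls, bad_call_ids):
    """Return (calls_to_delete, values_to_delete) reachable from the bad calls.

    calls maps call id -> {"inputs": set of value ids, "outputs": set of value ids}.
    """
    # Index every value by the calls that consume it.
    consumers = {}
    for call_id, call in calls.items():
        for value_id in call["inputs"]:
            consumers.setdefault(value_id, set()).add(call_id)

    doomed_calls = set(bad_call_ids)
    doomed_values = set()
    queue = deque(bad_call_ids)
    while queue:
        call_id = queue.popleft()
        for value_id in calls[call_id]["outputs"]:
            if value_id in doomed_values:
                continue
            doomed_values.add(value_id)
            # Any call that used a doomed value has a tainted history too.
            for consumer in consumers.get(value_id, ()):
                if consumer not in doomed_calls:
                    doomed_calls.add(consumer)
                    queue.append(consumer)
    return doomed_calls, doomed_values


# Toy computation history: load -> train -> eval, plus an unrelated call.
calls = {
    "load": {"inputs": {"path"}, "outputs": {"data"}},
    "train": {"inputs": {"data", "lr"}, "outputs": {"model"}},
    "eval": {"inputs": {"model", "data"}, "outputs": {"score"}},
    "other": {"inputs": {"lr"}, "outputs": {"note"}},
}

doomed_calls, doomed_values = collect_dependents(calls, ["train"])
print(sorted(doomed_calls))   # ['eval', 'train']
print(sorted(doomed_values))  # ['model', 'score']
```

Here a bug in `train` takes `eval` and `score` down with it, while `load`, `other`, and the input values they touch survive, since nothing upstream of the bug is affected.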