by cantdutchthis 454 days ago
(someone from the marimo team here)

The `export` command can generate a rendered artifact, if that's what you're after. There is also another avenue here: have you seen the caching feature, the one that caches to disk and persists?

https://docs.marimo.io/guides/expensive_notebooks/?h=cache#d...

This can automatically store the output of expensive functions, keeping the previous state of the cells in mind. If a recompute is ever needed, the result is loaded straight from the cache.
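To make the idea concrete, here is a minimal stdlib-only sketch of a disk-backed cache. It is not marimo's implementation: `disk_cache`, `slow_square`, and the one-pickle-file-per-call layout are hypothetical names and choices made for illustration, and the key uses only the function name and arguments, whereas the real feature also takes the state of the cells into account.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cache(cache_dir):
    """Hypothetical decorator: store results on disk, keyed on name + arguments."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Build a stable key from the function name and its pickled arguments.
            payload = pickle.dumps((fn.__qualname__, args, sorted(kwargs.items())))
            path = cache_dir / (hashlib.sha256(payload).hexdigest() + ".pkl")
            if path.exists():
                # Cache hit: load the stored result instead of recomputing.
                return pickle.loads(path.read_bytes())
            result = fn(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))
            return result

        return wrapper

    return decorator


calls = []  # records each real computation, to show when the cache is hit


@disk_cache(tempfile.mkdtemp())
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(12)   # computed and written to disk
second = slow_square(12)  # loaded from disk, no recomputation
```

Because the results live on disk rather than in memory, a later session pointed at the same directory would also hit the cache, which is the property that matters for expensive notebooks.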

Another option is to run in lazy mode, documented here:

https://docs.marimo.io/guides/expensive_notebooks/?h=cache#l...

This will prevent the notebook from rerunning cells by accident.

We're thinking about adding features that would make marimo great for long-running batch work, but there's not a whole lot I can share about it yet. If you have specific thoughts or concerns though, feel free to join our Discord!

https://marimo.io/discord?ref=nav

2 comments

The caching is a very nice feature, and will stop me from keeping my computer running for days/weeks while I work on a notebook.

If I understand it correctly, `@mo.persistent_cache(name="my_cache")` creates a binary file `my_cache` that I should commit as well if I don't want others to repeat the computation?

This kinda solves the problem, except that there are now two files per notebook and marimo notebooks are no longer viewable with output directly on GitHub.

The default "store" is a local FileStore. In your case, it will save the outputs to a file on disk called `my_cache`.

We plan to add more stores, such as Redis, an S3 bucket, or an external server, since you may not always want to commit this file but, as you said, still want others to avoid repeating the computation.

> This will prevent the notebook from rerunning cells by accident.

Is that really what's wanted? Say there's a cell that I need to run twice for some reason I tried to debug but couldn't figure out. Or, for debugging, I run cells 1-5 in order, but on specific other (prod) systems I skip cell 4 and run 3 before 2. Now, arguably well-written software would handle those things automatically, but we're not talking about battle-hardened software that an SRE team has vigorously refactored until it's been proven suitable by being on call for it for months if not years. We're talking about notebooks, which have their time and place, but the entire point, I would argue, is to make it easier to run notebooks in production without the added overhead of said SRE team. And in that world, the reality is the PhD is gonna have some things they know they should fix, but it's easier to just comment out the unneeded cells on prod and hit run all. So shouldn't the tools better support that use case (by saving what was actually run and offering to rerun that) over caching based on input string and hoping for the best?
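A rough sketch of what "saving what was actually run and offering to rerun that" could look like, in plain Python. Everything here is an assumption for illustration: `RunLog`, its `run` and `replay` methods, and the JSON log file are hypothetical and not an existing marimo feature.

```python
import json
import tempfile
from pathlib import Path


class RunLog:
    """Hypothetical recorder of which cells ran, and in what order."""

    def __init__(self, path):
        self.path = Path(path)
        self.executed = []

    def run(self, name, cell):
        # Execute the cell, then persist the updated execution order.
        cell()
        self.executed.append(name)
        self.path.write_text(json.dumps(self.executed))

    def replay(self, cells):
        # Rerun exactly the cells that were recorded, in the recorded order.
        for name in json.loads(self.path.read_text()):
            cells[name]()


trace = []
# Five stand-in "cells"; each one just records its own number when run.
cells = {f"cell_{i}": (lambda i=i: trace.append(i)) for i in range(1, 6)}

log = RunLog(Path(tempfile.mkdtemp()) / "run_log.json")
# The ad hoc prod run: skip cell 4 and run 3 before 2.
for name in ["cell_1", "cell_3", "cell_2", "cell_5"]:
    log.run(name, cells[name])

trace.clear()
log.replay(cells)  # reproduces the same order from the saved log
```

The log captures the order that was actually executed rather than the order the notebook implies, so a later run can repeat it without anyone remembering which cells were commented out.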