by faizshah 814 days ago
I feel like Temporal's approach of replaying the computation is probably better than trying to serialize the running computation. Serializing the running coroutine gets ugly, as shown here with things like file handles, and making pickle a central part of your compute platform is a little scary from an AppSec point of view.

That being said, I like the idea and the blog post is wonderfully written.
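
To make the replay idea concrete, here is a minimal sketch of what "replaying the computation" can look like. It is not Temporal's actual API: `Replayer`, `step`, `fetch`, `store`, and `workflow` are hypothetical names made up for illustration. The assumption is that each side-effecting step returns plain data, so the only thing persisted is a list of results rather than a pickled coroutine.

```python
class Replayer:
    """Hypothetical helper: records step results and replays them on re-execution."""

    def __init__(self, history=None):
        # History is a plain list of step results (could be stored as JSON).
        self.history = list(history or [])
        self._cursor = 0

    def step(self, fn, *args):
        if self._cursor < len(self.history):
            # Already executed in a previous run: return the recorded result.
            result = self.history[self._cursor]
        else:
            # First time reaching this step: run it and record the result.
            result = fn(*args)
            self.history.append(result)
        self._cursor += 1
        return result


calls = []  # tracks which side effects actually ran


def fetch(x):
    calls.append(("fetch", x))
    return x * 2


def store(y):
    calls.append(("store", y))
    return y + 1


def workflow(r, fail_after_first=False):
    a = r.step(fetch, 21)
    if fail_after_first:
        raise RuntimeError("simulated crash")
    return r.step(store, a)


def demo_replay():
    calls.clear()
    first = Replayer()
    try:
        workflow(first, fail_after_first=True)
    except RuntimeError:
        pass
    # Resume by re-running the workflow from the top with the saved history.
    resumed = Replayer(first.history)
    result = workflow(resumed)
    return result, list(calls)
```

In this sketch the resumed run re-executes `workflow` from the start, but `fetch` is not called a second time because its result comes from the history. The trade-off is that the workflow code has to be deterministic between steps.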

2 comments

Dispatch takes a different approach with different trade-offs, but you're right that capturing the local program scope brings interesting challenges.

Serializing file handles doesn't work, but in our experience, programs rarely run into constructs where this becomes a problem, and when it happens, there are mitigation measures that are usually easy to implement (a small restructuring of the program, capturing resource metadata to reconstruct the resources later, etc.).
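
As a rough illustration of the "capture resource metadata" mitigation, here is a minimal sketch using Python's standard pickle hooks. It is not how Dispatch handles this internally; `ReopenableFile` is a hypothetical wrapper, and it assumes the file still exists at the same path when the state is restored.

```python
import os
import pickle
import tempfile


class ReopenableFile:
    """Hypothetical wrapper: pickles a file's path and offset, not the OS handle."""

    def __init__(self, path, mode="rb"):
        self.path = path
        self.mode = mode
        self._fh = open(path, mode)

    def readline(self):
        return self._fh.readline()

    def close(self):
        self._fh.close()

    def __getstate__(self):
        # Keep only the metadata needed to rebuild the handle later.
        return {"path": self.path, "mode": self.mode, "offset": self._fh.tell()}

    def __setstate__(self, state):
        # Reopen the file and seek back to where the original handle was.
        self.path = state["path"]
        self.mode = state["mode"]
        self._fh = open(self.path, self.mode)
        self._fh.seek(state["offset"])


def demo():
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"first\nsecond\n")
    f = ReopenableFile(tmp.name)
    first = f.readline()
    blob = pickle.dumps(f)  # a raw file object here would raise TypeError
    f.close()
    restored = pickle.loads(blob)
    second = restored.readline()
    restored.close()
    os.unlink(tmp.name)
    return first, second
```

The restored object continues reading from the saved offset, so the second `readline` picks up where the first handle left off.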

We have a few features on the roadmap to help mitigate the security implications as well, including allowing users to store their program state in an S3 bucket that they own. Our scheduler can operate with only metadata about the program, so splitting the two can be an effective model to mitigate those risks.

Storing the data in the customer's S3 bucket is a really good idea IMO. It means they just have to trust you to run the coordination reliably, but not necessarily to keep their data secure, which is a much easier bar to clear.
What about combining it with rpyc for network-transparent object proxies, for resources that can't be pickled?
With the right abstractions, running computation is just data.
The abstraction being the universe.