Hacker News new | ask | show | jobs
by yomritoyj 2425 days ago
I really have always wished for reproducibility. Thanks for taking up this feature. How do you handle aliasing and references inside objects? Suppose I have

    #Cell 1
    a = [1,2,3]
    b = (a,True)

    #Cell 2
    b[0][0] = 5

    #Cell 3
    print(sum(a))
Now if I change Cell 2 to

   # Cell 2'
   b[0][0] = 4
and execute, Cell 3's result becomes stale. Do you track such dependencies? Would really love to read more about the underlying implementation.
1 comments

If you mutate an object itself, we can't really track that. There's no magic going on; you can break the state if you use mutable objects. It's less of an issue in Scala where immutable data structures are the norm, but I can imagine it would be disappointing in Python.

Currently it takes a shallow copy of the state output by each cell, meaning every value is going to be a primitive value or a reference. If it's a reference to mutable state, you're kind of on your own with respect to keeping reproducibility. I felt like this was a good compromise between strictly enforced reproducibility and practicality; if it turns out to be confusing we could consider deep copying the state, or having an option to do that (I could imagine it being pretty bad for efficiency in a lot of ML use cases, though).

I am not familiar with those notenooks. What would be wrong with re-executing all the cells below the one that changed?
That is usually a feature. The reason it's not the default everytime you change a cell if that cells can contain long running calculations.