by josephsavona 3351 days ago

Hey! I'm Joe and work on the Relay core team.

These are good questions: why does Relay Modern have garbage collection? Is that just a fancy name for cache eviction?

Let's put aside naming for a moment. Relay stores GraphQL data in normalized form, as a map of global identifiers to records. Each record has an identifier, type, and map of fields to values. Relationships between objects are expressed as fields that "link" to other records. These links are expressed as data structures - an object such as `{__ref: <id>}` - as opposed to direct references to the objects.
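As a rough sketch of that shape (the type names, the `follow` helper, and the sample IDs are made up for illustration and are not Relay's actual internals), a normalized store might look something like this:

```typescript
// Hypothetical types for illustration only; not Relay's real API.
// A link is a small data structure that names another record by id.
type Ref = { __ref: string };
type Value = string | number | boolean | null | Ref | Ref[];

// Each record has an identifier, a type, and a map of fields to values.
interface StoreRecord {
  id: string;
  type: string;
  fields: { [name: string]: Value };
}

// The store maps global identifiers to records.
type RecordMap = Map<string, StoreRecord>;

function isRef(value: Value): value is Ref {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Resolve a singular link by looking the target id up in the store.
function follow(records: RecordMap, id: string, field: string): StoreRecord | undefined {
  const record = records.get(id);
  if (record === undefined) return undefined;
  const value = record.fields[field];
  return value !== undefined && isRef(value) ? records.get(value.__ref) : undefined;
}

// Two users that link to each other: a cycle expressed purely through ids.
const store: RecordMap = new Map<string, StoreRecord>([
  ["user:1", { id: "user:1", type: "User", fields: { name: "Alice", bestFriend: { __ref: "user:2" } } }],
  ["user:2", { id: "user:2", type: "User", fields: { name: "Bob", bestFriend: { __ref: "user:1" } } }],
]);

const friend = follow(store, "user:1", "bestFriend");
```

Because the links are plain ids rather than object references, the two records never point at each other directly. Every relationship goes through a lookup in the store.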

Using object references would mean that Relay could, in theory, let the JS runtime do garbage collection. In practice, the runtime would only see a cyclic graph of objects in which (typically) at least one root object has a persistent reference (the record corresponding to the root of the graph). In other words, the runtime would do its job and retain all records in memory, since they would all be directly or indirectly referenced from the root object, which Relay has to reference in order to access the data.

Relay, however, knows more than the JS runtime does about how this data can be accessed: it can analyze the currently active queries against the object graph to determine which records are required to fulfill those queries. This is what the garbage collection feature does: it removes records that are not referenced by any active query.
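To make the idea concrete, here is a minimal mark-and-sweep sketch. It assumes an active query can be modeled as a root record id plus a tree of selected fields; `Selection`, `ActiveQuery`, `markRecord`, and `collectGarbage` are hypothetical names for illustration, and this is not how Relay is actually implemented:

```typescript
// Hypothetical types for illustration only; not Relay's real API.
type Ref = { __ref: string };
type Value = string | number | boolean | null | Ref | Ref[];
interface StoreRecord { id: string; type: string; fields: { [name: string]: Value } }
type RecordMap = Map<string, StoreRecord>;

// Assumption: a query is a root id plus the (nested) fields it reads.
type Selection = { [field: string]: Selection };
interface ActiveQuery { rootId: string; selection: Selection }

// Ids that a field value links to (zero, one, or many).
function linkedIds(value: Value): string[] {
  if (Array.isArray(value)) return value.map((ref) => ref.__ref);
  if (typeof value === "object" && value !== null) return [value.__ref];
  return [];
}

// Mark: retain a record, then follow only the links the query selects.
// The selection tree is finite, so cycles in the data cannot loop forever.
function markRecord(records: RecordMap, id: string, selection: Selection, retained: Set<string>): void {
  const record = records.get(id);
  if (record === undefined) return;
  retained.add(id);
  for (const field of Object.keys(selection)) {
    const value = record.fields[field];
    if (value === undefined) continue;
    for (const linkedId of linkedIds(value)) {
      markRecord(records, linkedId, selection[field], retained);
    }
  }
}

// Sweep: delete every record that no active query marked.
function collectGarbage(records: RecordMap, queries: ActiveQuery[]): string[] {
  const retained = new Set<string>();
  for (const query of queries) {
    markRecord(records, query.rootId, query.selection, retained);
  }
  const removed: string[] = [];
  for (const id of Array.from(records.keys())) {
    if (!retained.has(id)) {
      records.delete(id);
      removed.push(id);
    }
  }
  return removed;
}

const store: RecordMap = new Map<string, StoreRecord>([
  ["client:root", { id: "client:root", type: "Query", fields: { viewer: { __ref: "user:1" } } }],
  ["user:1", { id: "user:1", type: "User", fields: { name: "Alice", bestFriend: { __ref: "user:2" } } }],
  ["user:2", { id: "user:2", type: "User", fields: { name: "Bob", bestFriend: { __ref: "user:1" } } }],
  ["user:3", { id: "user:3", type: "User", fields: { name: "Carol" } }],
]);

// The only active query reads viewer { name }, so bestFriend is never followed.
const removed = collectGarbage(store, [
  { rootId: "client:root", selection: { viewer: { name: {} } } },
]);
```

In this sketch `user:2` is removed even though `user:1` still links to it, because no active query selects `bestFriend`. A reachability-based collector in the JS runtime would have kept it alive through that link.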

Note that this has some aspects in common with standard garbage collection in programming language runtimes. There is a mapping of identifiers (memory addresses) to values. Each value may contain links (pointers) to other records (blocks of memory). Because the graph has cyclic references, standard cache eviction strategies - LIFO, LRU, etc. - don't necessarily apply, as they might evict data that is still referenced.

I hope this helps shed some light on this feature. Questions and suggestions (PRs) welcome!