Hacker News new | ask | show | jobs
by Animats 750 days ago
Aw, it's an ad.

As with yesterday's "big data" article, the usual question is whether you have enough transaction volume to need all this stuff. If you can organize your web site so that most traffic is read-only files, and the heavy machinery only turns on when somebody does a "buy" or something important, the whole problem probably fits into a CRUD app.

There's a useful insight in there - fan-in and fan-out are hard. I hit this in a completely different context - asset handing for a game-type client. Events come in ordering the creation of object X using assets A, B, and C. It may take several asset fetches from the network to draw something. An asset may be used for multiple drawable objects. There's caching and concurrent asset fetching. A request to create X has to start fetches for A, B, and C (they might be in cache, though) and wait until those fetches complete. A request for Y might want B, C, and D. Items should be fetched from the network only once, of course. This is a messy coordination problem.

Problems like this, with fan-in, fan-out, and concurrency, map badly to the standard paradigms. Neither threads with blocking nor "async" help much. The problem can be visualized as set operations, but can't easily be implemented that way. A single thread event driven coordination loop works, but it's kind of clunky.

So I started reading the article hoping for a theoretical breakthrough on fan-in, fan-out problems. No such luck.

The set-theory approach is hard to do, but promising. Each object that wants something has a small set of the things it wants. There's a big pool of such sets. There's also a big pool of the items you have, which changes constantly. It's easy to express what you need to fetch and which objects are now ready to go as set intersection and difference operations. But you need representations for big sparse sets which can do set operations fast. Probably B-trees, or something in that space.

Microsoft Research fooled around with this concept years ago in a different context. The idea was to have a database which supported pending SQL queries. The query would return new results when the database changed such that the results of the query changed. The goal was to to support that for millions of pending queries. Financial traders would love to have that. It's a very hard scaling problem. Don't know how that came out.

1 comments

> The set-theory approach is hard to do, but promising. Each object that wants something has a small set of the things it wants. There's a big pool of such sets. There's also a big pool of the items you have, which changes constantly. It's easy to express what you need to fetch and which objects are now ready to go as set intersection and difference operations. But you need representations for big sparse sets which can do set operations fast. Probably B-trees, or something in that space.

Incremental updates to dynamic dependency graphs is a familiar problem for build tooling. I personally have used the taskflow C++ library (https://github.com/taskflow/taskflow) to great effect.

> Microsoft Research fooled around with this concept years ago in a different context. The idea was to have a database which supported pending SQL queries. The query would return new results when the database changed such that the results of the query changed. The goal was to to support that for millions of pending queries. Financial traders would love to have that. It's a very hard scaling problem. Don't know how that came out.

Incremental view maintenance is an active area of research. The likes of Noria and Materialize have done this with SQL, and the pg_ivm Postgres extension looks promising. Not sure if there is an equivalent implementation geared towards entity-component systems, though.