Hacker News new | ask | show | jobs
by Thiez 3471 days ago
> What Polaris does is automatically track all of the interactions between objects, which can number in the thousands for a single page. For example, it notes when one object reads the data in another object, or updates a value in another object. It then uses its detailed log of these interactions to create a “dependency graph” for the page.

> Mickens offers the analogy of a travelling businessperson. When you visit one city, you sometimes discover more cities you have to visit before going home. If someone gave you the entire list of cities ahead of time, you could plan the fastest possible route. Without the list, though, you have to discover new cities as you go, which results in unnecessary zig-zagging between far-away cities.

What a terrible analogy. Finding a topological sorting is O(|V|+|E|), while the traveling salesman problem is NP-complete.

2 comments

He's not making a comparison to the traveling salesman problem; he's saying the businessperson only intended to visit one city, but the trip ended up requiring visits to several additional cities.

It's not a terrible analogy. You request an HTML page and you don't know until after you load it (visit the initial city) exactly what other resources--images, css, js, etc.--you'll need to download (additional cities to visit).

That's amusing, and I wonder if this particular analogy was chosen deliberately. But I don't think there's anything wrong with it - it's designed to make intuitive sense to non-programming readers, not to be some rigorous description that can be automatically translated into optimal code.
Perhaps a different analogy would be better, e.g. ordering materials when building a house. If you can only order one material at a time, you would probably want to order the concrete for laying the foundation before ordering the roof tiles. Loading website resources is a lot like that (at least compared to the travelling salesman).
This still smells NP Hard. I mean, in practice for simple dependencies it is probably quite tractable, but this is a combinatorial optimization problem that seems pretty similar to an online modification to Job Shop Scheduling, where the material requirements map loosely to machine-task pairings that would be unblocked by orders, acting to make the problem more complex, not easier.
Your sense of smell is off. Assuming a directed acyclic graph (the usual shape of a dependency tree), assume we write the result order to a list L. Walk over all vertices once, create a mapping M of each vertex to its number of incoming edges, and add all vertices without incoming edges to a list R. This is O(|V| + |E|). Now, pick the first item of R and append it to L. For each outgoing edge in the item we chose, decrement the associated count of its receiving vertex in M by 1. For each item where the count becomes 0, append it to R. Keep running until R is empty, and you're done. This second part of the process is again O(|V| + |E|).

We're done and found a topological order in O(|V| + |E|).

Yes, I know that topological sort is O(|V| + |E|). What I am claiming is that the problem of buying building materials in an optimal order isn't topological sort: there are numerous topological orders of a graph, but some will be much much slower than others to build at your construction site. To determine which of the many possible orders is the fastest one, you have to take into account how long various tasks will take and how many workers you have available for those tasks. When you get some materials, that unlocks certain parts of the gantt chart of what sounds like Job Shop Scheduling. To me, this is the same form of complaint as pointing out that Traveling Salesman isn't Topological Sort.
"If someone gave you the entire list of cities ahead of time, you could plan the fastest possible route."

Dynamic programming. So it saves time in the end.