|
As someone very familiar with the Knuth–Plass line-breaking algorithm (https://tex.stackexchange.com/a/423578/48), an important difference I see here is that for paragraphs (the domain of TeX), there is no "state" that needs to be preserved across lines: if you know that your paragraph is going to choose a certain break-point, then you can pretty much typeset the "before" and "after" parts independently, each optimally. (With one exception: there is a penalty for hyphens being on successive lines, so we need to track whether the previous line was hyphenated.) This is the "optimal substructures" property that makes it so amenable to dynamic programming. With the code formatter, to format the part after a certain character, you need to keep track of the indentation depth of all the expressions that have not yet terminated at this point — because you presumably want parallel expressions to be formatted with the same indentation depth, for closing parentheses to match their corresponding opening parentheses, etc. For example, in this example: experimental = document.querySelectorAll('link').any((link) =>
link.attributes['rel'] == 'import' &&
link.attributes['href'] == POLYMER_EXPERIMENTAL_HTML);
and, say (I'm making this up): experimental =
document.querySelectorAll('link').any(
(link) =>
link.attributes['rel'] == 'import' &&
link.attributes['href'] == POLYMER_EXPERIMENTAL_HTML);
— knowing that there's a break after the `&&` is not enough; you also need to know the indentation of the previous expressions, to decide how you're going to format the part after the `&&`.This is what the author alludes to in the post: > A line break changes the indentation of the remainder of the statement, which in turn affects which other line breaks are needed. Sorry, Knuth. No dynamic programming this time. […] For most of the time, the formatter did use dynamic programming and memoization. […] It worked fairly well, but was a nightmare to debug. > It was highly recursive, and ensuring that the keys to the memoization table were precise enough to not cause bugs but not so precise that the cache lookups always fail was a very delicate balancing act. Over time, the amount of data needed to uniquely identify the state of a subproblem grew, including things like the entire expression nesting stack at a point in the line, and the memoization table performed worse and worse. In TeX, paragraphs have each line of the same width (simple case) or can have a \parshape (in general), but these are "global" constraints that don't depend on what breaks you choose. |
If I recall correctly Knuth uses Dijkstra on a graph with nodes of 'line break at position x' and I don't see why you couldn't use a graph consisting of 'line break at position x indentation y' or something similar.