Hacker News new | ask | show | jobs
by svat 400 days ago
The live back-and-forth is the main point of what I'm asking for — I tried your cpdf (thanks for the mention; will add it to my list) and it too doesn't help; all it does is, somewhere 9000-odd lines into the JSON file, turn the part of the content stream corresponding to what I mentioned in the earlier comment into:

        [
          [ { "F": 0.0 }, "g" ],
          [ { "F": 0.0 }, "G" ],
          [ { "F": 0.0 }, "g" ],
          [ { "F": 0.0 }, "G" ],
          [ "BT" ],
          [ "/F19", { "F": 10.9091 }, "Tf" ],
          [ { "F": 88.93600000000001 }, { "F": 709.0410000000001 }, "Td" ],
          [
            [
              "Subsequen",
              { "F": 28.0 },
              "t",
              { "F": -374.0 },
              "to",
              { "F": -373.0 },
              "the",
              { "F": -373.0 },
              "p",
              { "F": -28.0 },
              "erio",
              { "F": -28.0 },
              "d",
              { "F": -373.0 },
              "analyzed",
              { "F": -373.0 },
              "in",
              { "F": -374.0 },
              "our",
              { "F": -373.0 },
              "study",
              { "F": 83.0 },
              ",",
              { "F": -383.0 },
              "Bridge's",
              { "F": -373.0 },
              "paren",
              { "F": 27.0 },
              "t",
              { "F": -373.0 },
              "compan",
              { "F": 28.0 },
              "y",
              { "F": -373.0 },
              "Ne",
              { "F": -1.0 },
              "wGlob",
              { "F": -27.0 },
              "e",
              { "F": -374.0 },
              "reduced"
            ],
            "TJ"
          ],
          [ { "F": -16.936 }, { "F": -21.922 }, "Td" ],
This is just a more verbose restatement of what's in the PDF file; the real questions I'm asking are:

- How can a user get to this part, from viewing the PDF file? (Note that the PDF page objects are not necessarily a flat list; they are often nested at different levels of “kids”.)

- How can a user understand these instructions, and “see” how they correspond to what is visually displayed on the PDF file?