Hacker News new | ask | show | jobs
by kjqgqkejbfefn 841 days ago
On the level of paper, not everything is laid out linearly. The main text is often laid out in column, the flow can be be offset with pictures with a caption, additional text can be placed in inserts, etc ...

You need a human eye to figure that out and this is the task nlm-ingestor tackles.

As for the content, semantic contiguity is not always guaranteed. A typical example of this are conversations, where people engage in narrative/argumentative competitions. Topics get nested as the conversation advances, along the lines of "Hey, this remind me of ...". Building up a stack that can be popped once subtopics have been exhausted: "To get back to the topic of ...".

This is explored at length by Kebrat-Orecchioni in:

https://www.cambridge.org/core/journals/language-in-society/...

And an explanation is offered by Dessalles in:

https://telecom-paris.hal.science/hal-03814068/document