by osigurdson 842 days ago
I don't understand. Why build up text chunks from different, non-contiguous sections?
2 comments

On paper, not everything is laid out linearly. The main text is often set in columns, the flow can be offset by pictures with captions, additional text can be placed in inserts, etc.

You need a human eye to figure that out and this is the task nlm-ingestor tackles.

As for the content, semantic contiguity is not always guaranteed. A typical example is conversation, where people engage in narrative/argumentative competitions. Topics get nested as the conversation advances, along the lines of "Hey, this reminds me of ...", building up a stack that can be popped once subtopics have been exhausted: "To get back to the topic of ...".

This is explored at length by Kerbrat-Orecchioni in:

https://www.cambridge.org/core/journals/language-in-society/...

And an explanation is offered by Dessalles in:

https://telecom-paris.hal.science/hal-03814068/document
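To make the stack idea concrete, here is a minimal sketch in Python. It assumes the conversation has already been annotated with hypothetical "push"/"pop"/"say" events (detecting those boundaries is the hard part and is not shown), and `group_by_topic` is an illustrative name, not something from nlm-ingestor or the papers above.

```python
from collections import defaultdict

def group_by_topic(events, root="main"):
    """Group utterances by the topic on top of the stack when they were said.

    events: list of (kind, payload) pairs, where kind is one of
    "push" (payload = new subtopic), "pop" (payload ignored), "say" (payload = text).
    The event vocabulary is an assumption made for this sketch.
    """
    stack = [root]
    groups = defaultdict(list)
    for kind, payload in events:
        if kind == "push":
            stack.append(payload)           # "Hey, this reminds me of ..."
        elif kind == "pop":
            if len(stack) > 1:              # never pop the root topic
                stack.pop()                 # "To get back to the topic of ..."
        else:
            groups[stack[-1]].append(payload)
    return dict(groups)

events = [
    ("say", "PDF parsing is hard."),
    ("push", "fonts"),
    ("say", "Hey, this reminds me of a font bug."),
    ("pop", None),
    ("say", "To get back to the topic of parsing..."),
]
groups = group_by_topic(events)
```

Here the first and last utterances end up in the same "main" group even though they are not adjacent in the transcript, which is the non-contiguous chunking the question asks about.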

If those non-contiguous sections share similar semantic or other meaning, it can make sense from a search perspective to group them.
It starts to look like a graph problem.
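A rough sketch of the graph framing, in Python: treat each chunk as a node, add an edge when two chunks are similar enough, and take connected components as groups. Word-overlap (Jaccard) similarity and the 0.3 threshold are stand-in assumptions for illustration; a real system would more likely use embeddings, and none of this is claimed to be how nlm-ingestor works.

```python
from itertools import combinations

def jaccard(a, b):
    """Word-set overlap between two strings (a crude stand-in for semantic similarity)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def group_chunks(chunks, threshold=0.3):
    """Return groups of chunk indices that are connected by similarity edges."""
    parent = list(range(len(chunks)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(chunks)), 2):
        if jaccard(chunks[i], chunks[j]) >= threshold:
            parent[find(j)] = find(i)       # merge the two components

    groups = {}
    for i in range(len(chunks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

chunks = [
    "table extraction from pdf columns",
    "figure caption about revenue",
    "pdf columns and table extraction",
    "revenue figure caption",
]
result = group_chunks(chunks)
```

With these four chunks, indices 0 and 2 land in one group and 1 and 3 in another, even though neither pair is adjacent in the source order.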