
by stonerri 1236 days ago

The second layer is hard. I tried something in this space in mid-2018. Full-text extraction and sentence segmentation tech were adequate, but extracting the discourse tree and building the graph was a struggle: I was trying to repurpose a collection of academic/open tools to get something useful. I never published or released the code.

If interested, a few rabbit holes to explore (no affiliations):

https://scite.ai -> best option for citation mapping, but same issues you described above

https://www.semanticscholar.org and AI2 -> the best group working on tooling in this space

https://www.weave.bio -> early startup trying to build this out

The hardest challenge in my view is solving the intermediate representation issue. You have to establish a DSL/nomenclature that is expressive enough to represent a complete scholarly discourse while still being computable.
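To make the "expressive but computable" tension concrete, here is a minimal sketch of what such an intermediate representation could look like. Everything in it is an assumption made for illustration: the names `Claim`, `Relation`, `RelationType`, and `DiscourseGraph` are hypothetical, and the three relation kinds are nowhere near the range a real scholarly discourse would need.

```python
from dataclasses import dataclass, field
from enum import Enum


class RelationType(Enum):
    # Hypothetical, deliberately tiny vocabulary of discourse relations.
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    EXTENDS = "extends"


@dataclass(frozen=True)
class Claim:
    claim_id: str
    text: str      # the extracted sentence or a paraphrase of it
    source: str    # free-form identifier of the paper it came from


@dataclass(frozen=True)
class Relation:
    source_id: str          # claim making the statement
    target_id: str          # claim being supported, contradicted, or extended
    kind: RelationType


@dataclass
class DiscourseGraph:
    claims: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add_claim(self, claim):
        self.claims[claim.claim_id] = claim

    def relate(self, source_id, target_id, kind):
        # Refuse edges that point at claims the graph does not know about.
        if source_id not in self.claims or target_id not in self.claims:
            raise KeyError("both claims must be added before relating them")
        self.relations.append(Relation(source_id, target_id, kind))

    def evidence_for(self, claim_id):
        """Return (supporting_ids, contradicting_ids) for one claim."""
        supporting = [
            r.source_id for r in self.relations
            if r.target_id == claim_id and r.kind is RelationType.SUPPORTS
        ]
        contradicting = [
            r.source_id for r in self.relations
            if r.target_id == claim_id and r.kind is RelationType.CONTRADICTS
        ]
        return supporting, contradicting
```

The computable part is easy once the relations are fixed: `evidence_for` is a simple filter. The hard part is the one described above, namely deciding which relation kinds exist and mapping real prose onto them without losing what the authors meant.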


21eleven 1236 days ago

> The hardest challenge in my view is solving the intermediate representation issue.

Right, you'd basically be writing an interpreter for English.


nerpderp82 1235 days ago

Have GPT translate down into a Controlled Natural Language. I tried having it translate to OWL, but it sucked.

https://en.wikipedia.org/wiki/Controlled_natural_language
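As a rough illustration of why a controlled language is easier to compute over than free English, here is a minimal sketch of a parser for a made-up one. The grammar (`<subject> <verb> <object>.` with three allowed verbs) and the function name `parse_sentence` are hypothetical and do not correspond to any existing CNL; the idea is only that a model's output either fits the pattern or gets rejected.

```python
import re

# Hypothetical grammar: "<subject> <verb> <object>." with a closed verb list.
PATTERN = re.compile(
    r"^(?P<subject>[A-Za-z0-9 _-]+?) "
    r"(?P<verb>supports|contradicts|extends) "
    r"(?P<object>[A-Za-z0-9 _-]+)\.$"
)


def parse_sentence(sentence):
    """Return (subject, verb, object) or None if the sentence is not in the grammar."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return (match["subject"], match["verb"], match["object"])
```

Anything the parser returns can go straight into a graph as a typed edge, and anything it rejects can be sent back for another translation attempt. A pattern this small still accepts some sentences it should not, which hints at how much work a real controlled language has to do.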
