by stonerri 1190 days ago
The second layer is hard. I tried something in this space in mid-2018. Full-text extraction and sentence segmentation tech were adequate, but extracting the discourse tree and building the graph was a struggle: I was trying to repurpose a collection of academic/open tools to get something useful. I never published or released the code.

If interested, a few rabbit holes to explore (no affiliations):

https://scite.ai -> best option for citation mapping, but it has the same issues you described above

https://www.semanticscholar.org and AI2 -> the best group working on tooling in this space

https://www.weave.bio -> early startup trying to build this out

The hardest challenge in my view is solving the intermediate representation issue. You have to establish a DSL/nomenclature that is expressive enough to represent a complete scholarly discourse while also being computable.
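
To make the expressive-vs-computable tension concrete, here is a minimal sketch of what such an intermediate representation could look like. Everything in it is an assumption for illustration: the names `Claim`, `Relation` and `DiscourseGraph` are hypothetical, and the three-relation vocabulary is far too small for real scholarly discourse. It is not the code from the 2018 attempt.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    # Hypothetical, deliberately tiny vocabulary (an assumption, not a standard).
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    EXTENDS = "extends"


@dataclass(frozen=True)
class Claim:
    claim_id: str
    text: str
    source: str  # e.g. a DOI or URL of the paper the claim came from


@dataclass
class DiscourseGraph:
    claims: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (from_id, Relation, to_id)

    def add_claim(self, claim):
        self.claims[claim.claim_id] = claim

    def relate(self, from_id, relation, to_id):
        # Refuse edges that point at claims the graph does not know about.
        if from_id not in self.claims or to_id not in self.claims:
            raise KeyError("both claims must be added before relating them")
        self.edges.append((from_id, relation, to_id))

    def incoming(self, claim_id, relation):
        # IDs of claims that point at claim_id with the given relation.
        return [src for src, rel, dst in self.edges
                if dst == claim_id and rel is relation]

    def contested(self):
        # The "computable" part: claims with both a supporter and a contradictor.
        return sorted(
            cid for cid in self.claims
            if self.incoming(cid, Relation.SUPPORTS)
            and self.incoming(cid, Relation.CONTRADICTS)
        )
```

The small fixed vocabulary is what makes a query like `contested()` trivial to compute, and it is also exactly what limits the range: every nuance the vocabulary cannot express is lost at extraction time.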

1 comment

> The hardest challenge in my view is solving the intermediate representation issue.

Right, you'd basically be writing an interpreter for English.

Have GPT translate down into a Controlled Natural Language. I tried having it translate to OWL, but it sucked.

https://en.wikipedia.org/wiki/Controlled_natural_language
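
A rough sketch of why a controlled language might be easier to work with than free English: if the model is asked to emit only sentences from a tiny fixed grammar, the output can be checked with a deterministic parser and anything off-grammar rejected. The toy grammar below ("<id> supports|contradicts|extends <id>.") and the `parse_cnl` name are made up for illustration and are not an existing CNL.

```python
import re

# Hypothetical toy grammar: one "<subject> <verb> <object>." sentence per line.
VERBS = ("supports", "contradicts", "extends")
_SENTENCE = re.compile(
    r"^(?P<subject>[A-Za-z0-9_-]+) (?P<verb>%s) (?P<object>[A-Za-z0-9_-]+)\.$"
    % "|".join(VERBS)
)


def parse_cnl(text):
    """Parse toy-CNL text into (subject, verb, object) triples.

    Raises ValueError on the first line that is not in the grammar, so
    free-form English from the model is rejected instead of guessed at.
    """
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are allowed between sentences
        match = _SENTENCE.match(line)
        if match is None:
            raise ValueError(f"not valid in this toy language: {line!r}")
        triples.append((match["subject"], match["verb"], match["object"]))
    return triples
```

For example, `parse_cnl("smith2019 supports jones2017.")` yields a single triple, while a sentence like "Smith mostly agrees with Jones." raises a `ValueError`, which could be used as the signal to re-prompt.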