by gpa 1527 days ago
Sounds reasonable. And yes, I am a one-man team. But then, after collecting all this information, how do you organize and store it for on-demand availability? For tables extracted from the publication, you can create a new table within a relational database. But how do you organize single data points extracted from the text? Where do you store them?
1 comment

If I were writing a workflow system in a hurry, I'd use ArangoDB on the back end and a web server based on asyncio. I like the idea of something RDF-based, but I'd need to develop the right algebra for isolating individual "records" in an RDF database so they can be updated safely.

I worked at a startup that built a system with quite a few parts, including a tool for annotating text to train an extraction pipeline. That tool had a TypeScript/React front end, a Scala web server, and kept its data in an elaborate set of tables in Postgres. They were in a hurry to get it working for one particular customer, so it wound up pretty half-baked.

I have a lot of ideas on this so look up my profile and send me an email!