|
|
|
|
|
by authorfly
686 days ago
|
|
I have great news I wish someone delivered to me when I was in your shoes - try "GROBID". It parses papers into objects with abstract/body/figures! It will help you out a great deal. It is designed for papers and can extract the text almost flawlessly, but also give information on graphs for separate processing. I have several years experience with academic text processing (including presentations) working with an Academic Publisher if I could be helpful to anything? |
|
I wish I was hiring, if that's what you're asking ;) Otherwise, if you have any ideas for processing formulas (even just for reading them out, but any extra steps towards expressing what they mean - ' 'sum divided by count' is 'mean'/'average' value ' being the most simple example I can think of) I'd love to hear them. Novel ideas in technical papers are often expressed with formulas which aren't that complicated conceptually, but are critical to understanding the whole paper and that was another piece I was having very mixed results with.