by evanhu_
907 days ago
I did try that at first, but it was hard to parse the HTML, organize it into logical sections (authors, references, abstract), and then clean up the text so it was ready for chunking and embedding. Once I found GROBID I just went that route because it handled all of that for me.
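
For anyone curious what "handled all that for me" looks like in practice, here is a rough sketch of pulling sections out of GROBID-style TEI XML with the Python standard library. The sample document is a tiny hand-written stand-in, not real GROBID output, and `split_sections` and `text_of` are hypothetical helper names; real output has many more elements, so treat this as an illustration under those assumptions rather than the actual pipeline.

```python
import xml.etree.ElementTree as ET

# TEI namespace used in GROBID's XML output.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written stand-in for a GROBID response (an assumption, heavily simplified).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>An Example Paper</title></titleStmt>
      <sourceDesc><biblStruct><analytic>
        <author><persName><forename>Ada</forename><surname>Lovelace</surname></persName></author>
      </analytic></biblStruct></sourceDesc>
    </fileDesc>
    <profileDesc><abstract><p>We study examples.</p></abstract></profileDesc>
  </teiHeader>
  <text>
    <body>
      <div><head>Introduction</head><p>First paragraph.</p><p>Second paragraph.</p></div>
    </body>
    <back>
      <div type="references"><listBibl>
        <biblStruct><analytic><title>A Cited Work</title></analytic></biblStruct>
      </listBibl></div>
    </back>
  </text>
</TEI>"""


def text_of(elem):
    """Return an element's text content with whitespace collapsed ('' if missing)."""
    if elem is None:
        return ""
    return " ".join(" ".join(elem.itertext()).split())


def split_sections(tei_xml):
    """Split a TEI document into title, authors, abstract, body sections, references."""
    root = ET.fromstring(tei_xml)
    header = root.find("tei:teiHeader", TEI)

    # Only look for authors inside the header, so cited authors are not mixed in.
    authors = []
    if header is not None:
        authors = [text_of(p) for p in header.findall(".//tei:author/tei:persName", TEI)]

    # Each top-level <div> in the body becomes one chunkable section.
    sections = []
    for div in root.findall(".//tei:body/tei:div", TEI):
        sections.append({
            "heading": text_of(div.find("tei:head", TEI)),
            "text": " ".join(text_of(p) for p in div.findall("tei:p", TEI)),
        })

    references = [
        text_of(ref.find(".//tei:title", TEI))
        for ref in root.findall(".//tei:listBibl/tei:biblStruct", TEI)
    ]

    return {
        "title": text_of(root.find(".//tei:titleStmt/tei:title", TEI)),
        "authors": authors,
        "abstract": text_of(root.find(".//tei:profileDesc/tei:abstract", TEI)),
        "sections": sections,
        "references": references,
    }


if __name__ == "__main__":
    parsed = split_sections(SAMPLE_TEI)
    for section in parsed["sections"]:
        print(section["heading"], "->", section["text"])
```

The point is that the structure is already labeled in the XML, so the per-section text can go more or less straight into chunking and embedding without hand-written HTML heuristics.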