Hacker News
by superjan 329 days ago
Well, it is not pretty to see how the sausage gets made, but extracting formatted text from .docx files is absolutely doable, no PhD required. Source: I did it as a little side quest because it was useful for auditing a set of Word documents.
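For a sense of what the sausage-making involves: a .docx file is a zip archive, and the body text is WordprocessingML XML, typically stored at `word/document.xml`, made of paragraphs (`w:p`) containing runs (`w:r`) with optional run properties (`w:rPr`) and text (`w:t`). The sketch below is a hypothetical illustration, not the commenter's code. It uses only the Python standard library, and the name `docx_to_markdown` is made up. It assumes the body lives at `word/document.xml`, and it only picks up bold/italic set directly on runs. Formatting inherited from styles (`word/styles.xml`), lists, tables, text boxes, headers and footnotes are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _toggle_on(rpr, tag):
    """True if a run property such as <w:b/> is present and not switched off."""
    if rpr is None:
        return False
    el = rpr.find(W + tag)
    if el is None:
        return False
    # <w:b w:val="0"/> (or "false"/"off") explicitly disables the property
    return el.get(W + "val", "true") not in ("0", "false", "off")


def _runs(p):
    """Direct runs of a paragraph, plus runs wrapped in hyperlinks.

    Deeper nesting (e.g. text boxes inside drawings) is deliberately skipped.
    """
    for child in p:
        if child.tag == W + "r":
            yield child
        elif child.tag == W + "hyperlink":
            yield from child.findall(W + "r")


def docx_to_markdown(source):
    """Return one string per paragraph, with direct bold/italic as Markdown.

    `source` is a path or a binary file-like object. Assumption: the main
    document part is word/document.xml; style-based formatting is ignored.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(W + "p"):
        parts = []
        for r in _runs(p):
            # Collect the run's own text, mapping tabs and breaks to whitespace
            pieces = []
            for child in r:
                if child.tag == W + "t":
                    pieces.append(child.text or "")
                elif child.tag == W + "tab":
                    pieces.append("\t")
                elif child.tag == W + "br":
                    pieces.append("\n")
            text = "".join(pieces)
            if not text:
                continue
            rpr = r.find(W + "rPr")
            if _toggle_on(rpr, "b"):
                text = f"**{text}**"
            if _toggle_on(rpr, "i"):
                text = f"*{text}*"
            parts.append(text)
        paragraphs.append("".join(parts))
    return paragraphs
```

Usage would be something like `for line in docx_to_markdown("report.docx"): print(line)` (the file name is hypothetical). For a real audit, the next step would likely be resolving paragraph and character styles from `word/styles.xml`, since much of the formatting in typical documents comes from styles rather than from direct run properties.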