Training an AI to Extract Information from PDF Files with Varying Section Titles
1 point by Philosophia 736 days ago
How can I train an AI to extract details from PDF files? The sections I want to extract may have different titles for the same content. For example, let's say we have 1000 PDF files of essays. Each essay has a section for "background," but the section might be titled "background" in some PDFs and "my story" in others. The AI needs to identify these varying titles, determine where the section starts and ends, and then copy that content into an .xls file.
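To make the problem concrete, here is a minimal sketch of a rule-based baseline, not a trained model: it maps varying titles to one canonical section name with fuzzy string matching, treats everything up to the next recognised heading as the section body, and writes one row per essay. It assumes the PDF text has already been extracted into a list of lines by some PDF library. The alias table, the function names, and the "headings are short lines" rule are all hypothetical choices for illustration. It writes CSV (which spreadsheet programs open) because a real .xls file would need a third-party library.

```python
import csv
import difflib
import io

# Hypothetical alias table: canonical section name -> titles seen in the essays.
SECTION_ALIASES = {
    "background": ["background", "my story", "about me"],
    "goals": ["goals", "future plans"],
}


def canonical_section(line, cutoff=0.8):
    """Return the canonical section name if `line` looks like a known heading, else None."""
    text = line.strip().lower().rstrip(":")
    # Assumption: headings are short lines; longer lines are treated as body text.
    if not text or len(text.split()) > 5:
        return None
    for name, aliases in SECTION_ALIASES.items():
        # Fuzzy match tolerates small typos such as "Backgroud".
        if difflib.get_close_matches(text, aliases, n=1, cutoff=cutoff):
            return name
    return None


def split_sections(lines):
    """Group body lines under the most recently recognised heading."""
    sections = {}
    current = None
    for line in lines:
        name = canonical_section(line)
        if name is not None:
            # A recognised heading ends the previous section and starts a new one.
            current = name
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line.strip())
    return {name: " ".join(part for part in body if part)
            for name, body in sections.items()}


def to_csv(rows, fieldnames):
    """Serialise one dict per essay into CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Usage with made-up essay text standing in for extracted PDF lines.
essay = [
    "My Story",
    "I grew up in Ohio.",
    "I studied math.",
    "Future Plans:",
    "Become a teacher.",
]
sections = split_sections(essay)
csv_text = to_csv([sections], fieldnames=list(SECTION_ALIASES))
```

A known limitation of this sketch: a section only ends at the next heading found in the alias table, so text under an unrecognised heading is absorbed into the preceding section. Titles that share no wording with any alias would need a learned classifier or embedding similarity instead of string matching.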