by ctxc 51 days ago
I do something tangential. If you can identify these pages with deterministic features (financial pages have the most numbers on the page, or have the words "director", "general manager" and "managing director" on the same page, etc.), pick those pages out and pipe only those to the LLM.
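
Roughly what that filter could look like, as a minimal sketch rather than my actual code. It assumes the pages (or sheets) are already plain text in a dict keyed by name; `KEYWORDS`, `count_numbers`, `has_all_keywords` and `select_pages` are made-up names for illustration, and the two rules are just the examples mentioned above.

```python
import re

# Hypothetical keyword set, taken from the example features above.
KEYWORDS = ("director", "general manager", "managing director")

# Matches integers and simple formatted numbers such as 1,200 or 3.5.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def count_numbers(text: str) -> int:
    """Count number-like tokens in a page's text."""
    return len(NUMBER_RE.findall(text))


def has_all_keywords(text: str) -> bool:
    """True if every keyword appears on the page (case-insensitive)."""
    lowered = text.lower()
    return all(keyword in lowered for keyword in KEYWORDS)


def select_pages(pages: dict[str, str]) -> list[str]:
    """Return names of pages worth sending to the LLM.

    A page is kept if it has the most numbers of any page
    (and at least one number), or if it contains all keywords.
    """
    if not pages:
        return []
    most_numeric = max(pages, key=lambda name: count_numbers(pages[name]))
    if count_numbers(pages[most_numeric]) == 0:
        most_numeric = None  # no page has any numbers at all
    return [
        name
        for name, text in pages.items()
        if name == most_numeric or has_all_keywords(text)
    ]


if __name__ == "__main__":
    sample = {
        "cover": "Annual report",
        "board": "Director A, General Manager B, Managing Director C",
        "financials": "Revenue 1,200 Cost 800 Profit 400 for 2023 and 2022",
    }
    for name in select_pages(sample):
        print(name)  # only these pages would go into the prompt
```

Everything here is cheap string work, so it can run over every page before any tokens are spent. The rules are intentionally dumb; the point is that they are deterministic and easy to tune per document type.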

Luckily I work with Excel files with lots of sheets, so I don't need to do PDF-to-text conversion etc.