Hacker News
by piesauce 1793 days ago
Thanks for the kind words! We don't use XBRL at all. We did try it initially, but it was wildly inconsistent across companies. I think one thing that worked well for us was spending a lot of time on the early stages of the pipeline (efficient sentence and word tokenization, span detection), which boded well for our models later on.

Thanks! This is similar to where I ended up landing as well. It turns out using a non-standardized standard format is practically worse than dealing with giant blobs of plain text!
So true