| HN Mirror

	by piesauce 1793 days ago
	Thanks for the kind words! We don't use XBRL at all. We did try it initially, but it was wildly inconsistent across companies. I think one thing that worked well for us was spending a lot of time on the initial stages of the pipeline (efficient sentence and word tokenization, span detection), which boded well for our models later on.

1 comment

ZeroCool2u 1793 days ago

Thanks! This is similar to where I ended up landing as well. It turns out using a non-standardized standard format is practically worse than dealing with giant blobs of plain text!

kbennatti 1793 days ago

So true
