Hacker News


	by Animats 4067 days ago
	Right. I haven't updated that in years, and HTML is more bloated than it used to be. The Downside financial extractor, written around 2000, is totally obsolete. It doesn't understand XBRL; it reads the human-readable tables, tries to make sense of them, and attempts to create an early version of XBRL from them. That hasn't worked well since HTML stopped using <table> for tables. I'd looked into building a better system years ago, but there's a patent on extracting data from financial tables where the sign of items is ambiguous and it's not obvious which lines add up to which totals. ("Net loss" in a human-readable table may be expressed as either a positive or a negative number.) A more modern system is deep inside sitetruth.com; if SiteTruth can find the business behind a web site and tie it to an SEC filer, a button will appear to access the SEC filings for that company.
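
To make the sign ambiguity concrete, here is a toy sketch. It is not how the Downside extractor works, and it is not the patented method; it is a naive brute-force illustration. The function name `consistent_signings` and all the figures are made up. It assumes a table gives only magnitudes, and it tries every +/- assignment to see which ones reconcile with a stated total.

```python
from itertools import product

def consistent_signings(items, total):
    """Return every +/- assignment of the item magnitudes that sums to total.

    items: list of (label, magnitude) pairs as printed in the table.
    total: the signed total we are trying to reconcile against.
    """
    matches = []
    for signs in product((1, -1), repeat=len(items)):
        if sum(s * v for s, (_, v) in zip(signs, items)) == total:
            matches.append({label: s * v for s, (label, v) in zip(signs, items)})
    return matches

# Hypothetical table: magnitudes only, with a bottom line printed as "Net loss 100".
rows = [("Revenue", 500), ("Operating expenses", 620), ("Interest income", 20)]

# If "Net loss 100" means -100, expenses are the negative line.
as_negative = consistent_signings(rows, -100)

# If it is read as +100, a different signing reconciles instead.
as_positive = consistent_signings(rows, 100)

print(as_negative)
print(as_positive)
```

Both readings of the bottom line reconcile, each with a different set of signs, so the arithmetic alone can't say which one the filer meant. The search is also exponential in the number of rows, and a real filing adds subtotals and lines that don't belong to any total.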