by carucez 5693 days ago
I've done a lot of scraping and parsing. Your best option is to fetch the SEC's RSS feed, then fetch the hard-to-parse XML and free-form filings it points to and parse them. XBRL is great in its own right, but it's very difficult to relate XBRL fields to non-XBRL filings. You would do well to keep the two sets of results separate.
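A rough sketch of that first step, in Python, assuming the feed is Atom and each entry carries a title, a link, an update time, and a category holding the form type. The sample feed, its URLs, and the names `parse_filing_feed` and `split_by_form` are all made up for illustration; check them against the real feed before relying on any of it.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed; a real run would download the feed text instead.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Latest Filings</title>
  <entry>
    <title>4 - Example Corp (Issuer)</title>
    <link rel="alternate" href="https://example.com/filing-1-index.htm"/>
    <updated>2010-01-04T16:30:00-05:00</updated>
    <category term="4"/>
  </entry>
  <entry>
    <title>10-K - Example Corp (Filer)</title>
    <link rel="alternate" href="https://example.com/filing-2-index.htm"/>
    <updated>2010-01-05T09:00:00-05:00</updated>
    <category term="10-K"/>
  </entry>
</feed>
"""


def parse_filing_feed(feed_xml):
    """Return one dict per <entry> in an Atom feed."""
    root = ET.fromstring(feed_xml)
    filings = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        category = entry.find("atom:category", ATOM_NS)
        filings.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "url": link.get("href", "") if link is not None else "",
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "form_type": category.get("term", "") if category is not None else "",
        })
    return filings


def split_by_form(filings, structured_forms=frozenset({"4"})):
    """Keep structured and free-form filings in two separate lists.

    Which form types count as structured is an assumption; adjust the set.
    """
    structured = [f for f in filings if f["form_type"] in structured_forms]
    free_form = [f for f in filings if f["form_type"] not in structured_forms]
    return structured, free_form


if __name__ == "__main__":
    structured, free_form = split_by_form(parse_filing_feed(SAMPLE_FEED))
    print(len(structured), "structured,", len(free_form), "free-form")
```

Keeping the two lists apart from the start means the structured parser and the free-form parser never have to agree on field names.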

SEC Form 4 filings are in XBRL dating back to Jan 1, 2004 for every company. Well over 1,000,000 forms have been filed between then and now. I know, because I have them all locally right now.

You can scrape Google's Financial pages, obviously, and you can even get 2-minute data from a JSON "_5d" variable.

You can get fundamentals data from Nasdaq pretty easily, too. Scraping it is a little difficult, but you can go 120 quarters back for many companies, and 5 years back for annual data.

I have a financial statements database populated with Nasdaq-scraped data right now. Nasdaq updates within a week or so after a filing is published to the SEC. You'll always be behind the curve, but the information is good, albeit incomplete (it's missing things like the number of shares outstanding).
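For what it's worth, here is a minimal sketch of how a database like that could be laid out with Python's built-in `sqlite3`. It is not my actual schema: the table name, column names, and the helpers `open_db`, `upsert_item`, and `missing_items` are hypothetical, and the values in the usage lines are placeholders, not real figures. One row per ticker, period, and line item makes re-scrapes idempotent and makes gaps (like shares outstanding) easy to query.

```python
import sqlite3

# Hypothetical schema: one row per (ticker, period, line item).
SCHEMA = """
CREATE TABLE IF NOT EXISTS statements (
    ticker      TEXT NOT NULL,
    period_end  TEXT NOT NULL,  -- ISO date, e.g. 2009-12-31
    period_type TEXT NOT NULL CHECK (period_type IN ('quarterly', 'annual')),
    item        TEXT NOT NULL,  -- e.g. 'revenue'
    value       REAL,           -- NULL when the source does not report it
    scraped_at  TEXT NOT NULL,  -- when this row was scraped
    PRIMARY KEY (ticker, period_end, period_type, item)
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_item(conn, ticker, period_end, period_type, item, value, scraped_at):
    """Insert a line item, replacing any earlier scrape of the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO statements VALUES (?, ?, ?, ?, ?, ?)",
        (ticker, period_end, period_type, item, value, scraped_at),
    )
    conn.commit()


def missing_items(conn, ticker, period_end, period_type, required):
    """List required items that are absent or NULL for one period."""
    rows = conn.execute(
        "SELECT item FROM statements "
        "WHERE ticker = ? AND period_end = ? AND period_type = ? "
        "AND value IS NOT NULL",
        (ticker, period_end, period_type),
    )
    present = {row[0] for row in rows}
    return sorted(set(required) - present)


if __name__ == "__main__":
    conn = open_db()
    # Placeholder values for illustration only.
    upsert_item(conn, "XYZ", "2009-12-31", "quarterly", "revenue", 100.0, "2010-02-01")
    print(missing_items(conn, "XYZ", "2009-12-31", "quarterly",
                        ["revenue", "shares_outstanding"]))
```

Storing `scraped_at` alongside each row also lets you measure how far behind the SEC publication date your copy actually is.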