by Nicholas_C
4067 days ago
Getting SEC filing data is an absolute nightmare. Every time I think of a project that needs SEC filing data (executive names and ages, MD&A text analysis, etc.), I skip it and move on to something that's just as interesting but less time-consuming and more doable. There doesn't appear to be a scalable solution.
It's tougher when the filer tries to be cool and doesn't use tables for tabular data.[2] Then you have to figure out which <div> items are line breaks and which aren't. Fortunately, the SEC doesn't let you put JavaScript or off-site CSS in a filing; it all has to be in one document.
Yes, dumb scraping techniques like looking for CSS class names won't help, but it's not really that hard.
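
For the easy case, where the filer does use real tables, something like the following rough sketch is roughly what I mean. It only uses Python's standard-library html.parser and keys off tag structure rather than CSS class names. The class name TableExtractor, the helper extract_rows, and the sample markup are all made up for illustration; they are not taken from any actual filing, and nested tables or the <div>-as-layout case in [2] would need more work than this.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell; formatting tags such as
        # <font> or <b> are ignored but their text is still collected.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace (including non-breaking spaces).
            text = " ".join("".join(self._cell).split())
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Drop spacer rows whose cells are all empty.
            if any(self._row):
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the non-empty table rows found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup, loosely in the style of an executive-officer table.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Age</th><th>Position</th></tr>
  <tr><td><font size="2">Jane&nbsp;Doe</font></td><td>54</td><td>Chief Executive Officer</td></tr>
  <tr><td></td><td></td><td></td></tr>
</table>
"""

rows = extract_rows(SAMPLE)
for row in rows:
    print(row)
```

Because everything has to live in the one document, you can run this on the raw HTML without rendering anything or fetching extra stylesheets.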
[1] http://www.sec.gov/Archives/edgar/data/1288776/0001308179140...
[2] http://www.sec.gov/Archives/edgar/data/1326801/0001326801140...