by Animats
4067 days ago
What's the problem finding executive names and ages? Get the SEC index for a CIK, pull the latest DEF 14A form[1], and start parsing the tables. Build a 2D data structure for each table. Look for tables that have column headings including "Name" and "Age". Then back up from the start of the table to the previous heading that's not associated with a previous table, and look for keywords in the heading such as "Director(s)" or "Executives".

It's tougher when the filer tries to be cool and doesn't use tables for tabular data.[2] Then you have to figure out which <div> items are line breaks and which aren't. Fortunately, the SEC doesn't let you put JavaScript or off-site CSS in a filing; it all has to be in one document. Yes, dumb scraping techniques like looking for CSS class names won't help, but it's not really that hard.

[1] http://www.sec.gov/Archives/edgar/data/1288776/0001308179140...
[2] http://www.sec.gov/Archives/edgar/data/1326801/0001326801140...
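
A rough sketch of the table step, assuming the filing has already been downloaded as an HTML string. The names TableGrid and name_age_tables are made up for illustration, and only the Python standard library is used. It ignores colspan/rowspan, treats nested tables crudely, and does not do the "back up to the previous heading" step, so take it as a starting point rather than a working EDGAR parser.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect each <table> as a 2D list: rows of cell text (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each a list of rows
        self._stack = []   # tables that are currently open
        self._cell = None  # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._stack.append([])
        elif tag == "tr" and self._stack:
            self._stack[-1].append([])
        elif tag in ("td", "th") and self._stack and self._stack[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace (including non-breaking spaces).
            text = " ".join("".join(self._cell).split())
            self._stack[-1][-1].append(text)
            self._cell = None
        elif tag == "table" and self._stack:
            self.tables.append(self._stack.pop())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def name_age_tables(html):
    """Return tables that have a row with both "Name" and "Age" as headings.

    Each returned table starts at that heading row.
    """
    parser = TableGrid()
    parser.feed(html)
    parser.close()
    hits = []
    for table in parser.tables:
        for i, row in enumerate(table):
            headings = {cell.lower() for cell in row}
            if {"name", "age"} <= headings:
                hits.append(table[i:])
                break
    return hits
```

Calling name_age_tables(filing_html) gives a list of candidate tables; the heading check (looking backward for "Director(s)" or "Executives") would then filter those candidates.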