by gwern
4356 days ago
At least part of the problem is that he's generating what one might call 'info trash': he's taking highly structured information from databases and turning it into natural-language prose, a less valuable data source because it's less structured. These prose versions are now going to steadily fall out of sync with the original databases, be much more prominent in Wikipedia and Google, diverge from each other, and be harder to parse or analyze in any complex way (a database is at least relatively comprehensible, but to parse his dumps you have to hope that you can reverse-engineer the format, that no other bots or editors have modified it much, and that he didn't get clever with his format strings). If at some point one wanted to change something about the presentation, it's no longer a matter of editing one template and having the user-friendly HTML view onto the database update automatically for all viewers; instead one has to run a carefully written bot on millions of articles (and since that is beyond the scope of semi-automated bots, you need special permission to run it). It would have been better to work on merging the databases or exporting them into a structured site, something like Freebase.
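To make the template-versus-prose point concrete, here is a rough Python sketch. Everything in it is made up for illustration (the two records, the `TEMPLATE` string, the `render` and `parse` helpers); it is not the bot's actual format, just an assumption about what such generated sentences might look like.

```python
import re

# Hypothetical structured records, standing in for rows in the source database.
records = [
    {"name": "Springfield", "state": "Ohio", "population": 1200},
    {"name": "Shelbyville", "state": "Ohio", "population": 950},
]

# One template: change it here and every rendered view changes with it.
TEMPLATE = "{name} is a town in {state} with a population of {population}."


def render(record, template=TEMPLATE):
    """Render a record as prose on demand; the record stays the source of truth."""
    return template.format(**record)


# The other path: only the prose is stored, so structure has to be
# recovered by reverse-engineering the format string.
PROSE_PATTERN = re.compile(
    r"^(?P<name>.+?) is a town in (?P<state>.+?) "
    r"with a population of (?P<population>\d+)\.$"
)


def parse(prose):
    """Try to recover a record from stored prose; return None if it no longer fits."""
    match = PROSE_PATTERN.match(prose)
    if match is None:
        return None
    fields = match.groupdict()
    fields["population"] = int(fields["population"])
    return fields


stored_prose = [render(r) for r in records]

# A plausible human edit to one stored sentence.
edited = stored_prose[0].replace("a population of 1200", "about 1,200 residents")

print(parse(stored_prose[1]))  # untouched prose still round-trips
print(parse(edited))           # the edited sentence no longer matches the pattern
print(render(records[0], "{name}, {state}: pop. {population}"))  # new presentation, no bot run
```

Changing the presentation is a one-line template edit when the records are kept, while the stored-prose path depends on every sentence still matching the pattern someone guessed.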
I still think the article is useful as is, with just the map, data sheet, and demographics. And of course many incorporations have additional human-composed information added.
I could imagine some more structured data source, where the main article redirects to a table and scrolls to the correct spot. I would be fine with that, but as far as I know that concept doesn't exist on Wikipedia.