HN Mirror


by thumper 6176 days ago

I don't think that there was disagreement that there would one day be a problem, even in 2004. But "drawing on the same set of resources" doesn't seem like a fair way to think of it, especially to conclude that your idea would have led to the same thing.

I am one of the researchers on this project, and there are no "resources" being made available to us -- it's been something of an uphill battle to do more than just talk. We came up with our idea in January 2006, and spent the year implementing it. Even now, with this attention and with the code being open source, there are no volunteers stepping forward to help make it a reality on Wikipedia itself. We receive very little funding (from CITRIS and LANL, not Wikipedia), and I pay my own tuition from my side job. Our research group has been really focused this last year on making the code "production ready" instead of just research code, but don't forget that we are academics -- it's difficult to justify spending our time this way, except that we truly believe in the project.

In terms of "credit for the idea", the earliest published work that I have seen is the "Puppy Smoothies" article in First Monday ( http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/ar... ). We cited that and every other blog posting (unusual for an academic paper!) we could find in our first paper. You'll see that there are actually several that existed at the time, but the real issue is in finding the manpower to implement the ideas.


Alex3917 6176 days ago

Gotcha. By resources I actually meant the data available (timestamp, page views, contributor history, etc.), and not the actual financial support.

Anyway, I wasn't trying to take credit for your work, I was just annoyed that no one wanted to discuss it at the time. Great job though, this is a really cool (and important) project!

thumper 6176 days ago

Thanks. Okay, I see what you're saying. Actually, they still won't make that information available, though we have been asking for a while. That was a big challenge when we started, but I think it was a good constraint because it pushed us in this direction to think about how the text itself evolves. Upcoming work will be looking at new signals to inform the page quality, such as activity on the Talk pages -- so there's no shortage of ideas, only of time to work on them.

On the page views signal: there was a great paper in the last few years that came up with a way to estimate it from multiple other data sources (e.g., Alexa). I don't remember the title, but it was an impressive bit of stitching together. If I had the time/money/resources, I would love to get that as a signal in our work and see if it helps. I'm not 100% convinced it would, because I've seen vandalism that lasted for years on a somewhat popular page -- so even many eyes do not help if no one will take action to fix incorrect data.