by greenjello 4366 days ago
Stack Overflow, Wikipedia, and other Creative Commons sites make good starting data sets because the main thing you need to do is attribute them (their CC BY-SA licenses also expect you to share what you derive under the same terms). No expensive licensing deals or anything like that.

Often you can write a pretty good bot on top of their data and have it emulate user interactions. Enough users won't notice that the site is run by bots, so it's better than nothing if you don't have funding :-)