by pm215
174 days ago
I'm curious whether there are well-coded AI scrapers that have logic for "aha, this is a git forge, git clone it instead of scraping, and git fetch on a rescrape". Why are there apparently so many naive crawlers out there (naive in what they fetch, yet still coded to be massively parallel and botnet-like, which is not naive at all)?
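For concreteness, the "clone once, fetch on rescrape" logic could look roughly like the sketch below. This is only an illustration of the idea, not a description of any real crawler: the names `sync_command`, `sync`, and `mirror_dir` are hypothetical, and it assumes the crawler has already decided that the URL points at a git repository.

```python
import os
import subprocess


def sync_command(repo_url: str, mirror_dir: str) -> list[str]:
    """Pick the git command for a first scrape (clone) or a rescrape (fetch).

    Hypothetical helper: assumes mirror_dir is where an earlier run
    placed a mirror clone of repo_url, if that run ever happened.
    """
    if os.path.isdir(mirror_dir):
        # Rescrape: only transfer objects that are new since the last visit.
        return ["git", "-C", mirror_dir, "fetch", "--prune"]
    # First visit: one bare mirror clone instead of crawling every page.
    return ["git", "clone", "--mirror", repo_url, mirror_dir]


def sync(repo_url: str, mirror_dir: str) -> None:
    """Run the chosen command; raises CalledProcessError if git fails."""
    subprocess.run(sync_command(repo_url, mirror_dir), check=True)
```

A rescrape done this way costs the forge one incremental fetch rather than a request per file, commit, and blame view.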
If they're targeting code forges specifically (because they're after coding use cases), there's a lot of interesting information that you won't get by just cloning a repo.
It's not just the current state of the repo, or all the commits (and their messages). It's the initial issue (and its discussion) that led to a pull request (and its review comments) that eventually gets squashed into a single commit.
The way you code with an agent is a lot more similar to that sequence of issue, comments, change, review, and refinement, which you get by slurping the website.