Hacker News
by jisnsm 456 days ago
I work for a hosting company and I know what this is like. And, while I completely respect that you don’t want to give out your resources for free, a properly programmed and/or cached website wouldn’t be brought down by crawlers, no matter how aggressive. Crawlers are hitting our clients’ sites all the same, but we only hear about problems from those who have piece-of-shit websites that take seconds to generate each page.

git blame is always expensive to compute, and precomputing (or caching) it for every revision of every file is going to consume a lot of storage.
I guess for computationally expensive things the only real option is to put it behind a login. I’m sure this is something SourceHut doesn’t want to do but maybe it’s their only decent option.
On SourceHut, git blame is available while logged out, but the link to blame pages is only shown to logged-in users. That may be a recent change to fight scrapers.
Precomputing git blame should take the same order of magnitude of storage as the repository itself. Same number of lines for each revision, same number of lines changed in every commit.
Should be easy to write a script that takes a branch and constructs a mirror branch with git blame output. Then compare storage space used.
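For what it's worth, here is a rough sketch of what such a script could look like. Everything in it is my own assumption rather than anything SourceHut does: the function names are made up, the mirror is built as a separate repository instead of a branch (so the two sizes can be read independently), only first-parent history is replayed, and files that git blame refuses to process are simply skipped. It runs one blame per file per commit, so it is only practical on small repositories.

```python
import shutil
import subprocess
from pathlib import Path


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout as text."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True, errors="replace",
    )
    return result.stdout


def total_size_kib(count_objects_output):
    """Add up the loose and packed sizes (KiB) from `git count-objects -v` output."""
    fields = dict(
        line.split(": ", 1)
        for line in count_objects_output.splitlines()
        if ": " in line
    )
    return int(fields["size"]) + int(fields["size-pack"])


def build_blame_mirror(src_repo, branch, mirror_repo):
    """Replay `branch` into `mirror_repo`, storing blame output instead of file contents."""
    mirror = Path(mirror_repo)
    mirror.mkdir(parents=True, exist_ok=True)
    git(mirror, "init", "-q")
    # Oldest commit first; first-parent only, so the mirror history stays linear.
    revs = git(src_repo, "rev-list", "--reverse", "--first-parent", branch).split()
    for rev in revs:
        # Remove the previous snapshot, keeping the mirror's own .git directory.
        for entry in mirror.iterdir():
            if entry.name == ".git":
                continue
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
        paths = git(src_repo, "ls-tree", "-r", "--name-only", "-z", rev).split("\0")
        for path in filter(None, paths):
            try:
                # -s drops author names and timestamps from the blame output.
                blame = git(src_repo, "blame", "-s", rev, "--", path)
            except subprocess.CalledProcessError:
                continue  # e.g. submodule entries; skipped in this sketch
            target = mirror / path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(blame, encoding="utf-8")
        git(mirror, "add", "-A")
        git(
            mirror,
            "-c", "user.name=blame-mirror",
            "-c", "user.email=blame-mirror@example.invalid",
            "-c", "commit.gpgsign=false",
            "commit", "-q", "--allow-empty", "-m", rev,
        )


def compare_sizes(src_repo, mirror_repo):
    """Return (source KiB, mirror KiB) after repacking the mirror."""
    git(mirror_repo, "gc", "-q")
    return (
        total_size_kib(git(src_repo, "count-objects", "-v")),
        total_size_kib(git(mirror_repo, "count-objects", "-v")),
    )
```

Usage would be something like `build_blame_mirror("path/to/repo", "master", "/tmp/blame-mirror")` followed by `compare_sizes("path/to/repo", "/tmp/blame-mirror")`. The comparison is only approximate: the source repository also holds objects from other branches and may not have been repacked recently, so run `git gc` on it first if you want a fairer number.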
It is more fun to fight LLMs than to try to create magical unicorn caches that work for every endpoint.