Hacker News
by progval 1753 days ago
How would it perform for, say, 500TB of source code?

And what would be the disk and memory requirements for this? Could they be distributed across a handful of servers?

I'd be surprised if this question had an offhand answer. It doesn't sound like something whose scalability is predictable enough for back-of-the-envelope calculations.
What on earth has this much source code? Every open source project ever?
Yes, good guess! That's the size we have after deduplication across projects at https://www.softwareheritage.org/. We archive all the source code we can find and would like to support some sort of full-text search on it at some point, so Glean looks interesting.
I mean, yeah. Imagine being able to run richer queries against GitHub.