|
|
|
|
|
by Alacart
1023 days ago
|
|
Ah yes, I too have accidentally committed node_modules. Jokes aside, and coming from a place of ignorance, it's interesting to me that a file count that size is still a real performance issue for git. I'd have expected something that's so ubiquitous and core to most of the software world hasn't seen improvements there. Genuine, non snarky question:
Are there some fundamental aspects of git that would make it either very difficult to improve that, or that would sacrifice some important benefits if they were made? Or is this a case of it being a large effort and no one has particularly cared enough yet to take it on? |
|
It’s hard to look at a million files on disk and figure out which ones have changed. Git, by default, examines the filesystem metadata. It takes a long time to examine the metadata for a million files.
The main alternative approaches are:
- Locking: Git makes all the files read-only, so you have to unlock them first before editing. This way, you only have to look at the unlocked files.
- Watching: Keep a process running in the background and listen to notifications that the files have changed.
- Virtual filesystem: Present a virtual filesystem to the user, so all file modifications go through some kind of Git daemon running in the background.
All three approaches have been used by various version control systems. They’re not easy approaches by any means, and they all have major impacts on the way you have to set up your Git repository.
People also want e.g. sparse checkouts, when you’re working with such large repos.