| Good questions.

> How will you deal with files that are purposely broken, or which cause the parser to take impractical (but finite) times to complete?

I've never seen a language parser do that, but if I run into a language that does, I'll probably have my VCS track it at the file level, based on tokens or lines. Dumb languages don't get nice things. :)

> How will you maintain a history when most commits are likely to contain unparseable code and so break the continuity of objects?

This is less of a problem with binary files (assuming the source software does not have bugs in its output), but with source files, you're right that the problem exists. As of right now, I would take a token-based approach. This approach removes the need for whitespace-only commits, and if I track the tokens right, I should be able to identify which right brace ended the function before the broken code was saved. Then I would just save the function as broken using that same right brace. For example, say you have this:

int main() {
    return 0;
}
My VCS would know that the right brace corresponds to the end of the function. Then you write this:

int main() {
    if (global_bool) {
        return 0;
}
Yes, a dumb system might think that the right brace is for the `if`. However, if you break it down by tokens, the VCS will see that `if (global_bool) {` was added before the return, so it should be able to tell that the right brace still ends the function. I hope that makes sense.

Another plausible way to do it (at least in C) would be to look for things that look like declarations. The series of tokens `<type> <name> <left_paren>` is probably a function declaration. Java would be easier; its declarations are more wordy. I still have to prove this is possible, but I think it is. |
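To make the two ideas above a bit more concrete, here is a rough Python sketch, not the actual VCS. It assumes a toy tokenizer (no strings, comments, or multi-character operators), uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever token diff the real tool would use, and all names (`tokenize`, `carry_token`, `function_starts`, `CONTROL_KEYWORDS`) are made up for illustration. The keyword filter in the declaration check is also an added assumption, not something stated in the comment.

```python
import difflib
import re

# Toy tokenizer (assumption): words, integers, or any single non-space char.
WORD = r"[A-Za-z_]\w*"
TOKEN_RE = re.compile(WORD + r"|\d+|\S")

# Assumption: words that can precede "name (" without being a type.
CONTROL_KEYWORDS = {"if", "else", "for", "while", "do", "switch", "return", "sizeof"}


def tokenize(source):
    """Split source text into tokens, ignoring all whitespace."""
    return TOKEN_RE.findall(source)


def carry_token(old_tokens, new_tokens, old_index):
    """Find where the token at old_index ended up in new_tokens.

    Returns the new index, or None if the diff treats the token as
    deleted or replaced.
    """
    matcher = difflib.SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        if block.a <= old_index < block.a + block.size:
            return block.b + (old_index - block.a)
    return None


def function_starts(tokens):
    """Indices where the pattern <type> <name> <left_paren> begins."""
    def is_name(tok):
        return re.fullmatch(WORD, tok) is not None and tok not in CONTROL_KEYWORDS

    return [
        i
        for i in range(len(tokens) - 2)
        if is_name(tokens[i]) and is_name(tokens[i + 1]) and tokens[i + 2] == "("
    ]


old = tokenize("int main() {\nreturn 0;\n}")
new = tokenize("int main() {\nif (global_bool) {\nreturn 0;\n}")

old_close = len(old) - 1                      # brace known to end main() before
new_close = carry_token(old, new, old_close)  # same brace in the broken version
print(new_close, new[new_close])
print(function_starts(new))
```

In this example the brace that ended `main` is carried over to the last token of the broken version, so the function can be saved as broken with that same brace. One caveat: a plain longest-match diff may line up the `) {` of `main` with the `) {` of the new `if`, so other tokens (such as the opening brace) are not necessarily carried where you would expect. A real tool would likely need a smarter alignment than this.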
C++ is gonna get really funky there, with e.g. templates.