Good catches, thank you.
Timestamps: You're right that the current merge uses committer timestamps (LWW), and clocks can disagree. The spec is explicit about this tradeoff in Principle 4: "Last-writer-wins over Lamport clocks." The reasoning: for issue tracking (as opposed to, say, collaborative editing), the practical risk of clock skew producing a wrong merge result is low, and when it does happen, a follow-up commit corrects it. The format is versioned (Format-Version: 1), so a future version can introduce logical clocks if production use reveals timestamp-related bugs. In 8+ months of use, including imports from multi-contributor projects (GitHub, GitLab, Gitea), clock-skew issues haven't surfaced. I'm now adopting it in a team setting at work, which will be the real stress test for the merge heuristics.
UUIDs: In practice, users interact with 7-character short IDs (a7f3b2c). The CLI resolves abbreviations unambiguously, similar to how git log abc1234 works. Sequential IDs would require a central counter, which breaks in distributed systems.
Namespace: refs/issues/* doesn't appear in branch listings (git branch, git log --all with default config). Most Git GUIs filter to refs/heads/* and refs/remotes/*. For fetch performance with many issues, Git protocol v2 does server-side ref filtering, so only requested refs transfer. It's a valid concern, though, and worth documenting.
Attachments: Agreed, that's tracked for Format-Version 2: binary blobs in the tree object instead of the empty tree. The spec's Section 12 outlines this. The format spec is designed to evolve.
Appreciate the detailed feedback.
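To make the timestamp tradeoff concrete, here is a minimal Python sketch of last-writer-wins on a single field. The `FieldWrite` shape, the field names, and the commit-ID tie-break are assumptions made up for illustration; this is not the tool's actual merge code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldWrite:
    """One write to an issue field (hypothetical shape, for illustration only)."""
    value: str
    timestamp: int   # committer timestamp, seconds since the epoch
    commit_id: str   # used only as a deterministic tie-break (an assumption)


def lww_merge(a: FieldWrite, b: FieldWrite) -> FieldWrite:
    """Return the write with the later timestamp; break ties by commit id."""
    return max(a, b, key=lambda w: (w.timestamp, w.commit_id))


# Alice closes the issue first, but her clock runs five minutes fast.
alice = FieldWrite("closed", timestamp=1_700_000_300, commit_id="a7f3b2c")
# Bob reopens it one minute later in real time, with a correct clock.
bob = FieldWrite("open", timestamp=1_700_000_060, commit_id="b91e4d0")

merged = lww_merge(alice, bob)
print(merged.value)  # Alice's earlier write wins because of the skew
```

The merge is deterministic and order-independent, so every clone converges on the same value. The skew only changes which write is treated as "last", and a follow-up commit with a later timestamp would override it.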
It shows that I posted just a little later than you.
I agree that the chances of something going wrong with timestamps are low, but I still think it’s worth considering as a potential security risk or injection vector — although I’m not sure how realistic that threat actually is.
Regarding UUIDs, you can get the best of both worlds: sequential IDs are convenient, and adding a small number of random characters can help avoid name collisions. Since the random component only needs to ensure uniqueness across local repositories, it doesn’t require many characters.
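Roughly what I have in mind, as a Python sketch. The `<sequence>-<suffix>` format, the function name, and the 3-character suffix length are all just assumptions for illustration, not anything from the spec.

```python
import secrets


def next_issue_id(existing_ids: set[str], suffix_len: int = 3) -> str:
    """Allocate '<local sequence>-<random hex suffix>' (hypothetical format)."""
    # Highest sequence number this clone already knows about; no central counter.
    last = max((int(i.split("-")[0]) for i in existing_ids), default=0)
    # token_hex(n) returns 2*n hex characters; slice down to suffix_len.
    suffix = secrets.token_hex((suffix_len + 1) // 2)[:suffix_len]
    return f"{last + 1}-{suffix}"


# Two clones that have both seen issues 1 and 2 allocate independently.
clone_a = {"1-9f2", "2-c07"}
clone_b = {"1-9f2", "2-c07"}
id_a = next_issue_id(clone_a)
id_b = next_issue_id(clone_b)
print(id_a, id_b)  # both begin with "3-"; the random suffixes tell them apart
```

With three hex characters, two clones that pick the same sequence number collide with probability 1 in 4096, and the suffix can be lengthened if that is too high. Users would still mostly type the short numeric part.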
I can see that you’ve put a great deal of effort into this, especially with the various bridging components. I built a simple bridge myself and ran some tests using the pandas project (https://github.com/pandas-dev/pandas), which has more than 30,000 issues. Even storing only metadata (such as title and type, excluding the body) as plain text takes more than 100 MB, which seems quite large. In comparison, storing the same data in SQL takes only about 10 MB, and packed Git objects are comparable.
So while storing issues as Git commits certainly has some benefits, I don't see much advantage beyond that. It also seems that most users would not find it easy to make practical use of this approach (for batch processing, for example) unless they are already quite comfortable working directly with Git commits.
I'm curious about what considerations led you to decide to store issues as empty Git commits. I would appreciate it if you could share your reasoning with me.