| HN Mirror


by remenoscodes 110 days ago

Good catches, thank you.

Timestamps: You're right that the current merge uses committer timestamps (LWW), and clocks can disagree. The spec is explicit about this tradeoff — Principle 4: "Last-writer-wins over Lamport clocks." The reasoning: for issue tracking (as opposed to, say, collaborative editing), the practical risk of clock skew producing a wrong merge result is low, and when it does happen, a follow-up commit corrects it. The format is versioned (Format-Version: 1), so a future version can introduce logical clocks if production use reveals timestamp-related bugs. In 8+ months of use — including imports from multi-contributor projects (GitHub, GitLab, Gitea) — clock-skew issues haven't surfaced. I'm now adopting it in a team setting at work, which will be the real stress test for the merge heuristics.
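A minimal sketch of the last-writer-wins rule discussed above, and of how clock skew can flip the result. The `FieldWrite` model, the field values, and the tie-break on commit ID are assumptions for illustration, not taken from the spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldWrite:
    """One write to an issue field, as recorded by a commit (hypothetical model)."""
    value: str
    committer_time: int  # Unix seconds, taken from the writer's own clock
    commit_id: str       # assumed tie-breaker when timestamps are equal


def lww_merge(a: FieldWrite, b: FieldWrite) -> FieldWrite:
    """Return the write with the later committer timestamp (last-writer-wins)."""
    return max(a, b, key=lambda w: (w.committer_time, w.commit_id))


# Alice closes the issue at real time 1000; her clock is accurate.
alice = FieldWrite("closed", 1000, "a1")
# Bob reopens it 60 seconds later in real time, but his clock runs 5 minutes slow.
bob = FieldWrite("open", 1060 - 300, "b2")

winner = lww_merge(alice, bob)
print(winner.value)  # prints "closed": Bob's later write loses because of the skew
```

The result is deterministic on every replica, which is the appeal of LWW; the cost is that the winner follows the recorded timestamps rather than the real order of events.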

UUIDs: In practice, users interact with 7-character short IDs (a7f3b2c). The CLI resolves abbreviations unambiguously, similar to how git log abc1234 works. Sequential IDs would require a central counter, which breaks in distributed systems.
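Prefix resolution of this kind can be sketched as follows. This is an illustration only, assuming IDs are lowercase hex strings; the function name and the full IDs are hypothetical, and the actual CLI may resolve abbreviations differently.

```python
def resolve_short_id(prefix: str, full_ids: list[str]) -> str:
    """Return the single full ID that starts with `prefix`, or raise."""
    wanted = prefix.lower()
    matches = [full_id for full_id in full_ids if full_id.startswith(wanted)]
    if not matches:
        raise KeyError(f"no issue matches {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous: {len(matches)} issues match")
    return matches[0]


# Hypothetical full IDs; only the a7f3b2c prefix comes from the discussion.
known_ids = [
    "a7f3b2c4e5d64f0a9b1c2d3e4f5a6b7c",
    "a7f3b9d011224c3e8f556677889900aa",
    "0c1d2e3f4a5b4c6d8e7f8091a2b3c4d5",
]

print(resolve_short_id("a7f3b2c", known_ids))
```

A shorter prefix such as `a7f3b` matches two of the IDs here and raises `ValueError`, so the user has to type more characters.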

Namespace: refs/issues/* doesn't appear in branch listings (git branch, git log --all with default config). Most Git GUIs filter to refs/heads/* and refs/remotes/*. For fetch performance with many issues, Git protocol v2 does server-side ref filtering, so only requested refs transfer. Valid concern, though, and worth documenting.

Attachments: Agreed, that's tracked for Format-Version 2 — binary blobs in the tree object instead of the empty tree. The spec's Section 12 outlines this.

The format spec is designed to evolve. Appreciate the detailed feedback.

1 comments

kwhkim 110 days ago

I’d like to share my work with you: https://news.ycombinator.com/item?id=47137452

It shows that I posted just a little later than you.

I agree that the chances of something going wrong with timestamps are low, but I still think it’s worth considering as a potential security risk or injection vector — although I’m not sure how realistic that threat actually is.

Regarding UUIDs, you can get the best of both worlds: sequential IDs are convenient, and adding a small number of random characters can help avoid name collisions. Since the random component only needs to ensure uniqueness across local repositories, it doesn’t require many characters.
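One way to read this suggestion, as a rough sketch: take the next number from the local sequence and append a few random hex characters. The `<number>-<suffix>` layout and the function name are assumptions made for illustration.

```python
import secrets


def next_issue_id(existing_ids: list[str], suffix_bytes: int = 2) -> str:
    """Build '<next number>-<random hex>' from the IDs known locally."""
    # Highest sequence number seen in this clone; 0 when there are no issues yet.
    highest = max((int(i.split("-", 1)[0]) for i in existing_ids), default=0)
    suffix = secrets.token_hex(suffix_bytes)  # 2 bytes -> 4 hex characters
    return f"{highest + 1}-{suffix}"


# Two offline clones that both know issues 1 and 2 will each allocate number 3,
# but the random suffix makes an identical full ID unlikely.
local_ids = ["1-9f3a", "2-04bc"]
print(next_issue_id(local_ids))
```

With a 4-character hex suffix, two clones that pick the same number collide with probability 1 in 65,536 per pair, so the suffix length is a trade-off between readability and collision risk.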

I can see that you’ve put a great deal of effort into this, especially with the various bridging components. I built a simple bridge myself and ran some tests using the pandas project (https://github.com/pandas-dev/pandas), which has more than 30,000 issues. Even storing only metadata (such as title and type, excluding the body) as plain text takes more than 100 MB, which seems quite large. In comparison, storing the same data in SQL takes only about 10 MB, and packed Git objects are comparable.

So while storing issues as Git commits certainly has some benefits, I don't see much advantage beyond that. It also seems that most users would not be able to make practical use of this approach easily — for example, for batch processing — unless they are already quite comfortable working directly with Git commits.

I'm curious about what considerations led you to decide to store issues as empty Git commits. I would appreciate it if you could share your reasoning with me.


remenoscodes 110 days ago

Cool, just looked at git-pad. Same day, different data models for the same problem. Independent convergence is a good signal.

On why empty commits: this started with Linus's 2007 rant about wanting "a git for bugs." I took it literally: how far can Git's existing primitives go without introducing anything new? No files, no JSON, no database. Just commits, refs, and trailers.

The mapping: issues are append-only event logs (create -> comment -> edit -> close). Git is an append-only content-addressable store. Each commit is an event. The ref tip is current state. Trailers carry structured metadata in the same format as Signed-off-by:. Merge commits handle divergence. The entire Git toolchain works out of the box: log, rev-list, interpret-trailers, GPG signing, refspecs.
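The event-log mapping can be sketched by replaying commit messages oldest-first and folding their trailers into a state dictionary. This is a simplified illustration: the `State` trailer and the sample messages are hypothetical (only `Format-Version` appears in the discussion), and the parser is far less strict than `git interpret-trailers`.

```python
def parse_trailers(message: str) -> dict[str, str]:
    """Parse 'Key: value' lines from the last paragraph of a commit message."""
    last_paragraph = message.strip().split("\n\n")[-1]
    trailers = {}
    for line in last_paragraph.splitlines():
        key, sep, value = line.partition(": ")
        # Accept only token-like keys, so ordinary sentences are not mistaken
        # for trailers.
        if sep and key and " " not in key:
            trailers[key] = value.strip()
    return trailers


def replay(messages: list[str]) -> dict[str, str]:
    """Fold events oldest-first; later trailers overwrite earlier ones."""
    state: dict[str, str] = {}
    for message in messages:
        state.update(parse_trailers(message))
    return state


# Hypothetical commit messages on one issue ref, oldest first.
events = [
    "Fix crash on empty input\n\nState: open\nFormat-Version: 1",
    "I can reproduce this on main.",  # a comment carrying no trailers
    "Close issue\n\nState: closed",
]

state = replay(events)
print(state)
```

In a real repository the messages would come from walking the issue's ref with something like `git rev-list --reverse`, rather than from a hard-coded list.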

The implementation is a proof of concept though. What I really care about is ISSUE-FORMAT.md as a standalone format spec. Most of the internet runs on community-agreed specifications where the spec is the contract and implementations are details. If we have a canonical issue format, Forgejo or GitKraken or whoever can build a proper UI around it. Different implementations emerge — shell, C, Rust — until we find the optimal one. The spec is the deliverable, not the CLI.

Storage: packed Git objects are comparable to SQL for metadata. The shell implementation won't scale to 30K issues; a C implementation with libgit2 would. That's a known limitation of v1.

Timestamps: fair concern. The format is versioned (Format-Version: 1), so logical clocks can be added in a future version without breaking existing data. For v1, LWW was the pragmatic choice — keeps the spec implementable by any tool that can read Git commits.

The bridges solved a specific problem I kept hitting: migrating projects between GitLab and GitHub or Gitea (and now Azure DevOps) while keeping issues intact. That alone justified the effort.

Curious about git-pad's file-based approach: what happens when two contributors edit the same issue file offline and then push? Standard Git merge conflict, or do you handle it at a higher level? (Haven't had time to look at the implementation code yet.)


kwhkim 110 days ago

As for the merge conflict, you resolve it the same way you would with any other file in git. I think a custom merge driver needs to be developed eventually (for example, one that automatically picks `type: bug or feature` instead of leaving raw conflict markers like the following), but that's not implemented yet.

  <<<<<<<
  type: bug
  =======
  type: feature
  >>>>>>>
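A field-level three-way merge for a conflict like the one above might look like the sketch below. It assumes the issue file is a flat list of `key: value` lines. The function names, the sample base version, and the default policy of keeping our side when both sides changed a field are hypothetical choices, not git-pad behavior. Git would invoke such logic through a driver configured under `merge.<name>.driver`, passing the ancestor, current, and other versions as `%O`, `%A`, and `%B`.

```python
def parse_fields(text: str) -> dict[str, str]:
    """Read 'key: value' lines into a dict, ignoring lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def merge_fields(base, ours, theirs, pick=lambda key, ours_value, theirs_value: ours_value):
    """Three-way merge per field; `pick` decides when both sides changed a field."""
    merged = {}
    for key in sorted(set(ours) | set(theirs)):
        o, t, b = ours.get(key), theirs.get(key), base.get(key)
        if o == t:
            value = o                 # both sides agree
        elif o == b:
            value = t                 # only their side changed it
        elif t == b:
            value = o                 # only our side changed it
        else:
            value = pick(key, o, t)   # true conflict: apply the policy
        if value is not None:         # None means the field was deleted
            merged[key] = value
    return merged


# Hypothetical versions of one issue file.
base = parse_fields("type: task\ntitle: Crash")
ours = parse_fields("type:bug\ntitle: Crash")
theirs = parse_fields("type: feature\ntitle: Crash on empty input")

print(merge_fields(base, ours, theirs))
```

Here the title merges cleanly because only one side edited it, while `type` is a real conflict and falls through to the policy function.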
