Hacker News
by Propelloni 971 days ago
Of course this isn't and hasn't been true for quite some time. I'm the first to blast MS Word for being a total disaster (esp. templates, i.e. style/substance separation, are bad), but it is no longer a locked-in platform. Even the docx format is only a zip archive of XML files. If you want, you can unpack the document and put it into git. Thank you, Open Document Foundation!
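To make the "unpack it and put it into git" point concrete, here is a minimal standard-library Python sketch. It assumes the main body lives in `word/document.xml` (the usual location inside a .docx package) and keeps only the text of each paragraph. `docx_paragraphs` is a hypothetical helper name, and styles, comments, headers and deleted tracked-change text are all ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML, the XML vocabulary of a .docx main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def docx_paragraphs(source):
    """Return the plain text of each paragraph (<w:p>) in a .docx file.

    `source` may be a filesystem path or a binary file-like object.
    Only <w:t> text nodes are collected, so run formatting is dropped and
    deleted tracked-change text (stored in <w:delText>) is skipped.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

Writing one paragraph per line to a `.txt` file next to the `.docx` gives git something it can diff line by line.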

On top of that, all contemporary word processors I'm aware of have, of course, versioning with diffs. It is just different from git (or other programmer tools). Just as you use the tools of your trade and don't know much about MS Word, lawyers use the tools of their trade and don't know much about git. It's like saying that editing PO files by hand is superior to Trados: for a programmer it is, but a professional translator is going to tell you a different story.

(Of course, everybody everywhere should be using LaTeX for fine-looking documents in all circumstances. No argument here ;))


My point is not that. Sure, you can go from Word to OpenOffice. Great, now you manually highlight your code in that...

It’s a deeper thing. You can hack Word and related tools for coding and eventually it is acceptable I guess, but it’s starting from the wrong foundation.

This ladder will never reach the moon.

Word’s diffs are not “just different”; they are objectively inferior in many ways. Every day I personally witness the travesty that is government staff’s handling of information.

Word is a fancy digital typewriter, and IMO it’s the wrong abstraction for this day and age. Cultural issues are the only thing holding us back. As always.

Edit: academic papers looking like they were written on a 19th-century typewriter... I don’t get this fascination with style, from scientists of all people. Lay down the info, provide the data. Kerning your fonts properly... oh my god, I need to cool down. I am a hot-headed type of guy, sorry about that.

Hey, thanks for the reply. From what I read, I think you think that language use is some kind of coding with words, where there is a deterministic relation between input and output. You seem to treat the semantic content of a statement as if it were somehow static, objective, and observable. I don't think it is, and I'm in good company on that matter.

That being said, that's just my reading of your comment and I could be wrong, which is kind of my argument here. If I'm right, lawyers don't care about your notion; they use language for something other than mere information encoding. Therefore they need tools that support their use case. MS Word (or word processors in general) might not be the best tool for that job, but it is good enough. Integrating a well-trained ChatGPT into MS Word will help lawyers much more than any structured entry form ever could.

BTW, the LaTeX quip was intended to make light of the idea of separating content and style, which goes way back. Consider TeX's age. Your reaction tells me you think LaTeX is a styling tool (which, in a sense, it is) and that styling is what it is about (which it is not). Hordes of scientists (and typesetting professionals) argue in favor of LaTeX (or other typesetting systems) because you just write the content in plain text and LaTeX takes care of the style. TeX files are also just markup and easily git'able. It does make life easier, but it is not as important as some people make it out to be.

Thanks for indulging me. I know I am yelling at the clouds.

I also know people usually misunderstand me because I am a “programmer” and all I see is “code”. I guess that’s fair enough, but I fully understand that legal text is of a completely different nature from Rust.

What I also understand is that no matter how long everyone argues about it, the only thing that matters about legal documents is the text. The font, the styling, etc. are all secondary. They might be important, but they’ll never be primary. Unless courts start judging differently based on page margins, I guess.

The same goes for science. Publishing “Attention Is All You Need” in an 8-bit NES font might not be fashionable, but it does not and cannot detract from the discovery within it. LaTeX produces documents that all look exactly the same (I know it is configurable, but we are going for a certain style), and that’s what this is about: not how the tools work, but that we fundamentally care about this at all instead of focusing on the primary issues like correctness, openness, and accessibility. I’d actually like academic papers to be APIs.

Again, I see the importance of styling and appearance in general. It’s just that we start with that, and I think that’s problematic and actively harms our progress.

Also, to conclude, I am a nitwit. This is just my take.

Edit: A man can dream, right? If a paper were plain text, I could typeset it at the last minute in 8-bit NES fonts if I were so inclined. I hate y’all deciding how everything looks and works. I know that’s technically challenging, but to me that’s where the progress is. An academic paper like, say, a Jupyter notebook would be awesome, wouldn’t it? Would you give up your fancy typesetting? I would!

If you are a nitwit, I'm one, too. Don't worry. I think I get your take. You say the important part of legal and scientific texts is their content, not their form. And I agree. But that is not where we started. We started with (paraphrasing) "programmer tools like git are superior to MS Word, therefore lawyers should use git." There, I disagree.
Oh, right. I agree as well. Git is not their kind of tool.
This 100%. It does get interesting when you get into non-plaintext things that have to somehow integrate into plaintext systems (git-managed codebases). We've kind of left it up to CMSes to handle the non-plaintext bits, but this leads to many more orthogonal process problems.

IMO, it really comes down to finding a universal mechanism for diffing and three-way merging things that aren't plain text (document diffing). I think distributed version control can be universal (at least at the data level); how an application renders a meaningful diff depends heavily on the document type and the task at hand. My point being: I completely agree that plaintext makes a whole lot of sense for programmers and pretty much nobody else. However, distributed version control does not have to be confined to plaintext; it's just tricky to see that when all the version control systems we're familiar with are plaintext ones.
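A data-level three-way merge along these lines can be sketched in Python, assuming the document is represented as nested dicts (sections, clauses, metadata) rather than as lines. `merge3` and `MergeConflict` are hypothetical names; leaves are treated as atomic, and lists, reordering and moves are not handled, so this illustrates the idea rather than how any existing VCS does it.

```python
class MergeConflict(Exception):
    """Raised with the path of a key that both sides changed differently."""

_MISSING = object()  # sentinel: key absent on that side

def merge3(base: dict, ours: dict, theirs: dict, path: str = "") -> dict:
    """Three-way merge of nested dicts.

    Non-dict values are atomic leaves. A key changed on only one side takes
    that side's value; a key changed identically on both sides is kept;
    subtrees changed on both sides are merged recursively; anything else
    is a conflict.
    """
    merged = {}
    for key in base.keys() | ours.keys() | theirs.keys():
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        where = f"{path}/{key}"
        if o == t:
            value = o                       # unchanged, or same change on both sides
        elif o == b:
            value = t                       # only theirs changed (or deleted) it
        elif t == b:
            value = o                       # only ours changed (or deleted) it
        elif all(isinstance(v, dict) for v in (b, o, t)):
            value = merge3(b, o, t, where)  # both edited inside the same subtree
        else:
            raise MergeConflict(where)
        if value is not _MISSING:           # keys deleted in the result are dropped
            merged[key] = value
    return merged
```

If one person edits clause 1 and another edits clause 2 of the same nested document, `merge3` takes both edits, because it compares keys rather than line positions; editing the same clause differently raises `MergeConflict` with that clause's path.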

Git is popular because it's linear: its diffs compare sequences of lines, and that paradigm usually translates well to serial things such as programs, instructions, document sets, etc.

It's actually bad at non-linear stuff, which you will have noticed if you have ever worked with hierarchical formats, e.g. XML or nested JSON.
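A quick way to see the problem: serialize the same nested data twice with keys in a different order and one value changed, then compare a line-based diff (Python's `difflib`, standing in for git's default line diff) with a structural one. `line_diff` and `tree_diff` are hypothetical helpers; `tree_diff` walks dicts only and does not handle lists or moved subtrees.

```python
import difflib
import json

def line_diff(a: str, b: str) -> list[str]:
    """Added/removed lines from a unified diff (a line-based view, like git's default)."""
    diff = list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))
    # Skip the two file-header lines ("---", "+++"), then keep only +/- lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]

def tree_diff(a, b, path: str = "") -> list[str]:
    """Structural diff of nested dicts: report the paths whose values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        changes = []
        for key in sorted(a.keys() | b.keys()):
            child = f"{path}/{key}"
            if key not in a:
                changes.append(f"added {child}")
            elif key not in b:
                changes.append(f"removed {child}")
            else:
                changes.extend(tree_diff(a[key], b[key], child))
        return changes
    return [] if a == b else [f"changed {path or '/'}"]

# Same data, keys in a different order, one value changed.
old = json.dumps({"style": {"bold": True, "color": "blue"}, "text": "Hello"}, indent=2)
new = json.dumps({"text": "Hello", "style": {"color": "blue", "bold": False}}, indent=2)

print("\n".join(line_diff(old, new)))                # several +/- lines of noise
print(tree_diff(json.loads(old), json.loads(new)))   # ['changed /style/bold']
```

The line diff reports several added and removed lines caused purely by reordering, while the structural diff reports the one change that matters.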

Word is bad for a whole litany of reasons, but the reason it can't be easily versioned (on top of the format being a literal Rube Goldberg machine requiring inane transforms to process properly) is that it encodes a bunch of non-linear formatting instructions. Sure, we can sort of reason about this stuff, e.g. with a hierarchical CSS+HTML+JS structure, but without a way to render it, I challenge you to simply diff that information. Seeing "bold" or "blue" seems simple enough, as long as you also know which elements it applies to and in what layout. So suddenly you can't reasonably diff the CSS file without also diffing the HTML.

As programmers, we are used to reducing things along their dimensions into fairly linear spaces; this then helps us reason fairly linearly about changes, but doing this from any other context is challenging. Lawyers, for example, perhaps focus on the relations between various clauses, so linearizing their document flow is not very important to them, at least as long as there exist methods to diff the general textual content without investing much in how they do that.

As programmers, we see the similarities to editing a codebase and that excites us; however, we tend to go off and write frameworks to parse and simplify these things without ever actually bothering to learn to apply them. This is not without value, but it's a different focus, which maybe explains why lawyers are not in the habit of using git.

> Sure, we can sort-of reason about this stuff e.g. with a hierarchical css+html+js structure, but without a way to render that I challenge you to be able to simply diff that information. Seeing "bold" or "blue" seems simple enough, as long as you also know to which elements it applies and in what layout. So, suddenly you can't reasonably diff the css file without also diffing the html.

We’re in complete agreement. But you can do this; you just need to provide a “renderer” and a schema that describes how your tree structure should merge or conflict. If you want to test out a weird version control system for structured data, my email is in my bio.
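A schema of that kind could be as small as a table mapping tree paths to merge policies. The sketch below assumes a structural merge calls `resolve_leaf` for each leaf value it compares; `POLICIES`, `resolve_leaf`, `union` and the path strings are all hypothetical, and this is not the commenter's tool.

```python
class MergeConflict(Exception):
    """Raised when both sides changed a leaf and the schema says not to auto-resolve."""

def union(base, ours, theirs):
    """Set-like fields (e.g. tags): keep additions from both sides and drop
    anything either side removed. Elements must be hashable and sortable."""
    base, ours, theirs = set(base), set(ours), set(theirs)
    return sorted((ours | theirs) - (base - ours) - (base - theirs))

# Hypothetical policy table: name -> function(base, ours, theirs), or None.
POLICIES = {
    "conflict": None,                               # refuse to guess; surface to a human
    "ours": lambda base, ours, theirs: ours,
    "theirs": lambda base, ours, theirs: theirs,
    "union": union,
}

def resolve_leaf(path, base, ours, theirs, schema):
    """Merge one leaf; `schema` maps paths to policy names (default "conflict")."""
    if ours == theirs:
        return ours      # both sides agree
    if ours == base:
        return theirs    # only theirs changed
    if theirs == base:
        return ours      # only ours changed
    policy = POLICIES[schema.get(path, "conflict")]
    if policy is None:
        raise MergeConflict(path)
    return policy(base, ours, theirs)
```

For example, with `schema = {"/meta/tags": "union"}`, one side adding a tag while the other removes a different one merges cleanly, while a clash in the body text still stops and asks. Keeping the policy as data rather than code is what would let each document type ship its own merge rules.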