Hacker News
by codingdave 2120 days ago
My SaaS deals primarily with legal documents that for years had been maintained with Word. The pain of emailing documents is real, but the comfort level with how Word works is also real. Over the years, most organizations have developed internal workflows to share and send documents around that bypass the pains, and while they may not be perfect, they work.

The funny thing is that the document authors like these ways of working. It is the tech people who don't. I've seen "Git for Word" proposed many times a year for a while now. And all of the ideas are interesting, but none of them appeal to my audience because they don't care about git's feature set. Nobody wants to branch and merge. Nobody wants a straight version history. ("Nobody" meaning nobody in my market, not nobody in the world.)

They want a storytelling experience. They want to know the why, not the what. And the workflow tends to be unidirectional, not with collaborative changes coming back together, but with expanding changes as each person adds their ideas and makes changes for a specific instance of using a document. The experience we build for them brings in pieces of version history, pieces of comments, pieces of telling the story of why something was done, so people down the line can have more context to decide whether to accept or reject the changes.

It isn't that "Git for Word" is a bad idea - on the contrary, it would be great if someone pulls it off. My point is that building something that improves on Word isn't actually about the software, it is about the document workflows. If you find groups who work like software devs do, where documents receive small updates from a team, and bring all changes together for a final product, there is probably a market. But when evaluating such ideas, there has to be a reality check of whether the actual use of the documents truly matches the use case for git.

13 comments

As someone who is at the intersection of tech and arts, one of the things I like about using git in projects is that it is very clear what is the latest official definitive final variant of a piece of data and you don't have to ask anybody to get it.

When I worked as a VFX freelancer I was amazed at the number of hours (=money) burned by marketing agencies who didn't manage to give me the definitive variant for a simple list of things they wanted. In one instance they gave me everything they had, including crude and unrecognisable filenames, hints about things that I should ignore via telephone etc. I had to make sense of it and compile a list which I sent them to approve. They ended up approving another list (!) which they themselves sent me two weeks prior and they only managed to correct this once I hinted at this.

Of course this is an example of how things should never be. This usually involves somebody getting sick and some uninformed person taking over etc. But what I learned on film sets is that you should choose the defaults of your communication culture in such a way that it works under the absolute worst conditions (bad weather, hungry, stressed, confused, etc).

And I have seen so many organisations fail at precisely that. If you get ill, someone else should be able to take over without heading to an oracle. This is not a special function limited to a version control workflow, it is something that has to do with clear communication.

Using git can sometimes help avoid the whole problem by making it obvious which file is the latest and which is a variant of it, but the people using it will have to use clear communication as well (e.g. by writing good commit messages, choosing the "right" commit sizes, naming things the right way etc). So if you know how to use git, you just might value clear communication a little bit more than the average person.

> As someone who is at the intersection of tech and arts, one of the things I like about using git in projects is that it is very clear what is the latest official definitive final variant of a piece of data and you don't have to ask anybody to get it.

As git is a distributed system I think it’s not at all clear what the definitive final variant might be - and that is a strength.

That can be handled externally to git via ad hoc convention, say by using a system like gitlab or github and letting it declare one as “primary”, or by having someone post to a mailing list (“Commit X on a repo you can reach at URI Y is the official release”) both of which are common.

But in your example various people could mail you commits and not have any consensus on which is authoritative.

Git Tags can substantially address this concern.
> you should choose the defaults of your communication culture in such a way, that it works under the absolute worst conditions

The defaults are sensible. Throw money at it and pay someone enough to sort things out and get it done, e.g. you as a freelancer get a data dump and ask the right question and the problem is solved. Sure it costs money. But everything costs money.

Git works great among peers. But most organizations are hierarchical. And the boss doesn't have to give a shit about which draft is the latest because the boss is the boss.

> And the boss doesn't have to give a shit about which draft is the latest because the boss is the boss.

As a boss myself I have to say: I totally give a shit. Salaries are my company’s #1 expense by a wide margin. I don’t want my staff spending their time manually merging docs received via email when there are much better solutions out there. I hire people because they are smart and can get shit done that makes money, not because I want servants.

This is the killer app for Office 365 and google docs: stop wasting time emailing shit around, one canonical version even outside of company walls.

most offices are an immense source of waste regarding information exchange.

some would say whoever solves that problem will be filthy rich

Having a lot of experience in both legal (academic and lawyer) and tech (developer and founder), I totally agree with you and think you framed the situation very well.

As a lawyer I can fully confirm that our industry works as you have described (as regards document workflows), and with my tech background I can also confirm that most features of dev-oriented solutions like git are mostly uninteresting from a lawyer's perspective.

Similarly, I am a qualified lawyer. I previously worked in a big law firm in London, which describes itself internally as both the “best” and “the most advanced” law firm in the world. I now work full-time as an engineer in software, and increasingly in hardware.

I agree with both comments.

To add, in large-scale corporate/commercial practice (which is the area in which I practised), Git would be useful in replacing email-based collaboration, but the switching costs seem too high.

Currently, the corporate law contract negotiation workflow is as follows:

1. party A adds their tracked changes to a Word document based on a template contract;

2. party A emails this document to party B;

3. party B reads the changes, may discuss the changes with their client, adds their tracked changes, and then emails the updated document to party A.

This process repeats for every document, punctuated by occasional conference calls between the parties, until the parties agree.

‘Git for law’ would be useful for lawyers in increasing efficiency - and thus reducing costs for clients.

However, the benefits for law firms of adopting a new Git-based workflow are likely to seem relatively small to lawyers. Their current email-based version control system is messy and time-inefficient, but generally functions with minimal error.

On this basis, I would predict that most corporate law firms would be very slow to adopt a Git-based system - the benefits may not justify the costs.

One should also note that lawyers, particularly contract/commercial lawyers, are conservative by profession. In my experience, most lawyers are very slow to adopt new technologies, highly risk-averse, and skilled at spotting risks. The combination of these traits means that any technology will have to offer a very high benefit to replace an existing legal workflow.

I am still working in big law and I disagree that the benefits of adopting more advanced version control would be small. However, I work in a field that is a mix of regulatory law and litigation, where almost no work is based on templates and most things are drafted from scratch.

One very large problem that typically comes up in large teams: Only 1 team member can edit the "live version" of a document (it's locked for editing by the version control system); the other team members need to work "offline" and then reintegrate their changes/drafts into the main document. Everybody has lived through the horror story of a team member in a different time zone still having the live version checked out and going to sleep :)

Sometimes you have to circulate a draft document in parallel to multiple parties (e.g. colleagues with special subject expertise + client's inhouse lawyers + client's technical experts + other party's law firm + other party's inhouse lawyers + other party's technical experts). It can happen that you need to reintegrate comments from different parties on different versions of the draft, e.g., if your client gives feedback quickly and you re-circulate an updated version internally, and then you receive the other party's comments on the older draft version...

Besides the mechanical aspects of reintegrating comments, it is also difficult to track if everybody who needs to sign off has actually signed off on the parts of the documents they had to review. Often it gets lost who made which comment/change. It can be quite awkward if a regulator asks you "Isn't the technical statement on page 12 contradicted by fact XYZ" - please explain until tomorrow - and you have to quickly figure out who actually put that in...

Would you also email me so I can ask a few questions?

m@replace-with-my-username.com

Would be much appreciated!

Actually what is so time inefficient about this workflow? Sending by email doesn't take a minute. Also it is a great way to transfer responsibility (whose court the ball is in).
There are many flows that don't work. You can only have a single person working on the file at a time. You need a mutex (which email is in this case). However, imagine that you send back a file, then realize that you missed something. You can add more changes and send it again, but if the other party has already started working, they now have no reasonable way to merge in the new changes.

Also is there anything that actually guarantees that the tracked changes were the only changes made? I haven't seen this but it seems like a serious flaw in the process.

Also what if you get an intern to do some of the work, then you want to review the changes between version:$lastyousaw and version:$current. IIUC the mail with tracked changes only allows you to view one "patch" at a time.
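
To make the "version:$lastyousaw vs version:$current" idea concrete, here is a minimal sketch in Python. It assumes every version of the document is kept as a list of plain-text paragraphs; the `history` list, the clause texts and the `diff_between` helper are made up for illustration, and `difflib` stands in for whatever diff tool you would really use.

```python
import difflib

def diff_between(history, last_seen, current):
    """Unified diff between two snapshots, each a list of paragraphs."""
    return list(difflib.unified_diff(
        history[last_seen], history[current],
        fromfile=f"version:{last_seen}", tofile=f"version:{current}",
        lineterm="",
    ))

# Hypothetical history: index 0 is the version you last saw, index 2 is current.
history = [
    ["1. Term: 12 months.", "2. Fee: 100 per month."],
    ["1. Term: 12 months.", "2. Fee: 120 per month."],
    ["1. Term: 24 months.", "2. Fee: 120 per month.", "3. Notice: 30 days."],
]

# One combined view of everything that changed, however many rounds happened in between.
for line in diff_between(history, 0, 2):
    print(line)
```

The point is only that the comparison is between any two snapshots, not between consecutive emails.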

To argue for this workflow: the mutex is soft (social) which is a good thing, sometimes ppl do not reply on time and you need to move forward. For having new ideas/paragraphs after sending out, you either inform the colleague to hold it until you're done, or you send a paragraph and ask him to merge it.

The mail allows you to see all patches, sometimes 'clean slate' is done by accepting all changes. While this sounds like a problem in theory, in practice it's not.

I agree that things can be better somehow, but it is really difficult to see any solution which is at least 10% better. The current workflow also has the advantages:

- data is as safe as your filesystem and email system together

- Word file is generally not considered a vendor lock-in

- everybody understands the workflow

- nobody can block the workflow (like not checking in again with sharepoint)

> Git would be useful in replacing email-based collaboration, but the switching costs seem too high.

Git was originally designed on an email based workflow for software development (hence the commands am, format-patch and send-email).

For contract negotiation, if the template contract was in plain text, then it could be emailed as a patch. The other party would then apply that patch to their local git repository, make the changes and email the diff from the original template back to the first party.

So essentially, you could still use email, but have the diff between changes as the content in those emails (along with inline comments).

Unfortunately, corporate Outlook/O365 based email systems don't work very well when used in that fashion.
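
A rough sketch of "the diff is the email", in Python. In git itself this is the job of format-patch, send-email and am; the code below does not call git and only uses the standard library's `difflib` as a stand-in to show the shape of the idea. The addresses, the contract text and the `make_patch_email` helper are all invented for the example.

```python
import difflib
from email.message import EmailMessage

def make_patch_email(template, revised, sender, recipient, why):
    """Build an email whose body is the reason for the change plus a unified diff."""
    patch = "".join(difflib.unified_diff(
        template.splitlines(keepends=True),
        revised.splitlines(keepends=True),
        fromfile="contract.txt (template)",
        tofile="contract.txt (revised)",
    ))
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "[PATCH] " + why
    msg.set_content(why + "\n\n" + patch)
    return msg

# Hypothetical plain-text contract and counter-proposal.
template = "Term: 12 months.\nFee: 100 per month.\n"
revised = "Term: 24 months.\nFee: 100 per month.\n"

msg = make_patch_email(template, revised, "a@example.com", "b@example.com",
                       "Extend the term to 24 months")
print(msg.get_content())
```

The recipient sees only what changed and why, which is the part people are hunting for in a redlined attachment anyway.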

Tom, would also love to ask you a few questions via email.

m@replace-with-my-username.com

Sure thing. Emailed.
I am not a lawyer, but as an entrepreneur I've had to send-and-receive a lot of legal documents with investor's lawyers, almost exclusively in .docx.

I never trust the received file's "track changes", always compare to the latest version I've sent -- and it is extremely common to find a change that wasn't mentioned/discussed, and somehow magically "accepted" or otherwise not tracked in the other side's "track changes". Whenever I point these out, I always got a "oh, yes, forgot about that one", or "I didn't intend to put that in" or "I'm not sure why it didn't appear in the track-changes view" -- but out of tens of these (with multiple lawyers over multiple years), not one was ever in my favor.

Branching might not be as interesting on a single project - but diffing is, very much; and I'm sure it's not more coveted mostly because most lawyers either (a) don't realize how good it makes life for you when you can diff and blame easily, or (b) are abusing the fact that it is so hard to diff/blame on documents, and certainly (c) usually charge by the hour, so some efficiencies are actually going to cost them money if they implement them (a famous Upton Sinclair quote comes to mind).
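
For the "never trust track changes, always compare against what you sent" habit, a minimal Python sketch. It assumes the usual .docx layout (a zip archive with the main text in word/document.xml, live text in w:t elements and tracked deletions in w:delText), looks only at the main body text, and ignores formatting, headers, footnotes and comments. `make_docx` builds a bare-bones stand-in file purely so the example is self-contained; it is not a complete Word document.

```python
import difflib
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + NS + "}"

def docx_paragraphs(data):
    """Paragraph texts as they read with every tracked change accepted."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # w:t carries live text (tracked insertions included); deleted text sits
    # in w:delText and is therefore left out.
    return ["".join(t.text or "" for t in p.iter(W + "t")) for p in root.iter(W + "p")]

def compare_docx(sent, received):
    """Diff of the actual text, regardless of what the redline view shows."""
    return list(difflib.unified_diff(
        docx_paragraphs(sent), docx_paragraphs(received),
        fromfile="sent", tofile="received", lineterm="",
    ))

def make_docx(paragraphs):
    """Illustration only: the smallest zip that docx_paragraphs can read."""
    body = "".join(f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs)
    xml = f'<w:document xmlns:w="{NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()

sent = make_docx(["Clause 1: Term is 12 months.", "Clause 2: Fee is 100."])
received = make_docx(["Clause 1: Term is 12 months.", "Clause 2: Fee is 100, non-refundable."])
for line in compare_docx(sent, received):
    print(line)
```

Anything that shows up here but not in the other side's redline is exactly the kind of change described above.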

You are right, this happens a lot when drafting out-of-court documents.

IMHO this is a useful "feature" for lawyers. Don't forget that usually lawyers of two parties are working "together" only apparently, when in fact they are always litigating for their client's best interest.

The goal is not to reach a common agreement, but to reach the agreement that best serves the interest of one's client, most of the time at the expense of the other party.

This is achieved in many ways, one being having text in a contract that the other party is not fully aware of, either because it's not properly understood or noticed.

This means that including text in a document without the other party noticing is a good old trick that is quite valuable to any lawyer.

As a lawyer my position on this is that it's the other party to blame if it did not check the document properly (I always compare the documents for differences even when sent with revisions).

> (c) usually charge by the hour, so some efficiencies are actually going to cost them money if they implement them

I tend to disagree with this line of thought. Lawyers have a thousand ways to inflate their timesheets. Using a tool that makes their life more miserable by forcing them to do manual work that could be automated is certainly not one of them.

> As a lawyer my position on this is that it's the other party to blame if it did not check the document properly (I always compare the documents for differences even when sent with revisions).

As a non-lawyer, this is why they say "the problem with lawyers is that 95% of them give the rest of them a bad name". And as I mentioned, I also always compare.

> Lawyers have a thousand ways to inflate their timesheets. Using a tool that makes their life more miserable by forcing them to do manual work that could be automated is certainly not one of them.

I agree, and they do inflate them regularly -- all lawyers I asked to draft NDAs and employment agreements for me charged a few hours worth for the first one "because they had to write it" even though it was unchanged from another client (for sure; I've seen that exact one before).

Still, they need to keep an air of "being busy" and "working hard", and the best way to do that is to occasionally work hard.

ghego, could I ask you a few questions via email?

m@replace-with-my-username.com

Sure, I've sent you an email :-)
In 2013, we were hacking away at a document tracking system to solve exactly this. We thought we were disrupting the legal market while in reality the lawyers were way too comfortable with Word and emailing docx files.

Exactly like you've hinted, the right way to crack this is to bring a full-fledged word processor like Google Docs, but instead of ad-hoc realtime collaborations the software has to enable customizable unidirectional document workflows with controlled collaboration.

Most serious document creators don't want to branch and merge; instead they want to pass the document through a series of stages. They want statistics on the when, what and why of each stage. And at any point in time the document is in one definitive stage, not scattered across emails/folders/versions/forks.
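
One way to picture such a unidirectional workflow is a small state machine, sketched here in Python. The stage names and the `Document` and `StageEntry` classes are hypothetical, not what any product actually does: the document only ever moves forward, it is in exactly one stage at a time, and each move records who, why and when.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical, customizable list of stages; a document moves through them in order.
STAGES = ["draft", "internal review", "client review", "counterparty review", "signed"]

@dataclass
class StageEntry:
    stage: str
    who: str
    why: str
    when: datetime

@dataclass
class Document:
    title: str
    log: list = field(default_factory=list)

    @property
    def stage(self):
        """The single definitive stage the document is in right now."""
        return self.log[-1].stage if self.log else STAGES[0]

    def advance(self, who, why):
        """Move to the next stage; there is no way to skip ahead or go back."""
        if not why.strip():
            raise ValueError("a reason is required to advance the document")
        position = STAGES.index(self.stage)
        if position == len(STAGES) - 1:
            raise ValueError("document is already in its final stage")
        entry = StageEntry(STAGES[position + 1], who, why, datetime.now(timezone.utc))
        self.log.append(entry)
        return entry.stage

doc = Document("Supply agreement")
doc.advance("alice", "First draft complete")
doc.advance("bob", "Partner signed off on the liability clause")
print(doc.stage, [(e.who, e.why) for e in doc.log])
```

The log is where the "when, what and why of each stage" statistics would come from.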

Yup. Even Google Docs supports this type of approval workflow (in beta):

https://support.google.com/a/answer/9381067?hl=en

Unlike its normal collaboration mode, the file gets locked down.

This looks great!
I didn't understand the "customizable unidirectional document workflow". What does it mean?
Interesting perspective. But I wouldn't say I want to use git because I want to branch and merge. I would prefer all history to be linear -- it's just not always possible. My main draw toward git is just for keeping track of past versions along with comments describing the changes (that's what I see as "the story" as you put it). It's nice to know I have merging tools available to help me if I get stuck in such a situation, but I would prefer to never have to merge anything.
I agree with this being about workflows, not documents, but isn't branching and merging a workflow and storytelling tool?

It allows multiple people to work in parallel (and in private). When somebody sends a pull-request eventually, they are presenting a story of changes that they want to get into the document and people can discuss them and approve them individually. (Of course, git the tool isn't necessarily suitable for non-technical people, but git-the-workflow seems to be a good foundation.)

Could you elaborate on what such a tool could look like without git style branching?

I think the issue is that parallel workflow is a bridge too far for the current legal profession. They do sequential and they like it. A tool that makes the sequential workflow better will gain more ground than one that tries to change the process whole hog.
I have worked on several hundred of these types of contract processes in the last few years, and you are absolutely correct: sequential is where it's at for these situations. I have, however, encountered a few situations where time was an issue so I had two or three different versions out to different parties at the same time, and then merged the proposed changes where possible and sent out alternate versions where the changes differed. That process was... not fun, and could definitely use a more coherent workflow than manual merges or Word's built in merge features.
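
The "merge where possible, send alternates where the changes differ" process is essentially a three-way merge. A minimal Python sketch at clause granularity follows, assuming each version can be represented as a mapping from clause number to clause text; the clause texts and the `merge_clauses` helper are invented for illustration, and real tools merge at a much finer level than whole clauses.

```python
def merge_clauses(base, ours, theirs):
    """Three-way merge of clause-number -> text mappings.

    Returns (merged, conflicts). A clause changed by only one side is taken
    from that side; a clause changed differently by both sides is a conflict
    and is left for a human to resolve.
    """
    merged, conflicts = {}, {}
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            result = o      # both sides agree (or both left it alone)
        elif o == b:
            result = t      # only the other side changed it
        elif t == b:
            result = o      # only our side changed it
        else:
            conflicts[key] = (o, t)
            continue
        if result is not None:  # None means the clause was deleted
            merged[key] = result
    return merged, conflicts

# Hypothetical contract sent out to two parties at the same time.
base = {"1": "Term: 12 months.", "2": "Fee: 100.", "3": "Notice: 30 days."}
client = {"1": "Term: 24 months.", "2": "Fee: 100.", "3": "Notice: 14 days."}
counterparty = {"1": "Term: 12 months.", "2": "Fee: 90.", "3": "Notice: 60 days."}

merged, conflicts = merge_clauses(base, client, counterparty)
print(merged)
print(conflicts)
```

The conflicts are the clauses where you would still have to send out alternate versions; everything else merges mechanically.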
The anecdotes told here suggest that most people think their work is sequential, but because a lot can happen asynchronously, hellish situations happen all the time. Do you agree?
But the workflow only "looks" sequential, doesn't it? In many stories told here a user may have inadvertently reverted clauses to old versions, and this can be missed. This happens because it is also asynchronous work.
Personally, I'd be tempted to write Git for Word as a plugin that Just Worked. Users are given a new interface (The Document Repository?) from whence to select documents. Every auto-save is a commit. Enforce "explaining why" by ... what, requiring an in-document comment near the changes? A popup asking for explanation, and that's added as a commit message?

I don't know, I'm just spitballing. Sounds like it'd be fun for awhile to attempt to seamlessly get this into the workflow and see how it's accepted.
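
Spitballing in code, a minimal Python sketch of the "every save is a commit, and you must say why" idea. Everything here is hypothetical (the `DocumentRepository` and `Commit` names, the in-memory storage); a real plugin would sit on top of an actual version control system and the popup would supply the `why` text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Commit:
    author: str
    why: str
    when: datetime
    text: str

class DocumentRepository:
    """In-memory stand-in for the store behind the plugin."""

    def __init__(self):
        self._commits = []

    def save(self, text, author, why):
        """Record a snapshot; unchanged text is skipped, changed text needs a reason."""
        if self._commits and self._commits[-1].text == text:
            return None  # auto-save with nothing new: no commit, no popup
        if not why.strip():
            raise ValueError("please explain why this change was made")
        commit = Commit(author, why.strip(), datetime.now(timezone.utc), text)
        self._commits.append(commit)
        return commit

    def story(self):
        """The history as a reader down the line would want it: who and why."""
        return [f"{c.author}: {c.why}" for c in self._commits]

repo = DocumentRepository()
repo.save("Term: 12 months.", "alice", "Initial draft from template")
repo.save("Term: 12 months.", "alice", "")  # nothing changed, silently skipped
repo.save("Term: 24 months.", "bob", "Client asked for a longer term")
print(repo.story())
```

Whether people would tolerate the popup on every change is the open question, not the storage.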

Commits in a version control system have text that explains why, when and who.

I think it's more about the user interface. The user interface of Git is essentially what programmers already do - code.

Exactly. Git for Word doesn't make sense because software applications can be edited independently because of abstractions -- you can't have different people changing different parts of a Word document without ANY idea of what each other is doing the way you can in software. You can't change the tone in one section to conversational in isolation.
A typical workflow with Word documents requires multiple people reviewing, commenting, and merging or rejecting the comments. Microsoft Word's change tracking, comment, and review features are adequate for most people.
The big problem with git for word is that git is not designed for the population in general. Using git is not easy, even for developers. Although powerful and flexible, it has a very complicated workflow that is too close to its implementation. In my opinion it is just like wishing that the general population use LaTeX instead of Word.
> It is the tech people who don't.

It is _their clients_ who don't, not just tech people. I hired lawyers a few times. IMO their redline and email workflow is error-prone craziness that could use improvement. That said, I'm a "tech" person, so I might be biased.

Hear hear!

I've spent many years in 'collaborative writing' in R&D, mainly grant proposals and joint reports/deliverables, most in the CS/IT domains. Writing those texts is very different from writing the software.

First thing you should realize is there are no 'tests', and all the 'code' is usually in a single big file. Anyone that has touched the document can have potentially messed up everything, both content, layouts and meta-data, and there is no automatic way to check whether it still makes sense. Many times people will not use the agreed upon editor/version, and sometimes (often) that means a boatload of minor edits to the document all over the place just from opening and saving. Imagine everyone in your software team using different editors all with their preferred coding conventions that are automatically applied to the whole project at load.

From this you can deduce the enormous responsibility of ownership and gate-keeping in the workflow. The absolute worst collaborations I have been part of were those that somehow believed that if they used a collaborative document editing facility, wikis or Google docs for instance, that would negate the need for assigned owners/editors. Those tug-of-war shitstorms got exponential the closer one came to the submission deadline (technically incorrect, I know, but you know what I mean).

Some tips:

- Have well defined ownership for each section or part of your document. The owner receives and makes all changes for that part.

- have a final editor that is responsible for the complete document receiving the changes of the parts from their owners only.

- Do not trust 'track changes', but use Word's built-in document compare if you are the final editor. For complex formatted documents (nearly all instances require you to use an insanely styled template), you 'clean room' import (C/P through notepad) the text changes into the correctly formatted doc under your control.

- release the current trunk document often, ideally once per day. This requires staggering, with subeditors closing submission windows and submitting their updates to the main editor before EoB. Everyone editing should work against the latest release.

- Every version published by the final editor should be immutable. Mail it to everyone if needed, but if you use a link to some sort of repository make sure it is a deep link to a version that can not be updated in the repository, or hilarity will ensue.

- use versioning in the filename: filename_YYYYMMDD_HHMM_dXXX_rNN.docx, where XXX is the assigned party acronym for the person making the update. 'YYYYMMDD_HHMM' is only touched by the editor; 'dXXX_rNN' is the NNth change release by party XXX against version YYYYMMDD_HHMM.
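
The filename convention in the last tip is regular enough to check mechanically. A minimal Python sketch, assuming that editor releases carry only the YYYYMMDD_HHMM part and that party updates append dXXX_rNN; that optional suffix is my reading of the scheme, and the `DocVersion` and `parse_filename` names are made up for the example.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# name_YYYYMMDD_HHMM[_dXXX_rNN].docx ; the party/revision suffix is assumed optional.
PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<date>\d{8})_(?P<time>\d{4})"
    r"(?:_d(?P<party>[A-Za-z0-9]+)_r(?P<rev>\d+))?\.docx$"
)

@dataclass(frozen=True)
class DocVersion:
    name: str
    release: datetime         # the editor's release this file is based on
    party: Optional[str]      # acronym of the party making the update, if any
    revision: Optional[int]   # that party's change number against the release

def parse_filename(filename):
    """Return a DocVersion, or None if the name does not follow the scheme."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    try:
        release = datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M")
    except ValueError:
        return None  # digits in the right places but not a real date/time
    rev = match["rev"]
    return DocVersion(match["name"], release, match["party"],
                      int(rev) if rev is not None else None)

print(parse_filename("proposal_20240131_1730_dACME_r02.docx"))
print(parse_filename("proposal_final_v2.docx"))
```

A check like this could run on every incoming attachment, so a badly named file gets bounced before anyone edits against the wrong release.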

Most certainly Git can function as a repository, but there will be people that will not work with it (nor any other repository) so always assume mail interactions as well.

Finally, there should be a special place in hell for the people that designed SharePoint versioning. Don't even think of going there.

nice story (really), makes me wonder if a special kind of group merge to give a better idea of who/why on the changes at a particular time would be interesting