by untog 3827 days ago
I don't understand why this is still an issue in 2015. Why can't our source control software take code indented any way, any how, and convert it to your personal preference when you pull the repo? It's not like we enforce what color syntax highlighting you use or what font.

Honestly, the hours that have been wasted debating tedium like this in programming...

8 comments

> Why can't our source control software take code indented any way, any how, and convert it to your personal preference when you pull the repo? It's not like we enforce what color syntax highlighting you use or what font.

Because the mapping from one whitespace style to another is not a bijection. Moreover, our source control tools rely on things like content hashes which change when even a single bit of the source has changed, and sometimes even our language parsers depend on whitespace format.
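A minimal sketch of both points, assuming a tab width of 4 and using a plain SHA-1 of the text as a stand-in for a content hash (this is not Git's exact object hash, and the names `expand` and `digest` are made up for illustration):

```python
import hashlib


def expand(line: str, width: int = 4) -> str:
    """Convert tabs to spaces the way an editor with this tab width would."""
    return line.expandtabs(width)


def digest(text: str) -> str:
    """Plain SHA-1 of the text; a stand-in for a content hash."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


with_tab = "\tx = 1\n"
with_spaces = "    x = 1\n"

# Two different inputs collapse to the same output, so no inverse function
# can recover which one the author originally typed.
assert expand(with_tab) == expand(with_spaces)
assert with_tab != with_spaces

# The two files look identical on screen at width 4, yet hash differently.
assert digest(with_tab) != digest(with_spaces)
```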

Frustrating, I know, but you will not see a resolution to this problem with automated tools alone. You will, however, when the Tab key is removed from keyboards altogether :)

Wouldn't this feature be similar to how Git converts line endings?

https://help.github.com/articles/dealing-with-line-endings/

It would probably have to be configured per repo so that it doesn't mess with tab delimited files etc.

Auto CrLf is a terrible thing that can change the meaning of programs. Source control shouldn't be changing the contents of files (though a hook to format wouldn't be bad necessarily).
Auto CRLF is awesome for languages I use. Which ones can it change meaning in?
Any language that allows string literals to wrap lines without normalizing it. Like C# or F#. Anything with HERE docs might do it, but Ruby and Python seem to normalize.
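A rough illustration, assuming the string below stands in for the raw characters between the quotes of a verbatim multi-line literal (the SQL text is invented for the example):

```python
# Raw contents of a multi-line literal as stored with LF line endings.
literal_lf = "SELECT *\nFROM users"

# The same literal after a checkout that rewrites LF to CRLF.
literal_crlf = literal_lf.replace("\n", "\r\n")

# The program's data now contains an extra carriage return.
assert len(literal_crlf) == len(literal_lf) + 1
assert literal_crlf != literal_lf
```

If the language does not normalize line endings inside literals, the string value the program sees depends on the checkout settings.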
But I have seen a solution to this problem, back in subversion days. With help from perl-tidy and php-tidy.

Pre-commit hooks, combined with an appropriate *-tidy program for your language. Code is formatted according to the repo tidy config by a pre-commit hook and formatted according to the developer's preference on checkout.
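A toy version of the round trip, assuming space-only indentation and a hypothetical `reindent` helper in place of a real tidy program (perl-tidy and php-tidy do far more than this):

```python
def reindent(source: str, from_width: int, to_width: int) -> str:
    """Rewrite leading-space indentation from one width to another."""
    out = []
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        # Whole indentation levels are rescaled; any remainder is kept as-is.
        levels, extra = divmod(len(line) - len(stripped), from_width)
        out.append(" " * (levels * to_width + extra) + stripped)
    return "\n".join(out)


# Developer prefers 2 spaces; the repo config (assumed here) says 4.
personal = "if x:\n  if y:\n    go()"

committed = reindent(personal, 2, 4)     # what the pre-commit hook stores
checked_out = reindent(committed, 4, 2)  # what the developer sees again

assert committed == "if x:\n    if y:\n        go()"
assert checked_out == personal
```

The round trip is only lossless here because every line sits on an exact indentation level; lines padded for alignment would not survive it unchanged.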

I'm missing why whitespace style mapping is not a bijection. You do have to accept some limits on code style, in the form of completely rigid indentation rules, and it is obviously language-specific. But it seems to be a bijection to me. What am I missing?

I believe mapping is 1:1 when only the initial indentation is used with tab/space, and further indentation is done with spaces.
The proper way to do that would be for everyone to use tabs for indentation. The authors of golang seem to have taken this view.
Go actually uses a mix of tabs and spaces in gofmt, tabs for baseline indentation and spaces after that for vertical alignment.
This is fantastic, and is how I'm picturing the IDE of the future would work internally, even if it lets people store the files with a variable number of spaces for indentation and does the transformation when reading and writing the file.
it's objectively the best way to indent, and yet so rare ;(
I'm not sure if you're trolling, but obviously not everyone would agree that this is the "proper way."
I'm not. The obvious way to fix it would be to add a token signifying indenting one level. Why not choose the tab character as the token?
Obvious depends on your constraints.

Suppose (1) some of your newlines are indented but some are aligned for readability, and (2) mixing whitespace has a tendency to create subtle bugs.

Under those constraints, the only possible choice is spaces. Tabs can indent, but only spaces can either indent or align. Process of elimination forces you to prefer spaces over tabs in that situation.

This might be why PEP 0008 urges, "Spaces are the preferred indentation method." https://www.python.org/dev/peps/pep-0008/ [Specifically, four spaces.]

I had heard Google urged two spaces for indentation of Python, per some old style guide, but not sure if that's still followed or not.

Google's current Python style guide absolutely calls for two spaces. All of the languages I'm familiar with (at Google) do this. Except golang ...
... it's almost as if it was entered into the character set for that precise purpose even. XD
The problem with this is that if you have added tab indenting to line up with a certain character in the line above, the length of the tab character would affect where the text on the next line is actually placed, depending on individual users' preferences. Spaces are uniform and standard length.

I believe this is the only real argument against tabs. I'd prefer to use tabs because that's exactly what they were designed for, however this use case is fairly prevalent in programming so it can't be ignored.

tabs to indent, spaces to align.

this is what go does, and it works great. take any gofmt-formatted style, set your tab size to anything you like, and it will still look great.

it's baffling and slightly tragic that this style of indentation is not more popular!
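Here is a small sketch of that claim, using Python's `str.expandtabs` to simulate different tab sizes. The sample lines and the `column_of` helper are invented for the example:

```python
def column_of(line: str, marker: str, tab_width: int) -> int:
    """Visual column where `marker` starts when tabs are tab_width wide."""
    return line.expandtabs(tab_width).index(marker)


# First line of a call; the continuation should line up under "alpha".
first = "\tvalue = f(alpha,"

# Aligned with tabs plus filler spaces: only correct at tab width 4.
tab_aligned = "\t\t\t  beta)"

# One tab to indent, then spaces to align with the text above.
mixed = "\t" + " " * len("value = f(") + "beta)"

for width in (2, 4, 8):
    target = column_of(first, "alpha", width)
    # The tab-to-indent, space-to-align line tracks the target at every width.
    assert column_of(mixed, "beta", width) == target

# The tab-aligned line matches at width 4 and drifts at any other width.
assert column_of(tab_aligned, "beta", 4) == column_of(first, "alpha", 4)
assert column_of(tab_aligned, "beta", 8) != column_of(first, "alpha", 8)
```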

Mere humans are incapable of doing this right as most don't run with show whitespace (so tabs and spaces look different) & many don't grok the difference between indentation and alignment.

I wonder if gofmt logic can be extended to other languages.

Well, in my opinion doing any sort of aligning-things by hand is pretty tedious. I tend to just use indentation and not worry about alignment in languages that don't have a formatter to do it for me.

E.g. I would write

    struct Foo {
        int bob;
        string alice;
    };
and not worry about lining up variables, whereas gofmt would give you

    type Foo struct {
        bob   int
        alice string
    }
which is fine too but not worth doing by hand IMO.

But if you are, it's not too hard to know where to use spaces and where to use tabs, even w/o show whitespace. But it is true that this is more deeply than most programmers want to think about indentation. :)

There are more languages than Go and some of them work better with spaces.

Spaces work everywhere.

Nah, it works almost everywhere (ignoring things like nim or make that treat them differently); it's not just a Go thing. Here's a random page I found advocating the style for js and css:

    http://lea.verou.me/2012/01/why-tabs-are-clearly-superior/
And here's what codinghorror had to say about the style:

> This way, in theory at least, the level of indent can be adjusted dynamically without destroying alignment. But I'm more inclined to think of it as combining all the complexity and pitfalls of both approaches, myself.

Which sadly seems to be most people's reaction -- basically, it might be better, but I'll be damned if I'm going to have BOTH tabs and spaces in my files!

Oh well...

Spaces are obviously a workable solution, but there are two really big downsides: (1) it throws away information -- when should I insert or remove an indentation block of spaces? You can make your editor guess, but it won't always guess right, and you won't always be editing inside your properly configured editor. A tab simply says what it means and requires no editor trickiness. (2) The obvious inability to adjust the visual width of the indentation, though personally as long as it's not too crazy I'm happy with anything from 2-8, so this doesn't bother me. (I normally run with tabs at 4.)

It's pretty simple, tabs are clearly superior in theory, by a decent margin. And spaces are superior in practice, by an enormous margin.

Go is making a play to make tab/space mixing work nicely in practice and that's quite interesting to watch.

Sounds like an argument against spaces. If you indent using spaces, there is no easy way to change the indentation. But with tabs you can change that global width of the tab.
The authors of golang understand that tabs would be the right solution to this, but only if the entire toolchain makes the same choice.

Everyone else uses spaces because spaces are better than a mix of tabs and spaces and it's extraordinarily difficult to keep spaces out of your files.

It'll be an interesting experiment to see if there are any problems with that approach. If not, maybe we can all finally move on.
Notably, Go makes a few other decisions that help with this approach:

1) `go fmt` fixes up whitespace, and is fast enough to run on every file-save. Although this sometimes screws up my undo stack.

2) Spaces are still used for alignment, but...

3) Only tabs are used before the content of a line, and...

4) Only spaces are used throughout the content of a line.

The primary reason that I prefer spaces is that it means that I don't have to constantly look at hidden characters while editing. However, go fmt's enforcement of the begin/intra-line separation of tabs and spaces alleviates that problem: If it's touching the left margin, it's a tab, otherwise, it's a space.
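That rule is simple enough to check mechanically. A rough sketch, which encodes only the rule as stated in this comment and is not how gofmt itself is implemented (`follows_rule` is a made-up name):

```python
import re

# Zero or more leading tabs, then (optionally) content that starts with a
# non-whitespace character and contains no tabs anywhere after it.
MARGIN_RULE = re.compile(r"\t*(?:\S[^\t]*)?")


def follows_rule(line: str) -> bool:
    """True if tabs appear only at the left margin and nowhere else."""
    return MARGIN_RULE.fullmatch(line.rstrip("\n")) is not None


assert follows_rule("\tbob   int")          # tab indent, spaces to align
assert follows_rule("\t\tx := 1 // note")   # two levels deep, spaces inside
assert not follows_rule("    bob int")      # spaces touching the left margin
assert not follows_rule("\tbob\tint")       # tab inside the content
```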

Doesn't work with comments that float to the right of the code (such as the ones you get with M-; in emacs). Along these lines:

    fred          /* Fred */
    fred(fred) {  /* Fred */
        fred      /* Fred */
    } fred {
        fred      /* Fred */
    }
(The reader must make up their own mind about whether this is important or not.)
I'm quite happy with my SCM keeping its greasy fingers out of my source.
Makefiles need hard tabs. The hours that have been wasted on that insanity... (bonus points for useless error printing)
that kind of stuff is a nightmare in source control.

i'm always cringing when i see a text encoding option - what I want from source control, is, byte-for-byte, what I checked-in, regardless as to line ending issues that haven't caused a real world issue that i have ever seen... (even in the 90s...).

use windows line endings - they are the way they are for a reason - they work fine if interpreted as unix or mac style line endings. every modern text editor supports all three options (and even some crusty things like notepad or textpad do too).

as for tabs? i find that using any convention works fine in practice, and i stick with 4-spaces/tabs depending on the default configuration of my editor. there is no (good) reason to care about mixing them in the vast majority of languages/compiler environments.

Assume that github could do that for you. There would still be some based on tabs, some on two spaces, and some on 4. This was just looking at what people were using.
It's so stupid. And precisely the kind of bikeshedding silliness that persnickety programmers love to waste time with.
What you're talking about is tabs. Spaces are for people who like to enforce their viewing preferences on others.
Which (including the default for windows of 4 and for *nix of 8 spaces per tab) is probably why Python went for breaking everyone's stuff if they didn't use spaces...