by l0b0 1029 days ago
This is also an opportunity to think about the value of a piece of configuration. If an example configuration value works for 40% of users without modification, should that value even exist? Think Bash's `HISTCONTROL=erasedups`, which shouldn't be necessary to set in the 21st century. Or should it be auto-generated, like Docker's container names?

In the very best case, the defaults are so good that an empty configuration does what most people want. Think ripgrep, …, welp, I can't really think of many good examples. Browsers need extensions, Bash needs a decent prompt, even many pro cameras need to be configured to save raw images by default.

4 comments

> Think Bash's `HISTCONTROL=erasedups`, which shouldn't be necessary to set in the 21st century

As in that it should be the default? Or the only option? I personally do not have that set (and do not want to), and I am hardly alone. Any change in defaults fucks someone over, especially in things like bash, where you ssh into many machines with many different versions of bash...

Omg, I just read the documentation for erasedups and I'm actually shocked anyone would want that feature turned on, to the point where it never even would have occurred to me to implement it in the first place and if the feature worked like that without some way to turn it off I'd have been super angry :(. Maybe we are parsing that sentence wrong and by "shouldn't be necessary to set" the idea is "it is useful in situations where you have limited disk space / memory but in the 21st century no one would need to set this so it might as well no longer be supported"?
That option has nothing to do with limited disk space.

Eliminating the duplicates makes it much easier to search the history for a command that was used long ago instead of having to skip over hundreds of duplicates of some non-interesting command, such as "ls".

It also makes it much more likely that complex commands used a long time ago are still preserved. No matter how large you make your history file, it is much more likely to fill up with simple commands that you do not need to recall from history than to keep the complex commands that you hate to retype.

Moreover, while a command history is useful for avoiding retyping, some people may be less willing to keep a history that makes it easy to reconstruct exactly what they did on the computer, though for that purpose it is preferable to also disable saving the history file.

    $ cat ~/.inputrc 
    "\e[A": history-search-backward
    "\e[B": history-search-forward
This means that I can type the first couple of letters of a command, and then use the up and down arrow keys to cycle through history to search the rest of it. It makes life simpler for me, and confuses me only on the occasions when I've used `git stash`, so `git status` doesn't appear when I expect it to.
fish shell does this by default, and also autocompletes commands from your history. I highly recommend it.
Ctrl-O is incredibly useful. Go back to an earlier command, press Ctrl-O repeatedly, and you re-run a series of commands. That doesn't work if some of them were deleted as duplicates.
Now I'm confused. Why would you want duplicate entries in your command history? I get that on an overloaded system in the 80s it might have taken a perceptible amount of time to filter out duplicates. But on a modern system you surely want it always enabled?
I think the people who keep duplicates see erasing them as "rewriting history" -- e.g., if you typed [cmd1], [cmd1], [cmd2], and you erase duplicates, your history no longer reflects what you actually typed. The people who erase duplicates see the history as more like "a list of interesting things I did at some point", so erasing duplicates makes it easier to find the most interesting things.
Any search of the form "what other interesting commands did I need the last time I did <foobar>?" benefits from being able to see the historical context. Or if I know the particular incantation is one I wrote several times and is very similar to some others then it's easy to search the history for the blocks of popular commands. Or whatever. It's your terminal, but given how fuzzy the interconnects in a mind can be it's hard to know what info might be useful down the road.

To your point about computing power, I think it's only now that it's reasonable to not filter the duplicates. If I want to emulate the dedup behavior it's trivial nowadays to blindly read 1M lines of history and dedup on the fly each time. Using downstream tools doing duplicate filtering/transforming work is similarly very very fast. Disk is cheap enough that I really do want to keep every command I ever write in my history. More expensive compute and disk would make me more likely to turn that feature on, not less.
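
The on-the-fly dedup mentioned above can be sketched in a few lines of shell. This is only an illustration, assuming a POSIX `awk` and a history file with one command per line (no `HISTTIMEFORMAT` timestamp lines); the function name `dedup_keep_last` is made up here. It keeps the most recent occurrence of each command, similar to what `erasedups` does to the in-memory history list, and never modifies the file itself.

```shell
# Print stdin with duplicate lines removed, keeping the LAST occurrence of each.
# Hypothetical helper for illustration; not part of bash.
dedup_keep_last() {
  awk '
    { line[NR] = $0; last[$0] = NR }   # store each line and where it last appeared
    END {
      for (i = 1; i <= NR; i++)
        if (last[line[i]] == i) print line[i]   # print only the final occurrence
    }
  '
}
```

Something like `dedup_keep_last < ~/.bash_history | grep rsync` would then search a deduplicated view while the complete history stays on disk.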

I don't use bash so it's not applicable to me anyway, but why would I want it on? It's saving a few bytes of disk space, at the cost of ruining my history (the thing that the whole feature is about). At least fish's search feature is so awesome that duplicate commands have never been a problem for me.

And I want my history untouched, because sometimes I forget how I solved something, go back a few months in my shell history and see the sequence of commands that I've used. I think that's a nice thing to have. And if I ever need history without duplicates, it should be easy to deduplicate it with a simple script.
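
Recovering the sequence of commands around an old solution might look like the sketch below. It assumes a `grep` that supports `-C` (GNU and BSD grep both do) and a history file with one command per line; `hist_context` is a hypothetical helper name, and `~/.bash_history` is only bash's default location.

```shell
# Show every history line matching a pattern, plus N lines before and after it.
# Usage: hist_context PATTERN [N] [FILE]
# The ( ... ) body runs in a subshell so the variables do not leak.
hist_context() (
  pattern=$1
  n=${2:-3}                          # lines of context on each side (default 3)
  file=${3:-$HOME/.bash_history}     # assumed default history location
  grep -C "$n" -e "$pattern" -- "$file"
)
```

For example, `hist_context 'configure' 5` would show the five commands typed before and after each matching line, which only works as intended if the history was left unmodified.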

I use ignoreboth -- it's not about saving disk and memory, it's about making the history more useful for searching. The ignorespace also helps avoid getting credentials saved to disk you don't want saved and is even more useful these days than it was in the past because we're no longer on multiuser systems where you really want to keep credentials out of commandlines entirely.
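
For reference, a minimal `~/.bashrc` fragment for this setup. Per the bash manual, `ignoreboth` is shorthand for `ignorespace` plus `ignoredups`; the commented-out line is just one possible variation, not something the comment above recommends.

```shell
# ~/.bashrc
# ignorespace: commands typed with a leading space are not saved
# ignoredups:  a command identical to the previous history entry is not saved
# ignoreboth:  shorthand for ignorespace:ignoredups
HISTCONTROL=ignoreboth

# Variation (an assumption, pick one): also drop older duplicates anywhere in the list
# HISTCONTROL=ignorespace:erasedups
```

With this set, typing a space before a command that contains a secret keeps that command out of the history.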
I have the HIST_IGNORE_ALL_DUPS option set on zsh. I only need a history of what commands I have run; the order isn't important to me. Having the history be a unique set of commands makes it easier to dig through.
How do we combine evolution and development with backwards compatibility? I think it's quite natural that we end up with this conundrum. Take Vim, for example, which has outdated defaults because changing them could disrupt existing users. A reboot/fork of it can reset and start fresh, but will eventually hit the same problem in its own development.

Are there examples of projects that solve this well?

Ripgrep's author is also very careful about breaking changes - I think that means it will also one day have outdated defaults!

A reasonable compromise would be to allow changing defaults whenever there's a major version bump.

ripgrep seems to have a major version bump every 18 months or so, which seems a bit excessive. OTOH vim went through 9 versions in 30-something years which seems more reasonable. Although I think there was more churn early on. ISTR vim 6 being around for a long time.

Does vim follow semver? (vim predates semver...) So how do you know you're comparing apples-to-apples?

Look at the breaking changes in each ripgrep major release. I don't use major releases as a means of breaking popular workflows. I use major releases even when there are very small breaking changes with minimal impact.

> Does vim follow semver?

I don't think vim follows semver, although a lot of old unix software, and also Free Software, would use major version bumps to indicate compatibility breaks. Semver was in many ways the documentation of what a lot of software was already kind-of doing.

But vim was used as an example by the GP, so re-using it as an example to show what I thought was a sensible major-release schedule (instead of, say, coreutils, or glibc, or glib/gtk, or perl, or Qt, etc...) seemed appropriate.

> I don't use major releases as a means of breaking popular workflows. I use major releases even when there are very small breaking changes with minimal impact.

A breaking change is going to break someone's workflow. (See also Hyrum's Law and xkcd 1172.)

As a user and developer, my preference is for breaking changes to be put off as long as possible, and then all applied together every few years or so. (Or never :-) That way, I don't normally have to pay that much attention to updates, even if an app gets new features. But when there's a major version bump, I can check the release notes carefully and know to keep an eye out for anything unusual/different.

If I only ever ran one program, it wouldn't be a big deal. But I don't, I run hundreds regularly. If they all have breaking changes once per year, that averages out to me needing to check over two sets of changes that might impact me every week.

Did you do what I asked and look at ripgrep's changelog?[1] The breaking changes are prominently advertised in each major release. Not all breaking changes are the same or have the same impact. Some major releases don't even have any breaking changes. (semver doesn't say to only do a major release when there's a breaking change. You can do a major release without breaking changes.)

I don't think you're correct about "old Unix software" using major version bumps to indicate compatibility breaks. Recent 3.x releases of GNU grep, for example, fucked around with the meaning of \d when using the -P flag. With no changes to the major version number. Did that break your scripts?

The thing about semver is that it tends to make breaking changes much more visible, which is kind of the point. And of course, when you compare it to projects that don't use semver and don't increment the major version for every breaking change, the projects using semver look like they're moving at a much faster pace. It might be true, but you can't conclude it by looking at version numbers when the projects aren't using the same versioning scheme.

[1]: https://github.com/BurntSushi/ripgrep/blob/master/CHANGELOG....

Another example of old timey Unix code just breaking things in minor point releases. See https://abi-laboratory.pro/index.php?view=changelog&l=glibc&... and https://github.com/intel/hyperscan/issues/359.
Open source projects are terrible at this, in general. Any feature ever introduced, no matter how ephemeral or small an audience, can't ever be removed. Which I suspect has contributed to most GNU tools being kitchen sinks rather than doing one thing well.
> In the very best case, the defaults are so good that an empty configuration does what most people want.

More generally, defaults (including default examples) matter: https://news.ycombinator.com/item?id=25646180

Especially for software which is mod-able, malleable, composable, configurable... defaults have a disproportionate impact on UX, DevEx etc. That is one reason why TLS 1.3 got rid of a laundry list of options, or why WireGuard is simply a joy to work with (as opposed to IPsec / OpenVPN), or why middleboxes on the Internet are a big hurdle to protocol upgrades, or how NewCloud companies like Cloudflare, Replit, Flyio, and Vercel have devex beyond what the Big 3 can muster up.

> Think ripgrep ... welp, I can't really think of many good examples.

Apple has got this spot on, across decades. Their products "just work", as they say in their own marketing.

> Browsers need extensions

Now you see why Chrome is defaulting to Manifest v3 ;)

> This is also an opportunity to think about the value of a piece of configuration. If an example configuration value works for 40% of users without modification, should that value even exist?

This sounds completely insane. If a majority of people need something different, they shouldn't be allowed to have it?

I read it slightly differently: the value stops being something anyone has to write down and simply becomes the built-in default. The 40% who previously had to set it no longer do, and the other 60% can still set one of the other values.