by SebastianFish 1790 days ago
Testing different methods of development in terms of speed, cost and quality is really hard. The most convincing approach to me would be a single-blind experiment: hire two software development teams and have them build to the same set of requirements in two different ways. But then it is hard to know whether you are really comparing the methods of software development or the quality of the teams, so two teams aren't enough to get a statistically valid inference. Given software development rates, you can see that this could become a very expensive experiment.
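To make the sample-size worry concrete, here is a minimal simulation sketch. Every number in it is an assumption made up for illustration (the size of the method effect, the spread of team skill, the scoring scale), and `wrong_conclusion_rate` is a hypothetical helper, not something from a real study. It estimates how often the genuinely better method ends up looking worse, depending on how many teams are assigned to each method.

```python
import random
from statistics import mean


def wrong_conclusion_rate(teams_per_arm, trials=2000, seed=0,
                          method_effect=3.0, skill_sd=10.0):
    """Fraction of simulated experiments in which the truly better
    method (B) ends up with the lower average score.

    All parameters are invented for illustration only.
    """
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        # A team's score is driven by its own skill; method B adds a fixed bonus.
        arm_a = [rng.gauss(50, skill_sd) for _ in range(teams_per_arm)]
        arm_b = [rng.gauss(50, skill_sd) + method_effect
                 for _ in range(teams_per_arm)]
        if mean(arm_b) < mean(arm_a):
            wrong += 1
    return wrong / trials


if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, "teams per method ->", wrong_conclusion_rate(n))
```

Under these assumed numbers, one team per method picks the wrong winner a large share of the time, and the error rate only falls as more teams are added, which is exactly where the cost explodes.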

Last point. I think that even writing a specification down to the level that it could be implemented using formal methods might be the biggest game changer. Agile stories rarely come even close to covering all of the potential edge cases. If we had a process that required product owners to literally think through all possible failure modes (which is what formal methods systems do) and write out how to handle them, then the cost of writing specifications would go way up. Per economics, I think we would end up with simpler specifications, which might be its own benefit.

4 comments

It’s not that hard, is it? There was recently a study on the value of testing and best practices linked here on HN, which I of course can’t find now, where the researchers looked at thousands of projects. Overall there was no scientific proof that testing and best practices lead to better results than just making spaghetti without a recipe.

Having worked in a public enterprise organisation that buys a lot of different software for some decades, it sort of fits with our completely anecdotal data. We still prefer suppliers that have all the nice buzzwords, but if you look at our projects there is just no correlation between their methods and how the software project goes through its lifecycle. And this is with everything from the old COBOL systems to modern microservice this-and-that cloud solutions.

For the past five years I’ve had a neat little side job as an external censor for CS students, and it’s been interesting to follow how their software design methodology changes rather rapidly, without any real scientific reason as to why that is. Mostly it seems like there is an entire field of “education” dedicated to getting people to do software design and project management in the way that sells the most licenses to Atlassian or whatever else, or simply the most books. It’s really very comparable to the self-help industry, where you’ll have answers for everything.

Sure, it’s mostly anecdotal, but preaching that test driven development or going Agile SOLID wild turkey is the holy grail is exactly the same as preaching some diet where you get to eat as much as you want as long as it isn’t carbs. Sure, you can lose weight, but it’s not like it’s the only way to lose weight, and next year it’ll be about going on a juice cleanse or something, and then something different after that.

A study design like that is called an observational (epidemiological) study and is far behind the gold standard of a randomized controlled trial, the reason being that the teams that choose to do testing or not are not randomly assigned to experimental and control groups. There are ways of controlling for confounders when you can't randomly assign experimental and control groups, such as in this instance only looking at teams that directly tried similar projects with tests and without, but it is rare to see anyone do that.

Otherwise, you hit a rather obvious issue. Testing and following best practices are not the only policies impacting project quality, and in particular they exist in large part to help less experienced or hastily assembled teams. If you're comparing their output to the output of several core maintainers who have been working on the same project for 20 years, in the absence of other information, you expect the latter group to produce better quality work, and the fact that they actually do even if they aren't following industry best practices doesn't tell you those practices aren't useful to the former group or even that the latter group couldn't have produced an even better product if they'd followed them.
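A toy simulation can show how this kind of confounding plays out. It is only a sketch under invented assumptions: experienced teams are assumed to score 30 points higher and to adopt the practice less often, and the practice is assumed to add 5 points for everyone. The function names and all numbers are hypothetical.

```python
import random
from statistics import mean


def simulate(n_teams=2000, seed=0):
    """Generate (experienced, uses_practice, quality) tuples.

    Assumed, made-up model: experience adds 30 points, the practice
    adds 5, and experienced teams adopt the practice less often.
    """
    rng = random.Random(seed)
    teams = []
    for _ in range(n_teams):
        experienced = rng.random() < 0.5
        # The confounder: who adopts the practice depends on experience.
        uses_practice = rng.random() < (0.2 if experienced else 0.8)
        quality = (50
                   + (30 if experienced else 0)
                   + (5 if uses_practice else 0)
                   + rng.gauss(0, 5))
        teams.append((experienced, uses_practice, quality))
    return teams


def naive_effect(teams):
    """Compare adopters with non-adopters, ignoring experience."""
    with_p = mean(q for _, p, q in teams if p)
    without_p = mean(q for _, p, q in teams if not p)
    return with_p - without_p


def stratified_effect(teams):
    """Compare adopters with non-adopters within each experience level."""
    effects = []
    for level in (True, False):
        with_p = mean(q for e, p, q in teams if e == level and p)
        without_p = mean(q for e, p, q in teams if e == level and not p)
        effects.append(with_p - without_p)
    return mean(effects)


if __name__ == "__main__":
    teams = simulate()
    print("naive difference:     ", round(naive_effect(teams), 1))
    print("stratified difference:", round(stratified_effect(teams), 1))
```

In this made-up world the practice helps every team, yet the naive comparison makes it look harmful because the adopters are mostly the inexperienced teams. Comparing like with like recovers the positive effect, but only because the simulation knows what the confounder is, which real studies usually do not.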

To be clear, I'm not advocating for either approach, just pointing out that the various flavors of scientific evidence vary tremendously in how valuable they are depending on study design. I'm just saying we can't know with any level of scientific validity because the studies themselves are near worthless. Software management is in the state today that major league baseball was in 30 years ago: no statistically valid evidence and a whole lot of gut-feel eye test from grizzled veterans. But unlike with baseball, nobody is keeping rich troves of every imaginable counting stat going back a century on all of the developers, so a pure data science approach to making management more scientific, like the moneyball guys accomplished in pro sports, is not likely to work, since it would necessarily be data science without the data.

> Overall there was no scientific proof that testing and best practices lead to better results than just making spaghetti without a recipe.

Which best practices? I can’t see how a team without certain practices could be that effective. E.g., version control, having good backups, a good communication culture, code review, config management to prevent “it works on my machine” problems, and many other things.

For code review specifically, I read a paper years ago claiming a dramatic decrease in defect counts in companies that practice it.

"A specification that can be implemented using formal methods". That is just source code. If the specification can completely define arbitrary programs it is necessarily Turing complete on its own, and as such prone to the same type of bugs as any other program.

Purely from an industrial perspective, interest in formal methods tends to split two ways:

1. Verifying really nasty algorithms, the kind you see in cryptography and embedded systems and stuff where the bugs are triggered by horrific race conditions or incredibly specific malicious inputs that even experts won't think of testing

2. High level specifications of requirements and abstract machines and stuff, where the spec is like 100 lines and the implementation is 10k and you'd prefer to catch some design bugs before you're ten sprints into coding

A lot of bugs in (1) end up being memory-related, which is why you're seeing languages with borrow checking as part of the semantics (Rust). A lot of hype these days is in (2) because it's a lot cheaper and easier to learn, at the cost of having a lower power ceiling.

What is Rust most commonly used for? If somebody learned to program with it, what sorts of projects would they work on?

I think you are speaking to one of the core tensions in formal methods. The difference between a specification and an implementation can get blurry. Where formal methods get interesting is statically proving properties about the specification. Take a simple example of a sorting algorithm. The two most commonly proved properties of these algorithms are that 1) they return a permutation of the input list (no items removed or duplicated) and 2) the output list follows some sort of ordering.

One way to look at things is to say the permutation and ordering property checkers are the specification and the actual sorting algorithm is the implementation.
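As a rough sketch of that split, the two property checkers can be written down separately from any sorting code. The names `is_permutation`, `is_ordered` and `meets_sort_spec` are hypothetical, and note the big caveat: this only checks the properties on sample inputs at runtime, whereas a formal methods tool would prove them for all inputs.

```python
from collections import Counter


def is_permutation(original, result):
    """Property 1: nothing was removed, added, or duplicated."""
    return Counter(original) == Counter(result)


def is_ordered(result):
    """Property 2: every element is <= the element after it."""
    return all(a <= b for a, b in zip(result, result[1:]))


def meets_sort_spec(sort_fn, sample):
    """Check one candidate implementation against both properties."""
    result = sort_fn(list(sample))
    return is_permutation(sample, result) and is_ordered(result)


def dedup_sort(items):
    """A buggy 'sort' that silently drops duplicates."""
    return sorted(set(items))


def identity(items):
    """A buggy 'sort' that does nothing."""
    return items


if __name__ == "__main__":
    sample = [2, 1, 2]
    print(meets_sort_spec(sorted, sample))      # satisfies both properties
    print(meets_sort_spec(dedup_sort, sample))  # fails the permutation property
    print(meets_sort_spec(identity, sample))    # fails the ordering property
```

The checkers say nothing about how the sorting is done, which is the sense in which they act as the specification rather than the implementation.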

To your point about the specifications being Turing complete, some tools put restrictions on the specification language so that functions are guaranteed to terminate. Coq, for instance, requires that recursive functions be "decreasing in their inputs", i.e. that each recursive call to the same function is passed a structurally smaller argument (fewer items or elements) than the parent call.
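Here is a small Python illustration of the "decreasing" idea. It is only an analogy: Python checks nothing, while Coq performs this check statically when the definition is accepted. The function names are made up for the example.

```python
def insert(x, sorted_items):
    """Insert x into an already sorted list.

    The only recursive call is on `tail`, which is strictly shorter
    than `sorted_items`, so the recursion must bottom out.
    """
    if not sorted_items:
        return [x]
    head, tail = sorted_items[0], sorted_items[1:]
    if x <= head:
        return [x] + sorted_items
    return [head] + insert(x, tail)


def insertion_sort(items):
    """Sort by recursing on the tail of the list (a smaller argument)."""
    if not items:
        return []
    head, tail = items[0], items[1:]
    return insert(head, insertion_sort(tail))


def collatz_steps(n):
    """Contrast: the argument can grow (3n + 1), so there is no obvious
    decreasing measure. A definition of this shape would be rejected by
    a check that demands structurally smaller arguments."""
    if n == 1:
        return 0
    return 1 + collatz_steps(n // 2 if n % 2 == 0 else 3 * n + 1)


if __name__ == "__main__":
    print(insertion_sort([3, 1, 2]))
    print(collatz_steps(6))
```

The price of the restriction is that some perfectly reasonable functions need to be restructured, or accompanied by an explicit termination argument, before the tool will accept them.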

Sorting is one of the more favorable tasks for being specified this way. For much code, there is no simpler way to verify the output than running the same or equivalent code again.

If your specification language is not Turing complete then there is simply stuff you cannot specify. Of course, just because it isn't Turing complete doesn't mean it isn't perfectly adequate for writing bugs.

As soon as you need to interface with arbitrary external components, you see the value in good specifications. If JPEG-2000 was just a reference implementation and not a spec, that would work fine if only one team ever had to develop a JPEG library, and every application that read or wrote JPEG files used exactly that one library.

Since that isn't the case, having something sit at a higher level of abstraction than the actual source code is quite valuable. Additionally, it allows domain experts like image scientists and physicists, who are experts in how to compress and decompress data with minimal quality loss but may not be experts in any particular programming language, to still contribute to the spec.

I understand your last point, but my question is:

In order for domain experts to contribute to a formal specification, this specification must be, well, formal, and also serve as a very precise shared language among all domain experts and implementors (i.e. the people who are going to read the spec and build something out of it). Once you go down this road, the specification language becomes as complex as any programming language -- or maybe even more! -- and must be learned by all involved, just like any given programming language. Some people will find it easier to learn, some will struggle or find it bizarrely unfamiliar -- again, just like any given programming language. Any sufficiently expressive specification language will also be subject to the kinds of bugs and complexity that affect programming languages.

So my question is: isn't learning a shared formal specification language more or less as difficult as learning an unfamiliar programming language?

Is there a machine-readable spec for JPEGs? Did it find interesting bugs and oversights?

> The most convincing approach to me would be a single blind experiment

http://www.plat-forms.org/

Requirements analysis is an integral part of the development process. Giving teams fully specified requirements at the beginning of the experiment wouldn't be realistic.

Depends. You could then run experiments on team performance in the requirements-to-code phase, separate from the generating-requirements phase. That has its place. And then you could experiment with teams trying to convert informal requirements to formal requirements. That might let you learn some things about the parts that you couldn't learn if you dealt with the whole.

Those are not separate phases; they take place simultaneously. No one does waterfall development anymore.

Waterfall development is still pervasive in a lot of industries and companies.

Definitely still out there.