by kbenson 4814 days ago
> Oh please. Nobody's ever done such a comparison across more than 2 or 3 languages or frameworks. Have you even seen the benchmarks in question?

Okay, let me clarify. When I say "you really need to go this far" I mean that stopping at any point before that (but after the simple metric of how many requests a second it can serve which they already do) makes no sense, IMHO. If you are going to compare frameworks and you want to go beyond that initial performance metric, you might as well aim high enough to be useful.

I agree you never see anything approaching that in other reviews/benchmarks. Is that a good reason to not try it here?

> Why not just go do it, if it's so essential and straightforward?

I'm envisioning this as a community process, not a "Go off and write this in 20 frameworks or you're useless" sort of ultimatum. As such, just speccing out a possible route is helping.

Also, I plan to help with the existing benchmarks. After the second round, I pointed them out to the author of my favorite framework in the hopes he would have time to put together something for the benchmark, otherwise I was going to in the next week or two when I had time. I still plan to.

Oh, and that framework author's answer? That these benchmarks are laughable because all they measure is performance, and there's a clear performance to convenience trade-off shown in the results, and that of course there's a performance hit when the framework handles most the work for you. I have to say I agree. Sure, there's possibly some that are clear winners giving good performance with lots of conveniences for common operations, but is there any way to tell as much from the data presented so far?


> I mean that stopping at any point before that (but after the simple metric of how many requests a second it can serve which they already do) makes no sense, IMHO. If you are going to compare frameworks and you want to go beyond that initial performance metric, you might as well aim high enough to be useful.

The query 20 random rows, build an object, and return that, is way beyond requests per second. Enough that the list is dramatically re-ordered.

> That these benchmarks are laughable because all they measure is performance, and there's a clear performance to convenience trade-off shown in the results, and that of course there's a performance hit when the framework handles most the work for you. I have to say I agree.

On the contrary, performance versus magic is absolutely not a "given" here. Yes, some bare languages are near the top, but there are also heavy frameworks (e.g. Spring) performing well, and lean frameworks performing poorly.

As for being "laughable", that makes me suspicious of the framework author's understanding of where optimization needs to happen. Presumably, pages will be run more often than they are authored. Presumably there's a recurring bill for the server farm. Optimizing for performance helps end users stay happy, and helps the company stay in business able to continue employing developers.

> possibly some that are clear winners giving good performance with lots of conveniences for common operations, but is there any way to tell as much from the data presented so far?

I agree with that point. I would like to see 3 additional columns added to the results: LOC, number of src files/templates across number of directories, and number of libs.

This helps suss out your point: how much does a coder have to type in this framework, and how much incidental complexity (files, libs) do they have to wrap their heads around?

Consider:

    [300 loc; 15 files  2 dirs;  8 libs]
    [650 loc;  1 file   1 dir;   0 libs]
    [150 loc; 15 files 15 dirs; 23 libs]

Multiply by 10 to imagine real world code, and these would each feel very different to an author, and to a new hire hired to maintain a project already in production.

To your last point, Terretta, my colleagues and I have had lengthy (but unfortunately inconclusive) discussions about how to represent the efficiency dimension that we expect many readers would like to visualize.

The easier dimension--performance--became our first goal and the source code in Github is expected to very loosely fill the role of answering questions of efficiency. But we know that's a barely serviceable solution to the challenge, and that is especially true as pull requests increase the number of frameworks.

The challenge of representing efficiency succinctly remains.

We have considered lines of code and I think that for all its weaknesses, that is the best proxy for efficiency that I am aware of. Nevertheless, I'd like to contend with some of the following specific issues:

1. Many frameworks create boilerplate when you create a new application. Do we count those LOC?

2. Many frameworks on dynamic languages copy their entire corpus of functionality as source files into the application's root. Certainly we don't count those. Check out our Github repo's colored language bar at the top right. :)

3. Do build scripts count as LOC?

4. Do configuration files count as LOC?

Ultimately, I retain (irrational?) fear of contributing to enshrining LOC as a metric because unlike performance--where higher is always definitively better--lower LOC is not always definitively better.

I was asked separately if we had any data about how long it took us to build tests in the various frameworks. We didn't take detailed logs, but we do have rough numbers. Nevertheless, I've been reluctant to share how long it took us to implement the test code because it's a biased sample. Our previous experiences make us well disposed to some platforms and languages while we make silly time-consuming mistakes on others.

This is how I would answer these questions:

> 1. Many frameworks create boilerplate when you create a new application. Do we count those LOC?

No, as long as the lines did not need to be changed. Any line you must alter (not just add code before/after) should be considered a line of code you had to write to get to a functioning implementation. You had to understand the line before altering it, so you could have written it entirely yourself.

As such, diffing against a reference boilerplate file is a good indication of lines required.

> 2. Many frameworks on dynamic languages copy their entire corpus of functionality as source files into the application's root. Certainly we don't count those. Check out our Github repo's colored language bar at the top right. :)

Take whatever you are given as a boilerplate starting point for a new project, and diff the final implementation tree against that. Special care may need to be given to implementations that default to installing third party libs into the local framework lib path (if any exist), so those are not counted.

> 3. Do build scripts count as LOC?

No, I would think not.

> 4. Do configuration files count as LOC?

This is a bit more complicated, but for simplicity's sake it may be easiest to treat it just like any other reference file as in #1 and #2.

> Ultimately, I retain (irrational?) fear of contributing to enshrining LOC as a metric because unlike performance--where higher is always definitively better--lower LOC is not always definitively better.

I agree, but without something better, it's what we have.

I do think there are ways we could get to a more useful metric using relative lines of code required to implement something between languages, and then compare relative LOC to host language, but that's way outside the scope of this benchmark, and requires information we don't have (to my satisfaction) yet.
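To make the diff idea from #1 and #2 concrete, here is a rough Python sketch, not anything the benchmark project actually uses. It assumes you have two directory trees on disk: the untouched boilerplate a framework generates and the finished test implementation. The function names and the `skip_dirs` default (`lib`, `vendor`) are made up for illustration; a real framework may put third party libs somewhere else.

```python
import difflib
from pathlib import Path


def read_lines(path):
    """Return a file's lines, or an empty list if the file does not exist."""
    if not path.is_file():
        return []
    return path.read_text(encoding="utf-8", errors="replace").splitlines()


def relative_files(root):
    """Collect every file under root as a path relative to root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def authored_loc(boilerplate_dir, final_dir, skip_dirs=("lib", "vendor")):
    """Count lines in final_dir that are new or altered relative to boilerplate_dir.

    Directories named in skip_dirs (a hypothetical default) are ignored so that
    locally installed third party libs are not counted.
    """
    boilerplate_dir, final_dir = Path(boilerplate_dir), Path(final_dir)
    total = 0
    for rel in sorted(relative_files(final_dir)):
        # Only directory components are checked, not the file name itself.
        if any(part in skip_dirs for part in rel.parts[:-1]):
            continue
        old = read_lines(boilerplate_dir / rel)
        new = read_lines(final_dir / rel)
        matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
            # "replace" and "insert" are lines the author had to write or alter.
            if tag in ("replace", "insert"):
                total += j2 - j1
    return total
```

Unchanged boilerplate lines count as zero here, and a file that does not exist in the boilerplate counts in full. Deleted boilerplate lines are ignored, which is a judgment call.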

Thanks for the thoughts here, kbenson.

I like the idea of using a diff to judge the total LOC. That reminds me that before we did the initial commit to Github, we had briefly entertained doing more or less that: we were going to commit the initial boilerplate for each framework as commit #1 and then the resulting tests as commit #2. To save on effort, we ultimately did not do that, but it is possible for us to go back to our original local Git repo to glean that information.

Using that approach, I think even build scripts could be included--that is, if you need to modify the build script and it shows up in a diff, then that line counts.
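If the boilerplate and the tests were available as two commits, the tally could come from `git diff --numstat <commit1> <commit2>`, which prints one tab-separated `added deleted path` line per file (with `-` in both count columns for binary files). A minimal Python sketch of summing that output follows; the function name and the `ignore_prefixes` parameter are my own invention, and it only parses text you hand it rather than running Git.

```python
def added_lines_from_numstat(numstat_text, ignore_prefixes=()):
    """Sum the 'added' column of `git diff --numstat` output.

    A modified line shows up as one deletion plus one addition, so the added
    column covers both brand new lines and altered boilerplate lines.
    ignore_prefixes is a hypothetical knob for paths such as vendored libs.
    """
    total = 0
    for line in numstat_text.splitlines():
        fields = line.split("\t", 2)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        added, _deleted, path = fields
        if added == "-":
            continue  # binary file: Git reports no line counts
        if path.startswith(tuple(ignore_prefixes)):
            continue
        total += int(added)
    return total
```

Because build scripts and config files appear in the diff like any other file, a modified build script line is counted automatically under this approach.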

> 1. Many frameworks create boilerplate when you create a new application. Do we count those LOC?

Yes if it's code someone could edit. Someone unfamiliar with the framework needs to read these lines of code. Boilerplate may be "pre-typed" for you, but it's part of "your" app, and it's optional. That's not the same as a lib or the framework itself. You could start your app differently.

The new hire has to deal with these LOC regardless of who typed them (you or a wizard), so they're significant.

> 2. Many frameworks on dynamic languages copy their entire corpus of functionality as source files into the application's root. Certainly we don't count those. Check out our Github repo's colored language bar at the top right.

Agree this should not be counted. Only what's part of the app in question. Put another way, if those directories could be "hidden" from the new hire, and he could do his job, don't count them.

> 3. Do build scripts count as LOC?

No.

> 4. Do configuration files count as LOC?

Yes. The Spring framework's pom.xml or any dependency injection "configuration" file is crucial, so yes. On configs, though, I'd guess kbenson's idea about diff is reasonable.

- - -

However, about diff on #1 above, I'm concerned that with diff you're really just measuring how different the test is from the boilerplate. That says more about the test than about the framework. Someone could also optimize their framework's pre-typed boilerplate to match your test. (As we see on browser rendering benchmarks, for example.)

Really, if it's code that's typed (by the framework or by you) into an application, it should be counted.

I'm coming to this from the standpoint of hiring a new developer onto a project. More time is spent maintaining projects than jumpstarting them. At some point really soon after jumpstarting the app, boilerplate and edited code are going to be intermingled and indistinguishable. The new guy has to wrap his head around it all. So I would like any code typed by you or for you to be counted.

And beyond LOC, it matters how those lines are distributed. If you have to open five nested files to find out what one hello world is doing, even if each file only has a single 3 line function in it, that's complexity that matters. Number of files and number of directories both make a framework feel very different.

Most benchmarks are for weekend MVPers. Your benchmarks are moving into "real world" territory. In the real world, being able to wrap your head around someone else's existing project matters. LOC, files, directories, and dependencies, factor into that heavily.
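For what it's worth, the LOC, files, and directories numbers are cheap to gather mechanically. Here is a rough Python sketch under some loud assumptions: the extension list is an arbitrary guess at what counts as "source", blank lines are not counted, and the number of libs is left out because it depends on each framework's dependency mechanism. `tree_metrics` is a made-up name.

```python
import os


def tree_metrics(root, source_exts=(".py", ".rb", ".php", ".java", ".js", ".xml", ".html")):
    """Return non-blank LOC, file count, and count of directories holding source files.

    source_exts is an illustrative default, not an agreed definition of 'source'.
    """
    loc = 0
    files = 0
    dirs = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(tuple(source_exts)):
                continue
            files += 1
            dirs.add(dirpath)  # only directories that actually contain source
            full_path = os.path.join(dirpath, name)
            with open(full_path, encoding="utf-8", errors="replace") as fh:
                loc += sum(1 for line in fh if line.strip())
    return {"loc": loc, "files": files, "dirs": len(dirs)}
```

Run against each framework's test directory, this would give three of the proposed columns; whether boilerplate is included depends on which tree you point it at.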

> The new hire has to deal with these LOC regardless of who typed them (you or a wizard), so they're significant.

That's true. Maybe two numbers, one with and one without boilerplate? That way someone could get an idea of what they are looking at. (I know, I'm just stacking more and more work up...)

> As for being "laughable", that makes me suspicious of the framework author's understanding of where optimization needs to happen. Presumably, pages will be run more often than they are authored. Presumably there's a recurring bill for the server farm. Optimizing for performance helps end users stay happy, and helps the company stay in business able to continue employing developers.

Well, where optimization needs to happen is highly dependent on the business, and where that business is within its lifecycle. Sometimes a higher server farm bill is preferable to some impediment to the developers, because the developers don't scale as quickly. Personally I would much rather throw twice as much hardware at something than work twice as long (at least initially), but obviously it's not as cut and dried as that.

> The query 20 random rows, build an object, and return that, is way beyond requests per second. Enough that the list is dramatically re-ordered.

> This helps suss out your point: how much does a coder have to type in this framework, and how much incidental complexity (files, libs) do they have to wrap their heads around?

The reason I suggested a blog is to also exercise whatever templating system the framework ships with, if any. Otherwise, whatever the standard templating system is for the target language.

True, this also could be tested just by making something up and displaying those random 20 rows in some manner, but at this point, why not just use a simple spec for a blog? I think it's almost the same amount of work, and IMO will result in more consistent implementations.

That said, I think we are on the same page, more or less.

Edit: s/work twice as hard/work twice as long/ because it maps more closely to what I meant to express.