Given the scope and breadth involved in these benchmarks, that's a helluva tall order. I'm sure nothing's stopping anybody from doing it themselves though.
I think that this could be specced out in stages and implemented in a number of rounds. First would be a schema for a blog, with authors, posts and comments. Next would be a REST API for posts and comments. Finally, mock pages to be used for posting, reading and commenting in HTML, to test the included templating system, if there is one.
You really need to go at least this far. This will also give you an approximate code size for this sample project, which is at least as important as performance to some people.
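To make the first stage a bit more concrete, here is a minimal sketch of what a shared schema could look like, using Python's built-in sqlite3 module. The table names, column names and the `post_with_comments` helper are all my own assumptions for illustration; nothing in the thread pins them down.

```python
import sqlite3

# Hypothetical schema: the proposal only says "authors, posts and comments",
# so every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE authors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title     TEXT NOT NULL,
    body      TEXT NOT NULL
);
CREATE TABLE comments (
    id        INTEGER PRIMARY KEY,
    post_id   INTEGER NOT NULL REFERENCES posts(id),
    author_id INTEGER NOT NULL REFERENCES authors(id),
    body      TEXT NOT NULL
);
"""


def create_blog_db(path=":memory:"):
    """Open a database and create the three tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def post_with_comments(conn, post_id):
    """Return one post and its comments as a dict, roughly what a REST endpoint might serialize."""
    row = conn.execute(
        "SELECT p.id, p.title, p.body, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id WHERE p.id = ?",
        (post_id,),
    ).fetchone()
    if row is None:
        return None
    comments = conn.execute(
        "SELECT c.body, a.name FROM comments c "
        "JOIN authors a ON a.id = c.author_id "
        "WHERE c.post_id = ? ORDER BY c.id",
        (post_id,),
    ).fetchall()
    return {
        "id": row[0],
        "title": row[1],
        "body": row[2],
        "author": row[3],
        "comments": [{"body": body, "author": name} for body, name in comments],
    }
```

Each framework's entry would then implement the same reads and writes against this schema in its own idiom, which is where the code size and code shape comparison would come from.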
What I mean is, the framework doesn't know if you're building the result of 20 queries into a blog post page that pulled in related data from the post itself, the author profile, and the comments and commenter profiles, or if you're pulling in arbitrary data. So there's no reason to test a "blog". Most of us aren't building blogs. But we are interested in querying databases, calling web services, cached performance, and async process queue handling.
Except that I, and I'm sure many other people, are interested in more than just performance. I want to know how much code it is to achieve some small subset of usefulness, and what it looks like. Is it overly complex? Is it split apart in a paradigm that doesn't match my mental model very well?
I agree most of us aren't building blogs (I'm not), but I believe a blog is a reasonable stand in for a more complex application. It obviously won't test everything, but the requirements are well understood (or can be well understood, if defined well enough).
Also, who's to say that some of these frameworks aren't going to perform significantly worse when they start having to do more than simply serialize data as JSON across a socket? With that in mind, how accurate are some of these benchmarks if they aren't set up and used how they would be in real life?
> Oh please. Nobody's ever done such a comparison across more than 2 or 3 languages or frameworks. Have you even seen the benchmarks in question?
Okay, let me clarify. When I say "you really need to go this far" I mean that stopping at any point before that (but after the simple metric of how many requests a second it can serve which they already do) makes no sense, IMHO. If you are going to compare frameworks and you want to go beyond that initial performance metric, you might as well aim high enough to be useful.
I agree you never see anything approaching that in other reviews/benchmarks. Is that a good reason to not try it here?
> Why not just go do it, if it's so essential and straightforward?
I'm envisioning this as a community process, not a "Go off and write this in 20 frameworks or you're useless" sort of ultimatum. As such, just speccing out a possible route is helping.
Also, I plan to help with the existing benchmarks. After the second round, I pointed them out to the author of my favorite framework in the hopes he would have time to put together something for the benchmark, otherwise I was going to in the next week or two when I had time. I still plan to.
Oh, and that framework author's answer? That these benchmarks are laughable because all they measure is performance, and there's a clear performance to convenience trade-off shown in the results, and that of course there's a performance hit when the framework handles most the work for you. I have to say I agree. Sure, there's possibly some that are clear winners giving good performance with lots of conveniences for common operations, but is there any way to tell as much from the data presented so far?
> I mean that stopping at any point before that (but after the simple metric of how many requests a second it can serve which they already do) makes no sense, IMHO. If you are going to compare frameworks and you want to go beyond that initial performance metric, you might as well aim high enough to be useful.
The query 20 random rows, build an object, and return that, is way beyond requests per second. Enough that the list is dramatically re-ordered.
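For anyone who hasn't looked at that test, the shape of the work is roughly the following. This is only a sketch under my own assumptions (an in-memory SQLite table called `world` with 10,000 rows, and a hypothetical `handle_request` function), not the benchmark's actual code.

```python
import json
import random
import sqlite3

ROWS = 10_000  # assumed table size, chosen only for illustration


def make_db():
    """Create an in-memory table of (id, random_number) rows. Names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE world (id INTEGER PRIMARY KEY, random_number INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO world (id, random_number) VALUES (?, ?)",
        ((i, random.randint(1, ROWS)) for i in range(1, ROWS + 1)),
    )
    return conn


def handle_request(conn, queries=20):
    """Run `queries` single-row lookups, build one object per row, return them as JSON."""
    worlds = []
    for _ in range(queries):
        row_id = random.randint(1, ROWS)
        row = conn.execute(
            "SELECT id, random_number FROM world WHERE id = ?", (row_id,)
        ).fetchone()
        worlds.append({"id": row[0], "randomNumber": row[1]})
    return json.dumps(worlds)
```

Per request that is 20 database round trips plus object construction plus serialization, which is why it stresses different parts of a stack than a bare JSON response does.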
> That these benchmarks are laughable because all they measure is performance, and there's a clear performance to convenience trade-off shown in the results, and that of course there's a performance hit when the framework handles most the work for you. I have to say I agree.
On the contrary, performance versus magic is absolutely not a "given" here. Yes, some bare languages are near the top, but there are also heavy frameworks (e.g. Spring) performing well, and lean frameworks performing poorly.
As for being "laughable", that makes me suspicious of the framework author's understanding of where optimization needs to happen. Presumably, pages will be run more often than they are authored. Presumably there's a recurring bill for the server farm. Optimizing for performance helps end users stay happy, and helps the company stay in business able to continue employing developers.
> possibly some that are clear winners giving good performance with lots of conveniences for common operations, but is there any way to tell as much from the data presented so far?
I agree with that point. I would like to see 3 additional columns added to the results: LOC, number of src files/templates across number of directories, and number of libs.
This helps suss out your point: how much does a coder have to type in this framework, and how much incidental complexity (files, libs) do they have to wrap their heads around?
Multiply by 10 to imagine real world code, and these would each feel very different to an author, and to a new hire hired to maintain a project already in production.
To your last point, Terretta, my colleagues and I have had lengthy (but unfortunately inconclusive) discussions about how to represent the efficiency dimension that we expect many readers would like to visualize.
The easier dimension--performance--became our first goal and the source code in Github is expected to very loosely fill the role of answering questions of efficiency. But we know that's a barely serviceable solution to the challenge, and that is especially true as pull requests increase the number of frameworks.
The challenge of representing efficiency succinctly remains.
We have considered lines of code and I think that for all its weaknesses, that is the best proxy for efficiency that I am aware of. Nevertheless, I'd like to contend with some of the following specific issues:
1. Many frameworks create boilerplate when you create a new application. Do we count those LOC?
2. Many frameworks on dynamic languages copy their entire corpus of functionality as source files into the application's root. Certainly we don't count those. Check out our Github repo's colored language bar at the top right. :)
3. Do build scripts count as LOC?
4. Do configuration files count as LOC?
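One way to avoid deciding these questions up front would be to report LOC per category and let readers add up whichever columns they care about. Below is a rough sketch of that idea; the classification rules (`CONFIG_EXTS`, `BUILD_NAMES`, `VENDOR_DIRS`) are assumptions I made up for illustration and would need tuning per framework. It does not attempt to detect generated boilerplate (question 1), which probably needs a manual list per framework.

```python
import os
from collections import Counter

# Hypothetical classification rules; real projects would need per-framework tuning.
CONFIG_EXTS = {".json", ".yml", ".yaml", ".xml", ".ini", ".toml", ".properties"}
BUILD_NAMES = {"Makefile", "pom.xml", "build.gradle", "setup.py", "Rakefile"}
VENDOR_DIRS = {"vendor", "node_modules"}


def classify(rel_path):
    """Assign a file to 'vendored', 'build', 'config' or 'application'."""
    parts = rel_path.split(os.sep)
    name = parts[-1]
    if any(part in VENDOR_DIRS for part in parts[:-1]):
        return "vendored"
    if name in BUILD_NAMES:
        return "build"
    if os.path.splitext(name)[1].lower() in CONFIG_EXTS:
        return "config"
    return "application"


def count_loc(root):
    """Return (non-blank lines per category, number of files, number of directories holding files)."""
    loc = Counter()
    files = 0
    dirs = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, encoding="utf-8", errors="ignore") as fh:
                lines = sum(1 for line in fh if line.strip())
            loc[classify(rel)] += lines
            files += 1
            dirs.add(os.path.dirname(rel))
    return loc, files, len(dirs)
```

Splitting the count this way keeps the vendored framework source out of the headline number while still showing how much build and configuration code each test carries.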
Ultimately, I retain (irrational?) fear of contributing to enshrining LOC as a metric because unlike performance--where higher is always definitively better--lower LOC is not always definitively better.
I was asked separately if we had any data about how long it took us to build tests in the various frameworks. We didn't take detailed logs, but we do have rough numbers. Nevertheless, I've been reluctant to share how long it took us to implement the test code because it's a biased sample. Our previous experiences make us well disposed to some platforms and languages while we make silly time-consuming mistakes on others.
> As for being "laughable", that makes me suspicious of the framework author's understanding of where optimization needs to happen. Presumably, pages will be run more often than they are authored. Presumably there's a recurring bill for the server farm. Optimizing for performance helps end users stay happy, and helps the company stay in business able to continue employing developers.
Well, where optimization needs to happen is highly dependent on the business, and where that business is within its lifecycle. Sometimes a higher server farm bill is preferable to some impediment to the developers, because the developers don't scale as quickly. Personally I would much rather throw twice as much hardware at something than work twice as long (at least initially), but obviously it's not as cut and dried as that.
> The query 20 random rows, build an object, and return that, is way beyond requests per second. Enough that the list is dramatically re-ordered.
> This helps suss out your point: how much does a coder have to type in this framework, and how much incidental complexity (files, libs) do they have to wrap their heads around?
The reason I suggested a blog is to also exercise whatever templating system the framework ships with, if any, or otherwise whatever the standard templating system is for the target language.
True, this also could be tested just by making something up and displaying those random 20 rows in some manner, but at this point, why not just use a simple spec for a blog? I think it's almost the same amount of work, and IMO will result in more consistent implementations.
That said, I think we are on the same page, more or less.
Edit: s/work twice as hard/work twice as long/ because it maps more closely to what I meant to express.