by michaelt 1989 days ago
I did a test for you just now. I have 100Mbps internet, 32GB RAM, 4ghz i7 processor and suchlike. To make it easy for Jira, I'm doing this at a weekend, late at night, during the new years holiday so the servers shouldn't be busy.

On a cloud-based classic software project (which has less than 200 issues), opening a link to an issue takes 4.8 seconds for the page to complete rendering and the progress bar at the top of the screen to disappear.

Opening a Kanban board with 11 issues displayed? 4.2 seconds for the page to load.

Click an issue on the board? 2.5 seconds for the details to pop up.

Close that task details modal - literally just closing a window? 200 milliseconds. Not to load a page - just to close a modal!

In case I'm being hard on cloud Jira by insisting on using a classic project, I also checked with a 'Next-gen software project' with less than 2000 issues.

I click a link to view a particular comment on an issue. 4.8 seconds until the issue, comment and buttons have all loaded.

I choose to view a board? 9.9 seconds from entering the URL to the page load completing.

I'm viewing the board and I want to view a single issue's page. I click the issue and the details modal pops up - and just as I click on the link to the details, the link moves because the epic details have loaded, and been put to the left of the link I was going for, causing me to click the wrong thing. So this slow loading is a nontrivial usability problem.

View a single issue, then click the projects dropdown menu. The time, to display a drop-down menu with three items? 200 milliseconds.

This is what people mean when they say the performance problems are everywhere - viewing issues, viewing boards, viewing comments, opening dropdowns, closing modals? It's all slow.

And if you imagine a backlog grooming meeting that involves a lot of switching back and forth between pages and updating tickets? You get to wait through a great many of these several-second pageloads.


See, the irony of this is that you are just publicly sharing performance numbers which undeniably show a pattern of performance issues. It also doesn't seem to be possible without you first accepting ToS.

Ooops!

What are they going to do? Shut down your instance and force you to switch to a different product....? Hmmmm
Certainly there are providers who would immediately begin license renegotiation with the threat of termination. It's bad business in the modern era because somebody will just tweet out the renegotiation terms and the licensors don't want to be streisanded.
> Certainly there are providers who would immediately begin license renegotiation with the threat of termination

Oracle comes to mind.

Who says this is an "issue"? It's just numbers. If you think it's an issue, that's your interpretation. For instance, I used Jira to communicate with my team about 3 projects and it only took me 3 hours.

Maybe this person is writing a fiction story where the protagonist is using Jira and they are detailing how they spend their day.

It's like a John Steinbeck novel.

> Who says this is an "issue"? It's just numbers. If you think it's an issue, that's your interpretation.

No, this is a quote from the comment:

"This is what people mean when they say the performance problems are everywhere - viewing issues, viewing boards, viewing comments, opening dropdowns, closing modals? It's all slow."

May I point you to the title of this submission?

"Atlassian Cloud ToS section 3.3(I) prohibits discussing performance issues"

No sane judge would agree with your interpretation.

The actual text says that you can't "publicly disseminate information regarding the performance of the Cloud Products". So no interpretation required; posting the stats is enough.
No sane judge would accept that that is a valid clause in a ToS.
Are you allowed to say “use another app”? Or no?
Hi michaelt,

Thank you for the numbers -> I agree these are slow, and I can guarantee you that the Jira team is working on it (though I can't talk about details). These numbers are definitely outside of the goals.

I appreciate the call out of "page to complete rendering and the progress bar at the top of the screen to disappear" and "until the issue, comment and buttons have all loaded". In a dream world of course, everything would load in < 1s (everything drawn, everything interactive), but working our way down to that will take time.

We're currently looking at each use case to understand the '(a) paint faster vs (b) interactive faster' tradeoff and trying to decide which cases the user has a better experience with (a) or (b). In Confluence this is clearer in some places than in others, but in Jira it's less clear I think (I work on Confluence, I probably shouldn't speak for Jira specifics).

It always comes down to a limitation of resources though, which is why we're always hoping to get as specific feedback as possible.

> In a dream world of course, everything would load in < 1s

It's important you understand that "everything loading in <1s" would still be unacceptably slow - that is still an order of magnitude too slow.

That is not "a dream world" - not even close. A well built tool like this, meeting standard expectations (i.e. table stakes), would hit <50ms for the end user - the vast majority of the time. A "dream world" would be more like 10ms.

You should be targeting <200ms for 99% of user-facing interactions. That is the baseline standard/minimum expected.

This is why people are saying the company needs to make a major shift on this - you're not just out of the ballpark of table stakes here, you're barely in the same county!

It cannot be overstated how far off the mark you are here. There's a fundamental missetting of expectations and understanding of what is acceptable.

Do you have evidence that what you're asking for is possible? I'd be interested to see websites that hit the benchmark that you're aiming for.

I just tested a HN profile page (famously one of the lightest weight non-static websites) and it takes between 300ms and 600ms to load. I'm not saying that Jira can't improve, but if HN isn't hitting 250ms then I think telling the Jira guys that nothing less than <200ms is the minimum standard is unrealistic.

Look at GitHub pull requests. They load in under 200ms for me, and they are vastly more complex than HN in terms of both queries and UI; the content should be equivalent to what Jira needs.

Jira is also much more interactive than HN. You are sitting 10+ people in a room with some half-asleep scrum master opening the wrong issue, having to go back and open the correct one, searching again for some related issue you thought was fixed last month, refreshing the board to make sure you didn't forget to fill in one field so it ends up in the wrong column, etc etc.

1 sec per click in a situation like this is a joke, and that's just their goal. Reality is 4sec+ as OP mentioned, often even more.

200ms for user interactions is different to a 200ms page load.

A 200ms page load is incredibly fast.

Still, I tested your profile page on Google PageSpeed and it came out at a 300ms load time.

https://developers.google.com/speed/pagespeed/insights/?url=...

Assuming the FE resources are already cached on the user's machine, with careful optimisation, doing all of the rendering/fetching on the FE over a single connection, and with everything parallelised, it definitely is possible to load a new page well under 100ms with the key content being displayed.

When taking that kind of approach, you don't have to wait for the slowest thing to come in - eg with a normal BE render, you might need to pull up the user's profile and settings, A/B testing flags, the current footer config or whatever.

eg if you're on the page for viewing a single ticket, you can request the ticket data immediately, and render it as soon as it's available - even if other parts of the page aren't finished yet. True it may be more like 200-300ms to have the entire thing be 100% complete, but all parts of the page are not of equal importance and holding up the main content while loading the rest isn't necessary.
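A minimal sketch of the "render each part as soon as it's available" idea, assuming three hypothetical page parts with simulated latencies (the names, delays and `fetch` stand-in are made up for illustration and are not anyone's real code). All requests start at once and each part is handled the moment it arrives, so the ticket is not held up by the slower parts:

```python
import asyncio

# Hypothetical page parts and simulated fetch latencies in seconds.
PARTS = {"ticket": 0.02, "sidebar": 0.06, "footer": 0.10}


async def fetch(name: str, delay: float) -> str:
    """Stand-in for a network request; sleeps instead of doing real I/O."""
    await asyncio.sleep(delay)
    return name


async def render_progressively() -> list[str]:
    """Start every request at once and handle each part as it arrives."""
    tasks = [asyncio.create_task(fetch(n, d)) for n, d in PARTS.items()]
    rendered = []
    for finished in asyncio.as_completed(tasks):
        part = await finished
        rendered.append(part)  # a real client would paint this part now
    return rendered


def render_order() -> list[str]:
    """Run the sketch and return the order in which parts were rendered."""
    return asyncio.run(render_progressively())
```

Swapping the loop for a single `asyncio.gather(...)` followed by one render would model the other case, where nothing is shown until the slowest part has finished.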

If you are doing a full BE render, it's still totally possible to hit that 100ms mark, but indeed dramatically more difficult.

Hi BillinghamJ,

You're right, I apologize for not being clear. We're targeting 1s for "Initial loads" on new tabs/new navigation, which I assume you're referring to. Our target for 'transitions' is different.

If however the numbers you're referring to are "initial load" numbers, then I'm not sure.

(edit: and action responses again are also a separate category. Our largest number of complaints are about 'page load times' in Confluence, so most conversations center around that)

Initial loads should definitely be <100ms as well.

But Jira currently is so slow that 1s would be a great improvement. I am using it at work and regret it, unfortunately.

As a first step, 1s would be better than nothing for sure, but you need to be working towards a much tighter goal on a 1-2 year timeframe.

New load, you should really be hitting 200ms as your 95th percentile - 300ms or so would be decent still. "Transitions" should hit 100ms 95th, 150ms would be decent.

If you did hit 100ms across the board, you'd be rewarded by your customers/users psychologically considering the interactions as being effectively instantaneous. So it really is worth setting a super high bar as your target here (esp given you need a bit of breathing room for future changes too).
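To make "X ms at the Yth percentile for Z type of load" concrete, here is a rough sketch of checking latency samples against the 95th-percentile targets quoted above. The budget table just restates those numbers; the function names and the nearest-rank percentile method are choices made for this illustration.

```python
import math

# 95th-percentile targets from the comment above, in milliseconds.
P95_BUDGET_MS = {"new_load": 200, "transition": 100}


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_budget(kind: str, samples: list[float]) -> bool:
    """True if the 95th percentile of the samples is within the budget."""
    return percentile(samples, 95) <= P95_BUDGET_MS[kind]
```

With 20 samples, one slow outlier still passes a p95 check but two do not, which is why a percentile target is stricter than an average and more forgiving than a hard maximum.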

Thank you for coming back and clarifying. Do you happen to have links to any public testing results of other tools, or guidance to this specificity - would love to use them to build a case internally

Most of what we've seen online are nowhere near this level of detail (X-ms for Y-%ile for Z-type of load)

(edit: clarified request)

I'm afraid I'm no expert on project management tools!

On what users experience as effectively "instantaneous", that's from experience on UX engineering and industry standards - https://www.nngroup.com/articles/response-times-3-important-...

On the other noted times, they're just a general range of what can be expected from a reasonably well-built tool of this nature. Obviously much simpler systems should be drastically faster, but project management tools do tend to be processing quite a bit of data and so do involve _some_ amount of inherent "weight", but that isn't an excuse for very poor perf.

That said, I imagine if your PMs do some research and go ahead and try using some of the common project management tools, you should get a good idea. ;) Keep in mind speeds to Australia (assuming Atlassian is operated mostly there?) will likely show them in a much worse light than typical perf experienced in the US/UK/EU areas.

The time to first load is derived from the fact that you're running essentially the equivalent of many "transition" type interactions, but they should be run almost entirely in parallel, so roughly 2x between "transition" and "new load" is a reasonable allowance.

Not to be a jerk, but you guys don’t allow others to take your performance metrics, but you’re publicly soliciting performance data from other products at the same time? I’m assuming you’re taking it for granted they don’t have a ToS that bans you from doing this.

Sorry if that’s pointed, but it’s sort of meant to be incredulous (but hopefully not offensive).

Do you guys use synthetic monitoring tools?
No, don't give in to this guy... This is done over the net. The rate of transfer has to be taken into account. Unacceptable is a measure of comparison.

Unacceptable to whom? Do you have a faster provider for cheaper, with as many features???

I'm pretty sure he doesn't, because if he could he would go there. There are tradeoffs, and Atlassian has many projects they are working on. They understand that there is room for improvement in performance. It's one of Atlassian's priorities; it is a tech company (a pretty good one, I would say).

I guess one question is about server redundancy. Where is this guy loading from, and where is the server he is loading from? Getting things below 1s is nearing the speed of the connection itself. Also, at that speed there are diminishing returns. Something that happens at 1s vs .5s doesn't make you twice as fast when you don't even have the response time to move your mouse and click on the next item in .5s.

Sometimes techies just love to argue. You are doing great Atlassian and have tons of features. But maybe it is time to revisit and refactor some of your older tools.

You've shown poor understanding here.

> Getting things below 1s is nearing the speed of the connection itself

That is absolutely false. Internet latency is actually very low - even e.g. Paris to NZ is only about 270ms RTT, and you _do not_ need multiple full round trips to the application server for an encrypted connection - on the modern internet, connections are held open, and initial TLS termination is done at local PoPs.

Services like this - as they are sharded by customer tenancy - are usually located at least in the same vague area as the customer (e.g. within North America, Western Europe, APAC etc).

For most users of things like Atlassian products, that typically results in a base networking latency of <30ms, often even <10ms in good conditions.

Really well engineered products can even operate in multiple regions at once - offering that sort of latency globally.

> I'm pretty sure he doesn't, because if he could he would go there

Yeah, we don't use any Atlassian products - partly for this reason. We use many Atlassian-comparable tools which have the featureset we want and which are drastically faster.

> when you don't even have the response time to move your mouse and click on the next item in .5s.

There is a clear, documented understanding of how UX is affected by various levels of latency - https://www.nngroup.com/articles/response-times-3-important-...

> Sometimes techies just love to argue

Not really, I have no particular investment in this - I don't use any Atlassian product, nor do I plan to even if they make massive perf improvements.

But I do have an objective grasp - for tools like this - of what's possible, what good looks like, and what user expectations look like.

> No, don't give in to this guy

I don't expect Atlassian is going to make any major decisions entirely based on my feedback here, but it is useful data/input for exploration, and I do feel it's right to point out that they're looking in the wrong ballpark when it comes to the scale of improvement needed.

To put things in perspective, the typical Jira 5-second page load time as reported by many people in this forum is equivalent to twice the round-trip time for light to the Moon!

It's the network latency equivalent of a million kilometres of fibre!
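For anyone who wants to check the arithmetic, a quick sketch. The vacuum speed of light and the average Earth-Moon distance are standard figures; the fibre speed of about 200,000 km/s (roughly two thirds of c) is an approximation assumed here.

```python
C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum
C_FIBRE_KM_S = 200_000.0      # assumed: roughly two thirds of c in glass fibre
MOON_DISTANCE_KM = 384_400.0  # average Earth-Moon distance


def moon_round_trips(seconds: float) -> float:
    """How many Earth-Moon-Earth light round trips fit into `seconds`."""
    round_trip_s = 2 * MOON_DISTANCE_KM / C_VACUUM_KM_S
    return seconds / round_trip_s


def fibre_km(seconds: float) -> float:
    """Distance a signal travels through fibre in `seconds`."""
    return seconds * C_FIBRE_KM_S
```

`moon_round_trips(5)` comes out a little under 2, and `fibre_km(5)` is one million kilometres under the assumed fibre speed, so both comparisons hold up.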

The internet is fast. Computers are fast. One second is enough time for my machine to download 10M data points and render them into an interactive plot.

https://leeoniya.github.io/uPlot/bench/uPlot-10M.html

In my mind, anyone doing UI development and seeing user interactions taking over 1 second should be asking themselves "did the user just try to operate on more than 10^6 of something?" and if the answer is no, start operating under the assumption that they've made a mistake.

> A "dream world" would be more like 10ms.

A gateway alone adds 50 ms. So I'm not really sure where you get your numbers/benchmarks from... They are unrealistic

What gateways have you been using?! That's a long, long way off on the modern internet. Assuming you mean gateways as in the lines you'd see on a traceroute, more typical might be ~2-5ms on a home router, ~0.5-1.0ms upstream.
Lol, wasn't expecting that :p

Ocelot would be a better example of a gateway https://github.com/ThreeMammals/Ocelot

Used for scaling up web traffic or creating BFFs (backends for frontends)

Ah nice, I didn't realise you meant application proxies/gateways. Network ones are so quick due to their ASICs etc!

I personally would still say 50ms is super, super slow for an application gateway - a well designed one using e.g. nginx/openresty, lambda@edge, or simply writing another application server etc can easily do that job with an addition of <0.1ms processing time (assuming no additional network calls or heavy work), and maybe 0.3ms for additional connection establishment if it hasn't been optimised to use persistent connections.

If it is e.g. making a DB request to check auth, I would highlight that this _is_ backend processing time, not inherent or unoptimisable overhead. e.g. it's totally feasible to do auth checks without making any async calls, just need a bit of crypto and to allocate some memory for tracking revoked tokens - does add a bit of complexity, but likely worth it for the super hot path.
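A rough sketch of what "a bit of crypto and some memory for tracking revoked tokens" could look like, assuming an HMAC-signed token ID and an in-process revocation set. The secret, the token format and the function names are hypothetical; a real system would also carry an expiry and load the key from configuration.

```python
import hashlib
import hmac

SECRET = b"demo-secret"    # hypothetical signing key, for illustration only
REVOKED: set[str] = set()  # in-memory revocation list, no DB round trip


def sign(token_id: str) -> str:
    """Issue a token of the form '<id>.<hex HMAC-SHA256 of id>'."""
    mac = hmac.new(SECRET, token_id.encode(), hashlib.sha256).hexdigest()
    return f"{token_id}.{mac}"


def is_valid(token: str) -> bool:
    """Verify the signature and revocation status without any network call."""
    token_id, _, mac = token.rpartition(".")
    expected = hmac.new(SECRET, token_id.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking where the two values first differ
    signature_ok = hmac.compare_digest(mac.encode(), expected.encode())
    return signature_ok and token_id not in REVOKED
```

The whole check is one hash and one set lookup, which is the point being made: auth on the hot path does not have to cost a database request.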

BFFs would not really need to add anything beyond ~1ms or so, but you do hit the lowest common denominator - in that you have to wait for the slowest thing to complete, even if everything is happening in parallel.

BFFs definitely help simplify client-side code, but at the cost of increased overall latency and potentially reduced resilience, which could otherwise be achieved by decoupling unrelated components.

As such, I wouldn't expect the Atlassian products to use BFF patterns - for them it's better to throw 1k requests down a single HTTP 2/3 connection and render each part of the page when it's available. I have heard their FEs are very complex, which I think would probably support that assessment.

I hate to pile on a thread where you're already taking a lot of flak, but this point is really important to the future of Atlassian:

> In a dream world of course, everything would load in < 1s (everything drawn, everything interactive),

As a contractor, I have more or less walked out of or refused interviews on discovering Atlassian toolset was in use. It's not because I hate your tooling (it is visually nice and very featureful), it's because the culture that delivered this software is antithetical to anything I look for in a software project I want to use or contribute to. How can I possibly do my job to any degree of satisfaction when I'm tracking work in a tool that requires 15 seconds between mouse clicks? That is the reality of Jira, and as a result I refuse to use it, or work for people who find that acceptable, because it's a "broken window" that tells me much more about the target environment than merely something about suboptimal bug trackers.

Your page budget should be 100ms max, given all your tools actually do is track a couple of text fields in a pleasing style. Whoever the architecture astronauts are at Atlassian that created the current mess, flush them out, no seat is too senior -- this is an existential issue for your business.

Hmmm. I mean. I'm a contractor too, and I share your pain, but ... I'm really impressed you walk out of paid work because of the issue tracker your client uses.. It sounds a bit like they dodged a bigger bullet than you did tbh mate.

All these systems suck. You learn to live with them, for me I do this:

Everything goes in OmniFocus, I have a keyboard shortcut to create a task that takes <1sec, hit enter twice and it's stored. Twice a day I go through all the tasks I entered this way, and I either mark them done or assign them to various projects/tags/labels I have set up in OmniFocus.

15 mins before I finish work for the day at a client, I update whatever ticket system they use (mostly Jira, but also sometimes even worse things like ServiceNow) and also whatever enterprise crapware my agency uses (usually some SAP-based bollocks).

The last 15 mins suck. But it's part of the deal. I can't imagine how strongly you feel to turn down contractor rates due to a ticket system.. I mean, come on?

Edit: Also - btw -- if you're on a Mac the app-store 'fat-app' version of Jira is about 10x better than using the web interface, I suggest you give it a try.

If you turned up to interview, or even worse, arrive at a client site, and they hand you a mouldy 80386 to work from, and point you to the basement, would you feel comfortable?

Jira is the mouldy 80386, and the client's culture is that basement where such things belong. I can't see how this is even being precious. I can find solid work on good teams with smart people anywhere, there is no reason I need to work in a basement permanently damaging my lungs.

Lame analogy, I know, but it's close enough.

> I'm really impressed you walk out of paid work because of the issue tracker your client uses.. It sounds a bit like they dodged a bigger bullet than you did tbh mate.

> I can't imagine how strongly you feel to turn down contractor rates due to a ticket system.. I mean, come on?

It may be the case that they are in such high demand they have practically free choice of work. That's how I interpreted it, at least.

Your tooling seems...impressive? Assuming you had 2 projects that paid the same why in the WORLD would you eat 1 to 1.5 hours of that a week? Seems soul crushing and demotivating, but, props to you for not being a fair weather sailor and just getting it done. Actually kinda cool how resourceful your solution is.
> Your page budget should be 100ms max, given all your tools actually do is track a couple of text fields in a pleasing style

Yeah although it doesn't exactly help in figuring out how to resolve, I think this can be a good grounding in what the product fundamentals actually are and figuring out which over-engineering of those fundamentals is translating into speed problems

I often feel that product people view this type of problem in the wrong way - when you're starting at 5-10s, little incremental A/B tested tweaks are not going to get you down to 50-100ms. A 100x diff requires you to rethink from first principles - it's impossible to get there otherwise

Of course this is also why incumbents get disrupted by startups!

Hopefully this demonstrates that the anti-performance-discussion ToS clause is harmful not only to your customers but to you as well. You're getting useful information here only because some people are willing to openly violate it.
Not to mention the reputational damage from people asking "why the hell is this in the contract in the first place?"

It says they're so afraid of the quality of their product they'd rather litigate their customers than fix their product.

I wonder if these should be called "Streisand clauses", because it seems that the net effect will be for people to increasingly associate Atlassian with badly performing software.

Certainly if someone asked me what I know about Atlassian, this would now be one of the first things that come to mind.

At the margin, someone reading this thread is much more likely to hop off Atlassian and short the stock than they are to become a new user with one of those shiny high net promoter scores and “land and expand” wallet shares they brag about in their investor relations materials.
> In a dream world of course, everything would load in < 1s (everything drawn, everything interactive), but working our way down to that will take time.

FWIW our on-prem install uses <1s for opening issues, running a search etc. Too bad that's a dream world you've decided should no longer be...

I’m relatively sure it’s the lack of several host to host hops in the network requests that makes an on-prem install so much faster. The way Atlassian’s hosted services handle requests is mind-bogglingly awful and necessitates several round trips per request a lot of the time. It’s just poor architecture on their end.

We’re talking 3-5 redirects for some things they could just proxy on their backend. It’s dumb and there’s no amount of hardware or bandwidth a client can throw at the problem to fix it.

This should be quite glaring in any performance metrics collected though, shouldn't it?

I mean, I don't do web stuff (yet) but I can't imagine it's that difficult to figure out where several seconds get spent.

It’s possible that Atlassian work culture requires getting permission to grant permission to a subordinate to grant permission to their subordinate to do some work, contingent on a report of the quantifiable metrics that will be reported periodically to be compiled into other periodical reports that no one will care enough to read.

I’m only sort of joking here, since it can be weirdly difficult to actually just do the job you’re being paid to do in some orgs. I once got paid handsomely to deliver almost nothing for six months since all the layers above me were busy either talking back and forth or not even caring at all. It bored me so much that I had to leave, but the money was great.

There weren’t even any disappointed customers because they just allocated budget and forgot about the project. It wasn’t their own money they were spending, after all.

Being an engineer is weird sometimes.

"'(a) paint faster vs (b) interactive faster'"

It is only a tradeoff if you're at the Pareto optimality frontier [1] for those two things.

I seriously doubt that you are. You should absolutely be able to have more of both.

I would recommend to you personally two things: Open the debugger, and load a page with an issue on it in any environment. Look at the timeline of incoming resources, not just for how long the total takes but also all the other times. You will learn a lot if you haven't done this yet. It will be much more informative than anything we can tell you.

Second, once an issue is loaded, right click on almost anything in the page (description, title, whatever) and select "Inspect Element". Look at how many layers deep you are in the HTML.

I also find it useful to Save the Web Page (Complete) once it's all done rendering, then load it from disk with the network tab loaded in the debugger. It can give a quick & dirty read on how much time it takes just to render the page, separate from all network and server-side code issues.

I have a bit of a pet theory that a lot of modern slowdown on the web is simply how much of the web is literally dozens and dozens of DOM layers deep in containers that are all technically resizeable (even though it is always going to have one element in it, or could be fixed in some other simple way), so the browser layout engine is stressed to the limit because of all the nested O(n) & O(n log n) stuff going on. (It must not be a true O(n^2) because our pages would never load at all, but even the well-optimized browser engines can just be drowned in nodes.) I don't have enough front-end experience to be sure, but both times I took a static snapshot of a page for some local UI I had access to that was straight-up 2+ seconds to render from disk, I was able to go in and just start slicing away at the tags to get a page that was virtually identical (not quite, but close enough) that rendered in a small fraction of a second, just with HTML changes.
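If you want a number rather than eyeballing "Inspect Element", here is a small sketch that reports the deepest element nesting in a saved page. It uses Python's standard `html.parser`; the void-element list and the function name are choices made for this illustration, and pages that omit optional closing tags (a `<p>` or `<li>` left open) will be overcounted, so treat the result as a rough indicator only.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and so never add a nesting level.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class DepthMeter(HTMLParser):
    """Tracks the current and the maximum element nesting depth."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        self.depth = max(0, self.depth - 1)


def max_dom_depth(html: str) -> int:
    """Return the deepest nesting level found in the given HTML text."""
    meter = DepthMeter()
    meter.feed(html)
    meter.close()
    return meter.max_depth
```

Running it on the text of a "Save Web Page (Complete)" snapshot before and after trimming wrapper tags gives a simple way to see how much nesting was removed.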

My guess is that fixing the network issues will be a nightmare, because the 5 Whys analysis probably lands you at Conway's Law around #4 or #5. But, assuming you also have a client-side rendering issue (I don't use JIRA Cloud (yet) but I can vouch that the server product does), you may be able to get some traction just by peeling away an engineer to take a snapshot of the page and see what it takes to produce a page that looks (nearly) identical but renders more quickly. That will not itself be a "solution" but it'll probably provide a lot of insight.

[1]: https://news.ycombinator.com/item?id=22889975

I guess I meant that in a more general sense: prioritization is always about tradeoffs, and sometimes you're improving one (paint faster) or improving the other (interactive faster), sometimes both, sometimes trading off one versus the other.

We have looked into the network issues and some of it is similar to what you stated, we do have a known minimum given our chosen cloud infrastructure (separate from our software performance) - we obviously recognize we're not at that limit yet either though.

I have not tried what you mentioned above (load from disk), but I will give it a shot -> it may also give us a clue on how to make our performance testing lower variance, come to think of it......

(apologies for slow replies, HN is still throttling me due to downvotes)

That tallies with the experience I had using Jira cloud a few years ago. It sounds like it's still a great case study in how not to architect an issue tracker.