Hacker News
by ratww 1989 days ago
> Trying to do A/B/C is annoyingly slow - etc ? [...] We're trying to focus on frustrating pages/experiences rather than number of network calls

It's not really a problem with a certain page or a certain action: it's a systemic issue that can only be solved with a systemic change.

This has come up here on HN before [1]. From my point of view, ignoring the issue of the number of calls/performance, along with all the feedback about it, is the root cause of the slowness.

[1] https://news.ycombinator.com/item?id=24818907


Hi ratww,

Thank you for reiterating this point, and I'll try to shed some light on this. We are actually working on systemic changes to make this lighter/better, but I can't talk about specifics until the feature is available.

That said, any level of specificity is great, for example: 1) full page loads are slower and more annoying than transitions (or vice versa); 2) loading the Home page is slower and more annoying than Search Results (or vice versa); 3) waiting for the editor to load is more annoying than X/Y/Z; 4) etc.

Even systemic changes require individual work for applying to these different views, so any level of specific feedback would be helpful.

(also it looks like HN is limiting my reply rate so apologies for any slowness)

Dear Esteemed Colleague at Atlassian,

I also use Confluence and JIRA regularly, and can confirm that they are the slowest, most terrible software that I use on a regular basis. Every single page load and transition is slow and terrible.

Asking "which one is the highest priority" is like asking which body part I'd least prefer you amputate. The answer is: please don't amputate any of them.

It's as if I asked you to dig a hole for pouring the foundation of a house. The answer to "which shovelful of dirt has the highest priority" is all of them. Just start shoveling. It's not done until you've dug the entire hole.

It's like the exterminator asking which specific cockroach is bothering me the most. (It's Andy. Andy the cockroach is the most annoying one, so please deal with her first).

What I, and many, many other commenters, are trying to tell you is that the entire product is slow and terrible (not your fault. I'm guessing you're new and just trying to improve things, and I hope you succeed!). If it were a building, I'd call it a teardown. If it were a car, I'd call it totaled.

It doesn't matter what page or interaction you start with. Just start shoveling.

Hi lostdog,

Thanks for the understanding! Indeed I haven't been at Atlassian that long, but that's not a good excuse: it's my problem to own.

I appreciate the reinforcement of "fix everything", and I assure you we're trying our best to do so. As a PM, it is my natural instinct (and literal job) to prioritize, so I'm always looking for more details to guide that.

I can understand that my request for details can imply that I'm either not listening or not believing the feedback, but that is not the case: I do understand that everything is slow and needs fixing.

This is a throwaway since I use Jira/Confluence at work and am not authorized to officially speak on their behalf.

We are actively looking for other solutions outside of Atlassian, specifically because of the demands to switch to your cloud offerings. We simply do not trust your cloud.

We also have stricter compliance requirements, since our instances can contain snippets of production data. Our Jira/Confluence systems are highly isolated inside a high-compliance solution. We can verify and prove that these machines do not leak.

The Atlassian cloud is completely unacceptable in every way possible. And going from roughly $1,200 a year to $20,000 a year with Data Center, for the exact same features, is laughably horrendous.

Unless Atlassian changes its direction, your software is that of the walking dead. We have an absolute hard time limit of 2024, but in reality, 2022. We'd like to keep using it and pay you appropriately, but we're not about to compromise our data security handling procedures so you can funnel more people into a cloud service, which, judging by the comments here, is pretty damn terrible.

Same here: as a government contractor, we can't use cloud Confluence, and the performance is so much worse, so why would you? On-prem is so snappy it's comparable to using Word. I evaluated cloud for my previous company in 2018, and performance was the dealbreaker.

If you want to make performance a feature, you need to (in order!):

* define a metric

* measure it automatically with every commit

* define a success threshold

* make changes to get yourself under the threshold

* prohibit further changes which bring you above the threshold

Just do it like that for pretty much every view in the system.
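
The ordered steps above might look something like this as a per-commit CI check. This is a minimal sketch under assumptions: the metric is the 90th-percentile load time per view in milliseconds, the sample data and budgets are made up, and `check_budgets` is a hypothetical helper, not an existing tool.

```python
import math


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile (q in 0..100) of a non-empty list of samples."""
    ordered = sorted(samples)
    # Smallest value such that at least q% of the samples are at or below it.
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def check_budgets(samples_by_view: dict[str, list[float]],
                  budgets_ms: dict[str, float],
                  q: float = 90.0) -> list[str]:
    """Return one message per view whose q-th percentile load time exceeds its budget."""
    violations = []
    for view, budget in budgets_ms.items():
        observed = percentile(samples_by_view[view], q)
        if observed > budget:
            violations.append(
                f"{view}: p{q:g}={observed:.0f}ms > budget {budget:.0f}ms")
    return violations


if __name__ == "__main__":
    # Hypothetical load-time samples (ms) collected for one commit.
    samples = {
        "home": [850, 900, 1200, 950, 1000],
        "search": [400, 420, 390, 2000, 410],
    }
    budgets = {"home": 1500, "search": 1000}
    for problem in check_budgets(samples, budgets):
        print(problem)  # a CI job would fail the build if anything is reported
```

The last step (prohibit regressions) amounts to making a check like this required for merge; with noisy metrics a hard failure may be too strict, which is the problem the replies below get into.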

As another poster recommended, to take advantage of the technical community here, I have one question and one comment, in case you can provide more insight:

question a) My understanding is that performance numbers fluctuate a LOT, even when sampling in the tens of thousands. Do you have any recommendations for tools or methods to reduce this variance?

comment b) we're definitely trying to do this, but we're not there yet: most of our metrics don't meet the goals we set. Instead, the blocking goals have to be "don't make it any worse", which is doable, but it doesn't necessarily make anything better yet (hence all the questions about what is most annoying that we can fix first).

Hopefully point (b) is clear - I'm not saying "our performance is great/good/acceptable", just the best I can do (as a PM) is try to figure out what to prioritize to fix.

The high variance is another problem. Good software has low variance in performance. Especially if you're sampling in the tens of thousands.

The high variance does give you two tactical problems. First, how do you keep performance from getting worse? Typically you would set a threshold on the metrics, and prevent checking in code that breaks the threshold. With high variance you clearly cannot do this. Instead, make the barrier soft. If the performance tests break the threshold, then you need to get signoff from a manager or senior engineer. This way, you can continue to make coding progress while adding just enough friction that people are careful about making performance worse.
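
A soft barrier like this could be as small as a check that reports "needs sign-off" instead of failing the build. A minimal sketch, assuming per-run load times for the base and candidate builds and an arbitrary 5% tolerance over the base median; `soft_gate` and `GateResult` are hypothetical names, and routing the sign-off to a manager or senior engineer is left to whatever review tooling already exists.

```python
import statistics
from dataclasses import dataclass


@dataclass
class GateResult:
    needs_signoff: bool  # True means: don't merge until a manager or senior engineer approves
    message: str


def soft_gate(base_ms: list[float], candidate_ms: list[float],
              tolerance: float = 0.05) -> GateResult:
    """Soft performance barrier: never blocks outright, only asks for sign-off.

    The threshold is the base build's median plus a relative tolerance, so the
    rule is "don't make it noticeably worse" rather than an absolute goal.
    """
    base = statistics.median(base_ms)
    candidate = statistics.median(candidate_ms)  # median is less sensitive to outliers than mean
    threshold = base * (1 + tolerance)
    if candidate > threshold:
        return GateResult(True, f"median {candidate:.0f}ms > threshold {threshold:.0f}ms "
                                f"(base {base:.0f}ms + {tolerance:.0%}); sign-off required")
    return GateResult(False, f"median {candidate:.0f}ms within threshold {threshold:.0f}ms")


if __name__ == "__main__":
    result = soft_gate([1000, 980, 1020, 1010], [1100, 1090, 1120, 1080])
    print(result.message)
```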

The second problem of high variance is showing that you're making progress. However, for you, this isn't a real problem. You're not talking about cutting 500 microseconds off a 16 millisecond frame render. You need to cut 5-25 second page loads down by a factor of 10 at least. There must be dozens of dead obvious problems taking up seconds of run time. Is Confluence's performance so atrocious that you couldn't statistically measure cutting the page load time in half?
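
To make the "statistically measure it" point concrete: with heavy-tailed load times, one common way to show that a 2x improvement is real is a bootstrap confidence interval on the ratio of median load times. This is a standard technique swapped in for illustration, not something anyone here said Atlassian uses; the lognormal sample data is synthetic, and the seeds and sample sizes are arbitrary.

```python
import math
import random
import statistics


def bootstrap_median_ratio_ci(before: list[float], after: list[float],
                              n_resamples: int = 1000, alpha: float = 0.05,
                              seed: int = 0) -> tuple[float, float]:
    """Percentile-bootstrap confidence interval for median(after) / median(before)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        b = rng.choices(before, k=len(before))
        a = rng.choices(after, k=len(after))
        ratios.append(statistics.median(a) / statistics.median(b))
    ratios.sort()
    lo = ratios[round(alpha / 2 * n_resamples)]
    hi = ratios[round((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, heavy-tailed page loads: median ~10 s before a fix, ~5 s after.
    before = [rng.lognormvariate(math.log(10_000), 0.5) for _ in range(300)]
    after = [rng.lognormvariate(math.log(5_000), 0.5) for _ in range(300)]
    lo, hi = bootstrap_median_ratio_ci(before, after)
    print(f"after/before median ratio, 95% CI: [{lo:.2f}, {hi:.2f}]")
```

If the whole interval sits well below 1.0 (with this synthetic data it should land roughly around 0.5), the improvement is real despite the variance.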

"High variance as a consequence of poor software" is an interesting point, and not one I'd considered: I will take this to engineering and see if we can do anything about it (for some components maybe, but we also see high network variance, which seems unlikely to be fixable).

Showing that we're making progress isn't as much of a problem. Similar to what you stated, the fixes themselves target large enough gains that they're clearly measurable at volume, and even in testing.

The main issue is "degradations": catching any check-ins that degrade performance. These are usually small individually (say, low double-digit milliseconds, within the variance noise) but add up over time, and by the time the degradation is really measurable, it's complicated to track down the root cause. Hopefully I described that in a way that makes sense?

Any suggestions welcome.

(Edit: downvoted too much and replies are throttled again.) @lostdog Thanks for the detail! I'll definitely take this to the eng team for a process discussion.
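
The "many small regressions that add up" problem has a classic statistical answer that nobody in the thread named: a CUSUM (cumulative sum) control chart, which accumulates small persistent shifts that no single build-vs-parent comparison would flag. A minimal sketch, assuming one median page-load value per build in commit order and a pinned baseline median as the target; `cusum_alarm` is a hypothetical helper, and the slack and limit values are illustrative, not tuned.

```python
from typing import Optional


def cusum_alarm(values: list[float], target: float, slack: float,
                limit: float) -> Optional[tuple[int, int]]:
    """One-sided (upper) CUSUM over per-build metric values, in commit order.

    target: the in-control level, e.g. a pinned baseline median in ms.
    slack:  per-build excess treated as noise (the CUSUM "k" parameter).
    limit:  alarm once the accumulated excess passes this (the "h" parameter).

    Returns (alarm_index, likely_start_index), or None if no alarm fires.
    """
    s = 0.0
    last_reset = -1  # last build where the cumulative sum was back at zero
    for i, x in enumerate(values):
        # Only excess above target + slack accumulates; the sum never goes negative.
        s = max(0.0, s + (x - target - slack))
        if s == 0.0:
            last_reset = i
        elif s > limit:
            return i, last_reset + 1
    return None


if __name__ == "__main__":
    # Hypothetical per-build medians: steady around 1000 ms, then a +15 ms shift
    # starting at build 4, well inside typical run-to-run noise.
    medians = [1002, 998, 1001, 999, 1015, 1016, 1014, 1015, 1016, 1015]
    print(cusum_alarm(medians, target=1000, slack=5, limit=50))
```

The likely start index gives a short commit window to bisect, which speaks to the "hard to track down the root cause" part; slack and limit trade detection speed against false alarms and would need tuning on real data.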

I work in an area where high variance is very expected and unavoidable. Here's what we do:

In your PR, you link to the tool showing the performance diff of your PR. The tool shows the absolute and relative differences of performance from the base version of code. It also tracks the variance of each metric over time, so it can kind of guess which metrics have degraded, though this doesn't work consistently. The tool tries to highlight the likely degraded metrics so the engineer can better understand what went wrong.
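
A rough sketch of the core of such a tool: for each metric, the absolute and relative change from the base build, plus how many "noise standard deviations" that change represents, sorted so the likeliest regressions come first. The metric names, the noise-history format, and the z-score cutoff of 3 are all assumptions for illustration; as the comment says, a guess like this won't work consistently, so it only highlights metrics for a human to look at.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class MetricDiff:
    name: str
    abs_diff: float   # candidate - base, in the metric's units (lower is better)
    rel_diff: float   # abs_diff / base
    z: float          # abs_diff measured in historical noise standard deviations
    likely_degraded: bool


def diff_metrics(base: dict[str, float], candidate: dict[str, float],
                 noise_history: dict[str, list[float]],
                 z_cutoff: float = 3.0) -> list[MetricDiff]:
    """Compare a PR build against its base, most suspicious metrics first.

    noise_history holds past run-to-run deltas for each metric measured on
    unchanged code (at least two per metric), used to estimate its variance.
    """
    diffs = []
    for name, b in base.items():
        d = candidate[name] - b
        sd = statistics.stdev(noise_history[name])
        if sd > 0:
            z = d / sd
        else:
            # No recorded noise: any increase counts as significant.
            z = math.inf if d > 0 else 0.0
        diffs.append(MetricDiff(name, d, d / b, z, z > z_cutoff))
    return sorted(diffs, key=lambda m: m.z, reverse=True)


if __name__ == "__main__":
    base = {"home_load_ms": 1000, "search_load_ms": 400}
    candidate = {"home_load_ms": 1030, "search_load_ms": 480}
    noise = {"home_load_ms": [-40, 25, 10, -15, 30, -20],
             "search_load_ms": [-5, 4, 6, -3, 2, -4]}
    for m in diff_metrics(base, candidate, noise):
        flag = "LIKELY DEGRADED" if m.likely_degraded else "probably noise"
        print(f"{m.name}: {m.abs_diff:+.0f} ({m.rel_diff:+.1%}), z={m.z:.1f} -> {flag}")
```

With these made-up numbers, the +30 ms on the noisy home metric is treated as noise while the +80 ms on the quiet search metric is flagged, which is the kind of distinction the PR discussion then confirms or rejects.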

If the metrics are better, great! Merge it! If they are worse, the key is to discuss them (quickly, in Slack) and decide whether the change is just variance, a necessary performance degradation, or a problem in the code. Typically it's straightforward: the degraded metrics are either unrelated to the change or worth looking into.

The key here is not to make the system too rigid. Good code changes cannot be slowed down. Performance issues need to be caught. The approvers need to be fast, and to mostly trust the engineers to care enough to notice and fix the issues themselves.

We also check the performance diffs weekly to catch hidden regressions.

IF YOUR ORGANIZATION DOES NOT VALUE AND REWARD PERFORMANCE IMPROVEMENTS, NONE OF THIS WILL WORK. Your engineers will see the real incentive system, and resist performance improvements. Personally, I don't believe that Atlassian cares at all about performance, otherwise it never would have gotten this bad. Engineers love making things faster, and if they've stopped optimizing performance it's usually because the company discourages it.

It’s absolutely baffling that one of the leading providers of software-development tooling does not have the internal competency to... track whether their tool actually works or performs, and beyond that, that they didn’t proactively reach out to customers to at least get direct verbal feedback while the telemetry was being stood up.

Instead they “prioritized” a legal change in the ToS.