
	by thaumasiotes 346 days ago
	See, you don't understand. Fixing the bug before reproducing it would violate the process.

2 comments

jraph 346 days ago

Well, I do think the Thunderbird team should investigate and fix this. But it is almost impossible to fix a bug you can't reproduce and have no clue why it might be happening.


thaumasiotes 346 days ago

> But it is almost impossible to fix a bug you can't reproduce and have no clue why it might be happening.

No, not at all. It's very easy.

This bug involves taking an inappropriate action under corrupted conditions. You don't need to know how those conditions arose. All you have to do is check whether they currently obtain, and - if so - refrain from taking the inappropriate action.

For this bug, that looks like this:

1. When we're executing a "move"...

2. Before deleting the original messages...

3. Check whether the copies are identical to the originals...

4. And if not, delete the copies instead of the originals.

At this point, the bug can't occur. The "root cause" bug, where your buggy logic says that you copied a bunch of messages even though you didn't, can still occur, but it can no longer delete any messages.
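As a rough sketch of those four steps (not Thunderbird's actual code: `safe_move`, `digest`, and the dict-based folders are hypothetical stand-ins, and the destination is assumed not to already hold the same message ids):

```python
import hashlib


def digest(raw: bytes) -> str:
    """Content hash used to compare a copy against its original."""
    return hashlib.sha256(raw).hexdigest()


def safe_move(source: dict, dest: dict, keys: list) -> bool:
    """Copy messages to dest; delete the originals only if every copy verifies.

    `source` and `dest` are hypothetical stand-ins for a server folder and a
    local folder, modeled as dicts mapping message id -> raw bytes.
    Returns True if the move completed, False if it was rolled back.
    """
    # Steps 1-2: copy first, and do not delete anything yet.
    copied = []
    for key in keys:
        if key in source:
            dest[key] = source[key]
            copied.append(key)

    # Step 3: check that every requested message has an identical copy.
    verified = len(copied) == len(keys) and all(
        key in dest and digest(dest[key]) == digest(source[key]) for key in keys
    )

    if not verified:
        # Step 4: discard the copies and leave the originals alone.
        for key in copied:
            dest.pop(key, None)
        return False

    # Only now are the originals deleted.
    for key in keys:
        del source[key]
    return True
```

Whatever makes the copy step lie about its success, the delete step never runs on unverified data.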


nativeit 346 days ago

So…do it. Sounds like it’d make a great case study that would get a person tons of attention and praise on HN, a real feather to put in one’s cap.

Literally nothing stopping anyone in this thread from opening a PR with this reportedly “very easy” fix that’s eluded developers for nearly two decades, and is so terrible folks swear off Thunderbird forever because I guess for email very basic rules for backing up data don’t apply (or something?) and/or Gmail and Outlook are implicitly trustworthy?


thaumasiotes 346 days ago

> and is so terrible folks swear off Thunderbird forever because I guess for email very basic rules for backing up data don’t apply (or something?)

Well, this bug literally causes Thunderbird to delete your original copies of data during the backup process, so I'm not sure why backing up your data is supposed to be the solution.


jazzyjackson 345 days ago

Thunderbird stores mail locally on disk.

If you're keeping backups of your disk, then this bug is not unrecoverable.


1718627440 345 days ago

But keep in mind that this is a cache. A UIDVALIDITY change will wipe everything out.
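In IMAP, message UIDs are only meaningful together with the folder's UIDVALIDITY value, so a client that sees a new value has to throw away whatever it cached under the old one. A minimal sketch of that rule, assuming a hypothetical `FolderCache` class (this is not Thunderbird's storage code):

```python
from dataclasses import dataclass, field


@dataclass
class FolderCache:
    """Hypothetical local cache of one IMAP folder."""
    uidvalidity: int
    messages: dict = field(default_factory=dict)  # UID -> raw message bytes


def sync_uidvalidity(cache: FolderCache, server_uidvalidity: int) -> bool:
    """Drop cached messages if the server's UIDVALIDITY changed.

    Returns True if the cache was invalidated.
    """
    if server_uidvalidity == cache.uidvalidity:
        return False
    # The cached UIDs may now refer to different messages, so none can be trusted.
    cache.messages.clear()
    cache.uidvalidity = server_uidvalidity
    return True
```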


mixmastamyk 346 days ago

Did a developer ever try? Reading the issue, found only one person asking for test cases and trying to close it.


thaumasiotes 346 days ago

One of the many comments on the issue notes that although the bug has reoccurred in every version of Windows, it might not get much attention from developers because it is catalogued as something specific to Windows XP.

Nobody in the intervening nine years followed up by updating the bug's metadata, though. It's still "Windows XP only".


Forgeties79 346 days ago

I try to never underestimate the incompetence/lack of concern people can have when it comes to addressing major product issues, but if this has been open for 17 years and is so widely known, somebody has surely looked into it and determined it’s not so easy.


redeeman 346 days ago

and then they simultaneously determined "yeah, we might eat your data. Lets not warn anyone about that AT ALL, lets keep the feature activated and let them users lose their data". This behavior ought to be criminal.


godelski 346 days ago

  > Lets not warn anyone about that AT ALL, lets keep the feature activated and let them users lose their data

How did you conclude this?

IDK why the assumption is that safety measures haven't been created. You wouldn't mark the bug as resolved if you put in safety features, right? You *ONLY MARK AS RESOLVED* after reproducing the bug and *VERIFYING* that it won't happen again. Right? Dear god I hope this is what you do, because otherwise you are prematurely closing bugs.


autoexec 346 days ago

I'd agree with you that the fact that the bug is still open after 17 years isn't the problem, but the issue is that people are still (as of 10 months ago) running into the issue of their mail being deleted. If they'd secretly implemented "safety measures" as you suggest that wouldn't be happening.

Looking at the timeline, it's possible that they addressed a few of the bugs that result in data loss several years ago, and it's possible that the latest guy who ran into the problem within the last year triggered it in new ways or under new conditions. But it's clear that the problem of Thunderbird deleting messages from the server when copies haven't been successfully saved during a move operation wasn't solved by any "safety measures" 9 months ago, and it's doubtful that it's been solved now.

My guess is that because Thunderbird ultimately doesn't bother to make sure that messages are successfully and accurately copied before it removes them from the server, it'll only be a matter of time before someone else stumbles on some other set of circumstances that results in data loss when messages are being moved.


redeeman 346 days ago

Are you being for real? Did you see anything like that in the bug listing? But even IF they did put safeguards in place, the fact is that for SEVENTEEN YEARS the functionality has stayed enabled without ANY WARNING while losing people's data. Unforgivable.

How can you possibly justify this behavior? I understand they don't owe the world any software, fine, but don't knowingly publish stuff that KILLS PEOPLE'S DATA without at least a warning.


Forgeties79 346 days ago

Make no mistake - I am not absolving them of leaving this issue unaddressed lol just saying if it was easy they’d likely have handled it. It’s probably difficult or they just don’t know, so they keep putting it off and decided that not enough users are affected for real consequences (which is wrong to do)


redeeman 346 days ago

I fully understand it might be very hard to fix, but to know about it for that long and not warn people or disable the functionality is unforgivable.


yifanl 346 days ago

It's not criminal, but you're entitled to a full refund of Thunderbird in the event it happens.


CamperBob2 346 days ago

This line of reasoning will eventually cause it to be treated as criminal, or at least as a civil tort. Then we will all be worse off.


account42 345 days ago

Neither lack of payment nor the liability disclaimers in the license absolve the developers of liability for malicious actions or gross negligence.


saurik 346 days ago

Just because something is free certainly does not make it ethical, and it doesn't even mean it should be legal.


redeeman 346 days ago

an option I will be making use of now.

And I know it's not criminal, I'm saying it SHOULD be criminal not to warn people about this. It's been more than a decade.


brookst 346 days ago

if (!user_requested_mass_delete && delete_requests_past_second > 10) throw("we sure seem to be deleting a lot of stuff from the server")
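Spelled out as a runnable sketch, assuming a hypothetical `DeleteGuard` that the client would call before each server-side delete (the threshold of 10 per second comes from the one-liner above; the class, its method, and the injectable clock are made up for illustration):

```python
import time
from collections import deque


class DeleteGuard:
    """Refuse server-side deletes that arrive faster than a threshold."""

    def __init__(self, max_per_second: int = 10, clock=time.monotonic):
        self.max_per_second = max_per_second
        self.clock = clock
        self.recent = deque()  # timestamps of deletes seen in the last second

    def check(self, user_requested_mass_delete: bool = False) -> None:
        """Record one delete request; raise if too many arrived in the last second."""
        now = self.clock()
        # Forget deletes that are a second old or more.
        while self.recent and now - self.recent[0] >= 1.0:
            self.recent.popleft()
        if not user_requested_mass_delete and len(self.recent) >= self.max_per_second:
            raise RuntimeError("we sure seem to be deleting a lot of stuff from the server")
        self.recent.append(now)
```

Note that the first ten deletes in any one-second window still go through before the guard trips.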


jraph 346 days ago

I would not want my email client to be relying on such brittle and incorrect heuristics.

A better workaround would be to keep deleted emails around for some time so users have the option to restore them if the bug triggers. But this has drawbacks such as potential privacy breakage (you meant to delete mails because you don't want anybody to have the chance to see them), free disk space management (your local drive is overloaded and you want to urgently free up space), or UX confusion (this is a de facto trash, but Thunderbird already has such a feature).

Ultimately, what needs to be done is make the code robust, make sure there are no race conditions, etc.
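The keep-deleted-emails-around workaround could look roughly like this. It is only a sketch: `RetainingFolder`, its methods, and the retention window are hypothetical, and it shows the privacy and disk-space drawbacks directly, since nothing is really gone until `purge_expired` runs.

```python
import time


class RetainingFolder:
    """Hypothetical folder that holds deleted messages for a grace period."""

    def __init__(self, retention_seconds: float, clock=time.time):
        self.retention_seconds = retention_seconds
        self.clock = clock
        self.messages = {}  # id -> raw bytes
        self.deleted = {}   # id -> (raw bytes, deletion time)

    def delete(self, msg_id) -> None:
        """Soft-delete: move the message aside instead of destroying it."""
        self.deleted[msg_id] = (self.messages.pop(msg_id), self.clock())

    def restore(self, msg_id) -> None:
        """Undo a deletion that has not been purged yet."""
        raw, _ = self.deleted.pop(msg_id)
        self.messages[msg_id] = raw

    def purge_expired(self) -> int:
        """Permanently drop messages deleted longer ago than the window."""
        now = self.clock()
        expired = [
            msg_id
            for msg_id, (_, deleted_at) in self.deleted.items()
            if now - deleted_at >= self.retention_seconds
        ]
        for msg_id in expired:
            del self.deleted[msg_id]
        return len(expired)
```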


brookst 346 days ago

Well, would you rather have a brittle heuristic lose all of your mail?

My point wasn’t that this is a great solution, just that it is very easy and almost certainly better than doing nothing for almost two decades.


jraph 345 days ago

> Well, would you rather have a brittle heuristic lose all of your mail?

That's not what's happening. I wouldn't expect such a heuristic to be currently present. There is a bug, not something intentional.

> almost certainly better than doing nothing

No, because with such a heuristic, you add behavior that's difficult for the user to understand well and to work with. With such a heuristic, you will lose some mails and at some point the process stops in the middle. Which mails have you lost? What is "many" mails? 10? 100? What if my computer is fast and is deleting 100s of mails per second, losing all the mails anyway? What if it is slow and never triggers the heuristic?

If the heuristic does trigger, you end up with a mixed situation where you still have lost some stuff, but not all, and it'll be impossible to understand which ones. It doesn't fix the issue (you still lose email), just makes it even more difficult to understand, even for the devs when they inevitably need to track down related issues. You really don't want to willingly add mechanisms that feel like they are non-deterministic: they are hard to debug, and hard for the users to grasp.

A way better solution is backups anyway: if you care not to lose your emails, you should be backing them up. From the beginning, your local TB mails are not a proper backup of your IMAP account because it's two-way synchronized so you need a backup somewhere else.

A still better workaround is disabling the move to local folder feature and making people copy and then manually delete mails.

Not saying your heuristic is not a good idea or clever (it is clever and could lead to further good ideas), just that after reflection, it should probably not be implemented. It barely starts to address the issue and adds complexity for everyone involved.


jamespo 346 days ago

Just do a complete rewrite in rust, that will solve all the issues


goku12 346 days ago

Except the bug was filed in 2008. Back then, Rust was Graydon Hoare's personal project that Mozilla wouldn't start funding until a year later. Rust was written in OCaml and the famed borrow checker wouldn't be in place until 2010. The first public release was v0.1 in 2012 and the first stable release 1.0 wouldn't happen till 2015. The language was very different back then with sigils, garbage collection and green threading as language features. So this bug was already bugging people when Rust was just an embryo that was still years away from birth.

Now even if we neglect the timeline, Rust only guarantees memory safety. If TB is deleting mails on the server too, then the corruption is happening over IMAP connections as well. Does that sound like a memory safety bug to you? Perhaps it is. But how do we eliminate the possibility of a logical bug that Rust won't protect you against, when nobody has any clue even now? And all that aside, if you're going to rewrite it in Rust, you might as well start a new project in Rust instead of porting an old design that may potentially contain a language-agnostic logical Heisenbug.

I'm not trying to be hostile here. I started using Rust in 2013 (I have 12 years of experience in a 10 year old language, and a bunch of repos that I can't compile anymore unless I compile the compiler from old commits somehow!). I wouldn't use C or C++ for any of these applications - I simply don't have enough competence to avoid the kind of bugs that Rust protects me from (despite being a hardware engineer with more knowledge about memory management than about type system theory). Despite all that, statements like this will only cause an unwanted backlash against Rust. Not that you're entirely wrong, but some people are so offended by such suggestions for reasons that are still under investigation, that they start a crusade against Rust [1].

[1] https://fosstodon.org/@goku12/114077011555069124


amendegree 346 days ago

The op is being facetious


mattl 346 days ago

Is whichever part of Mozilla that runs Thunderbird going to rehire the rust team now?


tomrod 346 days ago

Honestly? It might.


dbalatero 346 days ago

Only by being a statistically different code base, unless you think this is a memory safety bug?


guerillagorilla 346 days ago

A better approach might be to feed all of this into an LLM to have it figure it out. If it finds a bug and has a fix, reproducing it might be easier and a test could potentially be written.

I don’t think LLMs are the answer to everything, but this would be a good test for newer generations of LLMs as they’re developed.

Worst case: it deletes all of your emails, but that would’ve happened anyway, right? =)


account42 345 days ago

Reproducing bugs is a luxury and not even close to required for analyzing and fixing issues. Even if the issue is external (hardware, antivirus, etc.), the code can be changed to be more defensive and only ever delete the original when the new data has been successfully written and verified.


godelski 345 days ago

You're right, but you're also wrong.

The problem is you can never close the bug report if you can't reproduce. I guess, you could, as the other commenter suggests, mathematically prove that it can't happen, but otherwise you're prematurely closing it.

How do you differentiate that you solved the bug and not a similar looking bug?

  > the code can be changed to be more defensive and only ever delete the original when the new data has been successfully written and verified.

But this doesn't solve the problem.

  - What if it is an upstream issue? They have to be connected, since they are deleting data. Maybe it is completely a bug on their end? Doesn't matter how defensive you are if the bug was "anytime an email has 'man man' and is pulled between 00:00-00:04 everything deletes" then what can you do? 
  - What if the user was hacked and the hacker just deleted all the data?
  - What if the user was just dumb and deleted the data themselves, either unknowingly or while being too embarrassed to say anything?
  - What if it is another program on the user's computer that is deleting the data because of some weird unexpected collision?

I'm sure you can think of more situations that still won't solve the problem.

How do you close the report if you can not make strong guarantees that it is resolved?


jraph 345 days ago

A luxury? Not even close to required? You are not afraid of words! I'm not looking forward to receiving a bug report from you!

Yeah, reproducing is not theoretically mathematically necessary. In theory you could prove your code is correct with formal methods¹. Now, nobody does this because it is impractical (borderline impossible), so in practice reproducing is so useful as to be almost essential:

- it lets you study how your code behaves in the problematic case and identify what's causing the exact issue the user is seeing

- it lets you check that your fix does indeed address the bug

I have indeed already fixed trivial bugs without reproduction cases from a vague description of a bug because I'm intimately familiar with the code and it immediately rings a bell: the cause is immediately obvious. But that's not the usual case.

> the code can be changed to be more defensive and only ever delete the original when the new data has been successfully written and verified.

What if the code is already designed like this (and I sure do hope it is currently written like that, because that's almost common sense, if not the only sensible way of moving something) but somehow fails for some currently inexplicable reason? It smells like a race condition to me.

In the case of the discussed bug, users have described a reproduction case that's not 100%. But someone will need to find a 100% reproduction case. Users, or devs. It will not be optional. You can't play a guessing game, attempt to fix the code and hope for the best. You might be able to actually fix the bug, but without much confidence. Best case, you'll be able to find a reproduction case after fixing the bug (that you'll probably use as a functional test), to prove you fixed the bug for this specific case you found. You'll not be 100% sure you addressed the user's case.

A bug can hide another one, so you could find and fix a bug, but the issue is still present in the user's case. You can only be sure with their reproduction case.

But I agree that it is hard to reproduce a race condition.

¹ which in practice applies to code of trivial size (static analysis), or consists in checking a model but not the actual implementation (model checking), or does apply useful checks but is not exhaustive and has false positives / negatives (static analysis), or does apply useful exhaustive checks but only on a limited number of executions (runtime verification, and we do have functional tests that serve a similar purpose in practice - and you'll actually need the reproduction case here so you have the right execution to check), or requires you to write your code in a specific language (stuff like coq) and you cross your fingers that this specific language's implementation is itself correct. In short: not applicable here.


godelski 346 days ago

  > it is almost impossible to fix a bug you can't reproduce

It's also impossible to mark a bug report as resolved if you can't reproduce it.

You could have fixed the bug (especially since a lot of TB was rewritten), but if you can't reproduce the bug you wouldn't know it was solved, only that people stopped reporting it. This is actually a common occurrence with long-standing bugs.


bachmeier 346 days ago

You know what else they could do? They could disable a feature that deletes large volumes of email the user doesn't intend to delete.


mattl 346 days ago

I don’t remember the last time I deleted an email. I’ve marked things as spam, archived things but not deleted in a long while.


tssva 346 days ago

I delete email every day.


Tepix 346 days ago

Sometimes you want to delete a whole bunch of mails, don't you?


bachmeier 346 days ago

I've updated my comment for clarity. The bug (which I've never encountered in more than 20 years as a Thunderbird user) is that users move messages to a local email folder, but the messages are deleted from the server without actually downloading them. At a minimum they should disable that operation. The guy that originally reported it worked at Sun and lost hundreds of work messages as a result of this bug. AFAICT the user wouldn't be affected if they did a copy of the messages and then manually deleted them from the server folder after confirming the copy was successful.


a0123 346 days ago

How do you fix a bug you can't reproduce?

It's a genuine question because I'm puzzled here.

A very small number of users have this bug (and tbf, it's a really bad bug), and are unable to consistently reproduce it and it seems none of the developers have been able to (the seemingly random nature of the bug occurring is not helping). How is it supposed to be fixed?


tobias3 346 days ago

You add more and more diagnostics (e.g. logging) in that area till you manage to track down the bug. Over several years this should be possible. At that point you can either fix the bug directly or do it properly by first reproducing the bug (in a test) and then fixing it.
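For illustration, diagnostics around a move might look something like the sketch below. It uses Python's standard `logging` module; the `logged_move` function, the `mailmove` logger name, and the dict-based folders are hypothetical, not anything from Thunderbird.

```python
import logging

log = logging.getLogger("mailmove")


def logged_move(source: dict, dest: dict, keys: list) -> None:
    """Move messages while recording enough state to reconstruct a failure later."""
    log.info(
        "move start: %d requested, source has %d, dest has %d",
        len(keys), len(source), len(dest),
    )
    for key in keys:
        raw = source.get(key)
        if raw is None:
            log.warning("message %r missing from source before copy", key)
            continue
        dest[key] = raw
        if dest.get(key) != raw:
            # The copy did not land intact, so the original stays where it is.
            log.error("message %r differs after copy; keeping original", key)
            continue
        del source[key]
        log.debug("message %r moved (%d bytes)", key, len(raw))
    log.info("move end: source has %d, dest has %d", len(source), len(dest))
```

When a user later uploads a log, the start and end counts plus the per-message warnings narrow down which step misbehaved, even if nobody can trigger the bug on demand.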


inanutshellus 346 days ago

How do you close a bug you cannot reproduce?

Said another way - If they can't reproduce it, they can't close it.

They may well have fixed it already, but without a way to reproduce it the only prudent behavior is to leave it open and wait for the next diagnostic file to be uploaded.


naasking 346 days ago

That's not the only prudent behaviour, as the OP said, the prudent behaviour is to add more diagnostics and guards against the conditions that lead up to the bug.


godelski 346 days ago

Okay, let's assume more diagnostics and guards were added.

Now re-answer the above questions with these assumptions.

  - How do you fix a bug you can't reproduce?
  - How do you *close* a bug report when you can't reproduce?

Being generous here, we're assuming there's 17 years worth of diagnostics and safety guards added but through that time the bug still isn't reproducible. Let's try to answer the questions under these assumptions.


naasking 346 days ago

If you've added guards and diagnostics, then you close it until someone else files a follow-up, then it can be re-opened. There's no sense keeping it open unless there are ongoing reports of the issue.


Nition 345 days ago

The way I've dealt with that in the past is putting it into Review or whatever the equivalent is, making a note ("cannot repro, but attempted potential fix in version XXXX, moving to review, please reopen if anyone reports this again") and then, if nobody reports it still happening for x amount of time (e.g. 12 months), closing it. Can always reopen it if it gets reported again beyond that.


justin66 346 days ago

For starters, put a lot more effort into reproducing it.


account42 345 days ago

- You can try harder to reproduce it.

- You can extend logging to gather additional information to reproduce it.

- You can try to reason about the code and figure out possible causes.

- You can attempt to formally verify the correctness of the code.

- You can put guards into the code against unexpected states and actions.

- You can verify the correct result of previous actions before any destructive actions.

- If all else fails, you can scrap the piece of code in question since it seems to be beyond your ability to maintain.


ParetoOptimal 346 days ago

> How do you fix a bug you can't reproduce?

You strangle it from the edges.
