| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by acqq 3552 days ago
	So "somebody" was me at the end, I've rolled up my sleeves and extracted the damn pictures: in my 1.7 MB session file approximately 1 MB were the base64 encodings of exactly 57 jpeg pictures that were all the result of the Google image search (probably of the single search). There were additionally a few pngs, representing "facebook, twitter, gplus" images and one "question mark sign" gif too.

1 comments

alphapapa 3551 days ago

Thank you for doing that. This explains much.

For a while now I have been running a cronjob to commit my profile's sessionstore-backups directory to a git repo every 5 minutes.

This is because, occasionally, when Firefox starts, it will load tabs in some kind of limbo state where the page is blank and the title is displayed in the tab--but if I refresh the page, it immediately goes to about:blank, and the original tab is completely lost.

When this happens, I can dig the tabs out of the recovery.js or previous.js files with a Python script that dumps the tabs out of the JSON, but if Firefox overwrites those files first, I can't. So the git repo lets me recover them no matter what.

What I have noticed is that the git repo grows huge very quickly. Right now my recovery.js file is 2.6 MB (which seems ridiculous for storing a list of URLs and title strings), but the git repo is 4.3 GB. If I run "git gc --aggressive" and wait a long time, it compresses down very well, to a few hundred MB.

But it's simply absurd how much data is being stored, and storing useless stuff like images in data URIs explains a lot.

link

acqq 3550 days ago

If I understood the intention of the programmers, they simply want to store "everything that can imitate to the server the continuation of the current session" even after the new start of the Firefox (like the restart never happened). The images were sent by Google, but probably remain in the DOM tree which is then written as the "session data" or something like that.

Like you, I also observed that exactly the people who depend on the tabs to "remain" after the restart are those who are hit by the bugs in the "restoration" and as I've said, I believe the users would more prefer to have "stable" tabs and URLs than the "fully restored sessions in the tabs" when all the tabs fully randomly (for them) disappear. Maybe saving just the tabs and URLs separately from "everything session" would be a good step to the robustness (since it would be much less data and much less chance to get corrupted) and then maybe, pretty please, an option "don't save session data" can be in the about:config too)?

Once there's decision to store just the URLs of the tabs as the separate file, the file can even be organized in a way that just the URL that is changed gets rewritten, therefore making the "complete corruption" of the file impossible and also removing the necessity for Firefox to keep N older versions of the big files (which then eventually still don't help the user like you).

link

alphapapa 3550 days ago

> Maybe saving just the tabs and URLs separately from "everything session" would be a good step to the robustness (since it would be much less data and much less chance to get corrupted)

Yes, that would be very, very useful. I can get by if the tab's scroll position and favicon and DOM-embedded images--and even formdata--are lost. But if the tab itself is lost, and it was a page I needed to do something with, I may never even realize it is gone...

link