by spookylukey 3475 days ago
It's a bug: https://bugzilla.mozilla.org/show_bug.cgi?id=307089

Or, it's a memory saving feature. To implement "View source from cache" requires keeping around the raw page HTML, which you might not otherwise need after parsing - except you probably will for all the developer tools to work, so this probably should just be considered a bug.

8 comments

I wouldn't mind having to activate it in the developer tools and disable it when I don't need it anymore and can feel the difference. Definitely easier than having to open a separate program like Fiddler.
How much memory is saved here? http://www.httparchive.org/interesting.php?a=All&l=Dec%202%2... 52 _kilobytes_ ? Really?
The browser would have to save the source text for EVERY tab it opens, regardless of whether you view source or not. That seems more onerous than making a separate request for the source code every once in a while.
Am I in some parallel universe here?

We are talking about hundreds of KBs of memory total which is nothing and if it's a worry for mobile devices we have the disk. It's not a feature that demands instant performance.

> It's not a feature that demands instant performance.

Hence it's acceptable to request the document from the server when opening 'View Source'.

No, because when you want to view the source you mean to view the source of this very document, not the source you'd get by reloading the page, which is often different.
If you want to see the page as it's currently being rendered, looking in the developer tools seems more relevant anyway.
No it's not, and it's particularly inappropriate for any document being viewed as a result of a POST request.
But who does this feature help exactly? A _tiny_ number of users. Chrome is currently using over 3GB on my machine. Anything they can do to trim that without affecting performance is worth it to me.
3GB is how much memory the _rendered/executed_ source code takes. The source code is nothing. Store it LZ4 compressed in memory and it's less than a rounding error.
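To put a rough number on the "store it compressed" idea, here is a minimal sketch. It uses zlib (DEFLATE) from the Python standard library as a stand-in, since LZ4 needs a third-party package, and the page content is a made-up, highly repetitive table of about the size discussed in this thread. Real pages will likely compress less well than this.

```python
import zlib

# Hypothetical page: repetitive markup, roughly the 50-60 kB discussed here.
row = '<tr><td class="cell">item</td><td class="cell">value</td></tr>\n'
html = ("<html><body><table>\n" + row * 900 + "</table></body></html>").encode("utf-8")

compressed = zlib.compress(html)        # DEFLATE, used here instead of LZ4
restored = zlib.decompress(compressed)  # lossless round trip back to the source

print(len(html), "bytes raw")
print(len(compressed), "bytes compressed")
```

The round trip matters more than the exact ratio: the browser could hand back the byte-identical source it originally received.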
Yes, I know that, but all of these little inefficiencies add up, and for what? A tiny portion of users who need the developer tools? Not worth it.

This kind of thinking is exactly what leads to our software being as slow or slower than that of two decades ago while running on machines hundreds or thousands of times more powerful.

Yes, but if you have more than enough memory, then that's no problem.

So, the browser should load the HTML page into a part of memory that can be discarded by the OS if the OS needs more memory. Actually I think it is strange that no such memory API exists in Unix.

EDIT: Anyway, storing it in the browser's file cache would also do the trick, I suppose :)

Isn't that one of the issues nowadays, that people have the attitude "we have enough memory" and thus don't tend to care about optimizing for memory usage anymore?

Your suggested solution, about discarding when memory is needed, solves this problem. But it solves a problem that would be created by needlessly storing a lot more data than needed. So it's a solution to a problem that was not a problem to begin with.

But I know, if it's a few kb, it will not make a huge difference.

No, this is really not a case of thinking we have enough memory. This is the realization that the static source code of a page is a rounding error compared to the memory overhead of a rendered page. Modern browsers are memory hogs; I would rather they focus on that.

It's also a broken implementation, functionally, since I asked for the source of the current page, not of a reload.

100kb * 100 tabs is 10MB. I'll take my chances.
So hide it behind a feature flag?
> "we have enough memory"

Your phone is in disagreement.

We're talking about a default feature which is disabled 'for average users'. Average users only have a few tabs open in their browser. Not 100. Average users also don't have 3 GB RAM, nor do they use view source on a mobile phone, and neither do average users actually use the feature. So the default setting makes sense even though a user who is using the feature may end up using several MB due to a reload (not cool on a plan).

If you are saving 57 kB per tab open, that'd be ~5.7 MB with 100 tabs open. But if you have 100 tabs open on a mobile phone (!!), you have a bigger problem, and all those tabs are causing swapping already anyway. In that sense, enabling the feature by default makes sense. And don't forget that some people don't have flat rate internet.
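For anyone who wants to check the per-tab arithmetic quoted in this thread, a tiny sketch. It assumes decimal units (1 MB = 1000 kB), and the helper name `total_mb` is made up for illustration.

```python
KB_PER_MB = 1000  # decimal units, matching the figures quoted in the thread

def total_mb(kb_per_tab: float, tabs: int) -> float:
    """Total memory in MB for keeping kb_per_tab of source per open tab."""
    return kb_per_tab * tabs / KB_PER_MB

print(total_mb(57, 100))   # 57 kB per tab, 100 tabs
print(total_mb(100, 100))  # 100 kB per tab, 100 tabs
```

That works out to 5.7 MB and 10 MB respectively, which matches both figures given above.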

My phone has a quite insane 3 GB of RAM.
Similar "tab discarding" feature already exists in Chrome, by the way:

https://developers.google.com/web/updates/2015/09/tab-discar...

> So, the browser should load the HTML page into a part of memory that can be discarded by the OS if the OS needs more memory. Actually I think it is strange that no such memory API exists in Unix.

It is trickier than you would think to determine what "needed" memory means. Does the OS need disk cache? Or the content of memory mapped files?

It's not unlikely that you currently have some process active which has memory mapped a huge file. Does it need the content? Who knows.

You are talking about an OS-provided cache for the browser to store its own cache in? There's little sense in that.
Moving memory management into the OS often makes sense, because the OS has the big cross-application picture, knows the system-wide memory pressure, and most importantly, already manages the memory of applications, by swapping between RAM and disk.

For the same reason, OS X and Android Linux both have systems for OS managed caches, and AFAIK Firefox already uses these: https://bugzilla.mozilla.org/show_bug.cgi?id=748598

The status on mainline Linux is a bit more nebulous (seems Android's ashmem has been upstreamed, but it's not directly usable on GNU/Linux systems?), and other efforts have stalled: https://lwn.net/Articles/602650/

For some more thoughts about memory management on OS level vs. application level, I can recommend this "random outburst" from the designer of the Varnish HTTP cache: https://www.varnish-cache.org/docs/trunk/phk/notes.html

But the amount of memory required for plain text source is very small compared to the memory allocated to render the page.
What everyone is missing here is that one in ten thousand users needs this, so why optimize for a tiny minority?
> requires keeping around the raw page HTML

Isn't it already in the browser's cache in that state anyway?

And if the cache is small, the preservation doesn't even have to be "for all tabs," if the last few pages can be retrieved from the cache nobody would complain that the older ones behave as they behave now -- you typically don't do "I wonder what was the source of the page from yesterday" in this old tab, but even if you do, you wouldn't be surprised that the source of yesterday does have to be requested again. So I imagine the fix as just "if in the cache, get it from there, else request."
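The "if in the cache, get it from there, else request" idea can be sketched in a few lines. This is only an illustration, not how any browser implements it: `view_source`, the dict-based cache, and the `fetch` callback are all hypothetical names.

```python
from typing import Callable, Dict

def view_source(url: str, cache: Dict[str, str], fetch: Callable[[str], str]) -> str:
    """Return the cached source for url if present, otherwise request it again."""
    if url in cache:
        return cache[url]  # the exact bytes that were rendered
    return fetch(url)      # fallback: today's behavior, a fresh request

# Usage with a stand-in fetch function (no real network access).
requested = []

def fake_fetch(url: str) -> str:
    requested.append(url)
    return "<html>fresh copy</html>"

cache = {"https://example.com/a": "<html>as rendered</html>"}
print(view_source("https://example.com/a", cache, fake_fetch))
print(view_source("https://example.com/b", cache, fake_fetch))
```

Only the second call reaches the fallback, so a recently loaded page never triggers a new request.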

> Isn't it already in the browser's cache in that state anyway?

Like disk cache as in Cache-Control? In most cases you wouldn't cache the HTML itself, but in cases where you do then your use case should already work as stated, since for the browser to do otherwise would imply the cache is being intentionally ignored for the view-source request.

> In most cases you wouldn't cache the HTML itself

Why not, at least as long as that's the topmost tab? Wouldn't then the view-source-new-request problem be solved by just using the existing features?

With devtools closed it doesn't seem worthwhile then. The vast majority of users aren't using devtools.
Why keep the original source in memory? Why not save it to disk?
Maybe because it would prematurely age SSDs?
Like anybody gives a damn about that. Spotify for sure doesn't.
Why would you need the original HTML for dev tools to work? Maybe there are ones I've never seen but the ones I use are using the DOM rather than the original HTML string.
You don't typically.

One case where you would View Source in addition to using Dev Tools is debugging how the browser is massaging your source HTML into the DOM, for example by inserting missing <tbody> elements. Validating your HTML mostly addresses this (I'm actually not sure about the <tbody> example, I would hope the validator at least issues a warning), but isn't something you necessarily want or can do with, for example, user-uploaded HTML snippets.

True, though I guess using some HTML validation could improve that. Not sure if <tbody> is mandatory but I think I've seen some tool complain about its absence.
I showed a few people who have no idea about programming whatsoever how to hack websites to download .mp4 or .mp3 from source code of the website. How would you show them that using DOM...
How would you show them that using the HTML source?

You can use dev tools to view the network tab to find whatever URL Pandora is using.

You can use dev tools to find a media element's src attribute.

To see the JavaScript sources... You check the dev tools.

If anything, in the days of JavaScript, the HTML source will be missing a few things.

If there's an audio tag, that element exists in the DOM and has a reference to the url where you can download it. Is this a trick question?
Yes, my mother doesn't know what DOM is or why there are so many buttons in devtools. ctrl+u -> ctrl+f -> mp3 is much easier to explain and repeat.
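The ctrl+u, ctrl+f, "mp3" routine amounts to a plain text search over the raw HTML. A rough sketch of the same search in Python, assuming the media URL appears as a quoted attribute value ending in .mp3 or .mp4; the function name and the sample markup are invented for illustration, and URLs with query strings after the extension would be missed.

```python
import re
from typing import List

# Quoted strings that end in .mp3 or .mp4 (case-insensitive).
MEDIA_URL = re.compile(r'''["']([^"']+\.(?:mp3|mp4))["']''', re.IGNORECASE)

def find_media_urls(html: str) -> List[str]:
    """Return quoted values in the raw HTML that look like media file URLs."""
    return MEDIA_URL.findall(html)

sample = (
    '<audio src="https://example.com/track.mp3"></audio>'
    '<video src="/clips/demo.mp4"></video>'
)
print(find_media_urls(sample))
```

Like the manual search, this only finds URLs that are present in the static source; anything injected later by JavaScript would need the DOM or the network tab.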
Right but then you're talking about viewing the HTML source right? In other words "View Source." I'm asking why dev tools would need access to the original source and not the DOM. It doesn't make sense to save the original source in memory to me once it's been parsed.
ctrl+shift+i, ctrl+f "mp4"
Wow first raised in 2005. Nice!
Couldn't you reconstruct the source from the internal DOM representation anyway?
No. The parser corrects and changes the source to make the DOM.

So it is impossible to find errors usually. So things like:

    <p><div> </p></div>
the parser will correct, and therefore the DOM will be correct. But what it does to fix this may break your site. I often found the CSS would be screwed up for many reasons (rules don't match DOM structure -- due to bugs all over our codebase)
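To illustrate why this kind of error is only visible in the raw source, here is a toy checker built on Python's `html.parser`, which reports tags in source order without repairing anything. It is a sketch, not the HTML5 parsing algorithm: the `NestingChecker` class is made up, and it ignores void elements such as `<br>` and optional end tags.

```python
from html.parser import HTMLParser

class NestingChecker(HTMLParser):
    """Toy check: flag end tags that do not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.problems = []   # (end tag seen, innermost open tag or None)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            innermost = self.open_tags[-1] if self.open_tags else None
            self.problems.append((tag, innermost))

checker = NestingChecker()
checker.feed("<p><div> </p></div>")
checker.close()
print(checker.problems)   # mismatched end tags found in the source
print(checker.open_tags)  # tags still open at the end of the input
```

On the snippet above it flags the `</p>` that arrives while `<div>` is still open, which is exactly the information that is lost once the browser has silently repaired the tree.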
Also, changes made by client-side JS.
oh right.