by civilian
1575 days ago
I was hoping this tool would also solve a problem that comes up when saving and reproducing JS-framework-heavy websites. Here's the bug: according to the HTML spec, elements like <h2> and <div> cannot be inside <a> tags. But using JS you _can_ put <div>s inside <a>s. (It happens through document.insert-type functions; frameworks like Angular/React allow this.) Look at nasa.gov, where there's HTML like this:
<a href="/press-release/nasa-invites-media-to-next-spacex-commercial-crew-space-station-launch-0" date="Wed Mar 02 2022 10:35:00 GMT-0800 (Pacific Standard Time)" id="ember196" class="card ubernode cards--card cards--2row cards--2col nodeid-477815 ember-view">
<div class="bg-card-canvas" style="background-image: url(/sites/default/files/styles/2x2_cardfeed/public/thumbnails/image/51846702013_a0cc55100a_k.jpeg);">
<!----> <h2 class="headline"> ...
</h2>
</div>
</a>
After running this through SingleFile you can see the difference visually, and the HTML becomes:
<a href="/press-release/nasa-invites-media-to-next-spacex-commercial-crew-space-station-launch-0" date="Wed Mar 02 2022 10:35:00 GMT-0800 (Pacific Standard Time)" id="ember196" class="card ubernode cards--card cards--2row cards--2col nodeid-477815 ember-view"></a>
<div class="bg-card-canvas" style="background-image: url(/sites/default/files/styles/2x2_cardfeed/public/thumbnails/image/51846702013_a0cc55100a_k.jpeg);">
<h2 class="headline"> ...</h2>
The way that sites like the Wayback Machine handle this is by using the web-replay library Wombat (https://github.com/webrecorder/wombat), which also uses JS to insert those elements.
But what the hell! I was working on a similar HTML-downloading/reproducing tool and this bug really bothers me. I'd like either the HTML parsing standard to be updated to accept <div> inside of <a>, or for that to be made impossible to do via JS.
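To check whether a saved copy lost this nesting, here is a minimal sketch that lists block-level tags written between <a> and </a> in an HTML string. It assumes Python's standard html.parser, which only tokenizes the text and does not build or repair a tree the way a browser does, so it reports the nesting exactly as written. The names AnchorBlockFinder, blocks_inside_anchors, and BLOCK_TAGS are hypothetical, and the tag list is an illustrative assumption, not a complete one.

```python
from html.parser import HTMLParser

# Assumption: a small illustrative set of block-level tag names, not exhaustive.
BLOCK_TAGS = {"div", "h1", "h2", "h3", "h4", "h5", "h6", "p", "section", "ul", "ol"}


class AnchorBlockFinder(HTMLParser):
    """Record block-level start tags that appear while an <a> is still open."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0  # number of <a> tags currently open in the text
        self.found = []        # (tag, (line, column)) for each block tag inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        elif tag in BLOCK_TAGS and self.anchor_depth > 0:
            self.found.append((tag, self.getpos()))

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1


def blocks_inside_anchors(html):
    """Return the block-level tag names written inside <a>...</a>, in order."""
    finder = AnchorBlockFinder()
    finder.feed(html)
    finder.close()
    return [tag for tag, _pos in finder.found]
```

Running blocks_inside_anchors on the page source before and after saving and comparing the two lists is one way to spot cards whose <div> and <h2> ended up outside their link.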