| HN Mirror

	by bstrama 36 days ago
	Can't wait to see the next generation of LLMs after feeding them all of that hahaha

everyos_ 36 days ago

The page requires JS to load its content - user agents without JS support just get a blank page.

I'm not sure if the bots that scrape data to train LLMs are capable of loading that type of page, or if they only work on pages that have the content inside the HTML itself?
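
A rough way to see what a scraper that does not run JS gets is to reduce the raw HTML to its visible text and ignore everything inside `<script>` and `<style>`. The sketch below uses only Python's standard library. The names `visible_text` and `VisibleTextParser` and the two sample pages are hypothetical, and real crawlers do a lot more than this.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that a browser would render without running any JavaScript."""

    # Text inside these elements is never displayed as page content.
    HIDDEN = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        # <noscript> text is kept on purpose: a non-JS client does show it.
        if self.hidden_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page text available to a client that does not execute JS."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# A client-rendered page: the content only appears after app.js runs.
spa_page = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
# The same page with the content already in the HTML.
ssr_page = '<html><body><div id="root"><h1>Hello</h1></div><script>render()</script></body></html>'

print(repr(visible_text(spa_page)))  # ''
print(repr(visible_text(ssr_page)))  # 'Hello'
```

A scraper that only parses the HTML gets the empty string for the first page, which is the "blank page" described above.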

replygirl 36 days ago

any serious scraping service these days will fail over to a headless browser when it fetches an asset referencing a js bundle that isn't verifiably a vendor script
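
The fail-over heuristic described here might look roughly like this. The sketch collects every `<script src>` URL and falls back to a headless browser when any of them is not on a list of known vendor hosts. The host allow-list, the name `needs_headless_browser`, and reducing "verifiably a vendor script" to a hostname check are all hypothetical simplifications. Inline scripts are ignored, and scraping services don't publish their actual rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical allow-list of hosts whose scripts are assumed to be third-party
# vendor code (analytics, tag managers, CDNs) rather than the page's own bundle.
KNOWN_VENDOR_HOSTS = {
    "www.googletagmanager.com",
    "www.google-analytics.com",
    "cdn.jsdelivr.net",
}


class ScriptSrcParser(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def needs_headless_browser(html: str, page_url: str) -> bool:
    """Return True if the page loads any script not served from a known vendor host."""
    parser = ScriptSrcParser()
    parser.feed(html)
    parser.close()
    for src in parser.srcs:
        # Resolve relative paths such as /static/app.js against the page URL.
        host = urlparse(urljoin(page_url, src)).hostname
        if host not in KNOWN_VENDOR_HOSTS:
            return True  # first-party or unknown bundle: render in a real browser
    return False  # no external scripts, or only vendor ones: a plain fetch is enough


page = (
    '<script src="https://www.googletagmanager.com/gtag/js"></script>'
    '<script src="/static/app.3f2a.js"></script>'
)
print(needs_headless_browser(page, "https://example.com/post/1"))  # True
```

The cheap plain fetch stays the default, and the expensive browser is only used when the HTML points at a bundle that probably builds the content.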

aDyslecticCrow 36 days ago

Not using JavaScript would also make the crawler fail on Squarespace and Wix website builders.

The age where the web was usable at all without JavaScript is long gone. No scraper would get much scraping done without JavaScript these days.

cachius 36 days ago

You mean by embedding? How can an external site fail on Squarespace and Wix website builders?

tardedmeme 35 days ago

A crawler would fail on all Squarespace and Wix sites if they all require JavaScript.

cachius 35 days ago

Found it: https://halupedia.com/javascript-requirement-on-squarespace-...

bstrama 36 days ago

I'm aware and will implement SSR soon ;)
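
Server-side rendering means the server writes the content into the HTML response itself. A client that never runs JavaScript still gets the text, and the JS bundle only adds interactivity on top. Here is a minimal sketch with Python's standard library. The `render_article` function, the page layout, and the `/app.js` path are hypothetical, not how this site is actually built.

```python
from html import escape


def render_article(title: str, body: str) -> str:
    """Build a complete HTML page with the article already in the markup."""
    # escape() keeps article text from being interpreted as HTML tags.
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n"
        '<body><div id="root">'
        f"<h1>{escape(title)}</h1><p>{escape(body)}</p>"
        "</div>\n"
        # The client-side bundle can still take over (hydrate) after load.
        '<script src="/app.js" defer></script>'
        "</body></html>\n"
    )


print(render_article("Fish & Chips", "Served <hot>."))
```

Because the `<h1>` and `<p>` are present before any script runs, an HTML-only scraper sees the article, while browsers with JS still load the client bundle.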

m3047 36 days ago

It's entirely possible they simply ingest the JS as-is.
