| HN Mirror


	by kianworkk 736 days ago
	The goal of SiteProfile is not to scrape data. It only accesses publicly available web pages, such as the homepage, about page, and pricing page. It does not access non-public content on websites, nor does it offer users the functionality to scrape website data.

4 comments

_heimdall 736 days ago

What exactly does it mean for the service to provide information about a website without scraping it? How could summaries or LLM responses be generated without scraping pages?

dotancohen 736 days ago

Presumably the same way that Firefox makes an HTTP request to the webserver then formats the page for the human user. This is just formatting that page differently. This is no more a scraper than is Firefox's Reader Mode.

That said, lying about the UA is not cool.

Animats 736 days ago

I have something that sends a UA of "Sitetruth.com site rating system". Many sites won't talk to that.

_heimdall 736 days ago

I've used a reader mode library that I think was created by Mozilla; it converts a site to reader mode locally. Does the Firefox browser do it locally, or at least on demand? If so, I wouldn't really consider that scraping, since they aren't parsing the site and storing data for later use.

throwaway211 736 days ago

It does scrape the site in order to summarise it, no?

pavel_lishin 736 days ago

Your statement doesn't answer their question.

kianworkk 736 days ago

I meant that it is not supported yet. I will add this to the to-do list, and I believe it does not conflict with the goals of SiteProfile. Thank you for your feedback.

nhggfu 736 days ago

so you scrape + store + process contact info etc, presumably.

sounds like a privacy nightmare

no doubt this is not GDPR compliant.

no doubt this is not legal in some parts of the world - unless people can opt out and get their data removed.

CaptainOfCoit 736 days ago

> no doubt this is not GDPR compliant.

Unless the project is open source, no doubt you cannot know this. If they don't store any of those details anywhere (including not in logs) but just pass it along, GDPR won't apply.
