Hacker News new | ask | show | jobs
by pharmakom 1180 days ago
OpenAI is actively blocking the scraping use case. Does this work around that?
6 comments

Couldn't find any mention of this, please provide a source. Their ToS mentions scraping but it pertains to scraping their frontend instead of using their API, which they don't want you to do.

Also - this library requests the HTML by itself [0] and ships it as a prompt but with preset system messages as the instruction [1].

[0] - https://github.com/jamesturk/scrapeghost/blob/main/src/scrap...

[1] - https://github.com/jamesturk/scrapeghost/blob/main/src/scrap...

I don't think this is correct at all. It's one of the main use cases for GPT-4 – so long as the scraped data or outputs from their LLMs aren't used to train competing LLMs.
What do you mean by this, and what would be their reason for doing so? I've tested a few prompts for scraping and there have been no problems.
Ran into issues asking for JSON output
What kind of issues?
> OpenAI is actively blocking the scraping use case.

How? And since when? Scraping is identical to retrieval except in terms of what you do with the data after you have it, and to differentiate them when you are using the API, OpenAI would need to analyze the code calling the API, which doesn’t seem likely.

Workaround: use another tool to scrape the markdown then hand the text to OpenAI
OpenAI - scrapes the whole World Wide Web. When I ask for a script to scrape a website, you might be breaking our ToS lol.