by UpToTheSky 1080 days ago

    For example, if a user specifically asks for a
    URL's full text, it might inadvertently fulfill
    this request.
So this seems to imply two things:

1: Bing has access to text on websites which users don't. Probably because websites allow Bing to crawl their content but show a paywall to users?

2: The plugin has a different interface to Bing than what Bing offers via the web. Because on the web, you can't tell Bing to show the full text of the URL.

I have to contact my ISP. That's not the open web I subscribed to :) Until they fix it, I just keep reading HN. A website which works the way I like it.

6 comments

There are various techniques automated agents (e.g. crawlers like Google's) can use. The ethical ones are used in agreement with, or following the guidance of, the content providers, to allow discovery that suits both parties without giving unrestricted access, which wouldn't always suit the provider.

We could hypothesize that in this case BWB is employing some of those techniques while it isn't a discovery-enabling service, but rather a content-using one, and so would be expected to present as an ordinary user and be subject to the same constraints.

Geofenced sites, cookie-forced sites, GDPR dodge bypasses ...

Nothing you couldn't do with a decent VPN, but 'Open'AI these days have already achieved what they wanted from publicly demonstrating GPT, and are now more focused on compliance with regulation and on reducing functionality to the point of minimally staying ahead of the competition in released products, while steaming full ahead with developing more powerful and unrestricted AI for internal exploitation with very select partners.

In such a scenario, the true power of AI is the delta between what you can exploit vs what your competition has access to. HFT would be a nice analogy.

They seem to be implying that it worked like a natural language version of archive.today.

They (the AI companies collectively) keep creating powerful tools and then taking them away.

If you give people tools, people will use them in ways you won't be able to control.

Option 1 is definitely true, but I don't think paywalls are the issue. Bing has a "work search" option, to index and search sharepoint sites. My bet is there's a leak between public and private search.
Maybe some sites allow search engines to bypass paywalls so the full content gets indexed, and the plugin appears to be a whitelisted search engine to these sites?
A lot of sites just implement the paywall client-side with some JavaScript and CSS, so any kind of search indexer would still see the full text regardless of the user agent or source IP.
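To illustrate the client-side case, here is a minimal sketch using only Python's standard library. The page markup, the `paywalled` class, the `overlay` id and the `extract_text` helper are all made up for the example and are not taken from any real site or from how Bing or the plugin actually work. It only shows that when the hiding is done with CSS and JavaScript, anything that reads the raw HTML without rendering it still gets the full text.

```python
from html.parser import HTMLParser

# Hypothetical page: the whole article is in the markup, and only
# CSS/JavaScript hide part of it once a browser renders the page.
SAMPLE_HTML = """
<html><head><style>.paywalled { display: none; }</style></head>
<body>
<p>First paragraph, visible to everyone.</p>
<div class="paywalled"><p>Second paragraph, hidden by CSS.</p></div>
<div id="overlay">Subscribe to keep reading</div>
<script>document.getElementById("overlay").style.display = "block";</script>
</body></html>
"""


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the text an indexer that ignores CSS and JS would see."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    for chunk in extract_text(SAMPLE_HTML):
        print(chunk)
```

The "hidden" paragraph comes out alongside the visible one, with no special user agent or source IP involved. A paywall enforced server-side would not leak this way, because the text would never be sent in the first place.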
Paywalls.