Hacker News
by rngname22 936 days ago
You can't stop people from polluting the web with AI-generated outputs (and therefore contaminating data sets you hope to be able to assume are human-generated) until you create a humanweb (fuck the 'web3' attempts we've had so far, web3 ought to be human-verified vs non-human-verified web) that has real, effective human-verification on inputs built-in. The regular web will still be useful but for an increasing number of applications you'll need to go to the humanweb to get what you need, where self-feeding hallucinations and sloppy modelpaste aren't everywhere.

If people are mad that Twitter gives a megaphone to everyone, including the ignorant masses, then they'll love the auto-spam that LLMs are going to create.

Anything that can be said will be said.

You want a Reddit with humans? Ha. On the regular non-human-verified discussion platforms of tomorrow, you'll be lucky if 4% of the comments you are replying to and arguing with even have a human on the other end, but the good news is the rebuttal comment you posted after having too much coffee will be ingested and used for training of the next version of the model you're arguing with. So your original content human-input may be parroted much more broadly than it would've been on the pre-LLM web.

If LLM spam really does flourish and spread misinfo and hallucinations everywhere and we don't develop good automated means to prevent it or to verify content, it may be necessary for a central authority/business to maintain hardware terminals at distributed, centralized locations for interacting with the humanweb that you can't install or control the software on and where a human or a camera is watching you physically type on the keyboard to make sure you aren't just automating the inputs physically with some software->machine->keyboard interface or connecting some virtual keyboard. Think a locked-down public library computer but you're watched while you interact with it, and they're deployed and administered across the planet by a trusted multinational for sensitive usages where you absolutely need to ensure the inputs are from humans.

You wanna get real fun and cyberpunk novel thought-experimenty, picture prison-like security, physical pat-downs or even a requirement that you use the terminal naked and are body-searched for devices. Maybe x-ray scanned for implanted hardware.

Of course the whole thing falls apart if the trusted authority that administers the hardware is compromised but at least you stop some of the non-state actors and script kiddies.

6 comments

Ah yes, human-verified web, let me just send my government ID in to get internet privileges. No wait, I could still be running a bot. How about DNA samples? Body parts? Biometrics, like I have to keep my eyes in the eye scanner or finger on the fingerprint pad to keep my connection on? Nahh I'll just hire some people to stay in the machine watching movies while I operate a swarm of bots off of their connection...
I have been thinking about this exact question (how to verify that a user is a human) and I still don't have a good answer for it.

At least not in a non-dystopian way.

Sam Altman's Worldcoin tries to achieve this using iris scanners, which I believe falls in the "dystopian" camp.

I think we'll eventually come to the conclusion that it's the wrong question.

What we really want is certain types of content, and to ban others. If we get that certain type from a bot, that's fine; if the type of content we don't want is coming from a human, it should still be removed.

By "type" of content, I mean very broadly. For instance one could create a community in which there's a limited number of posts/characters/etc. per day, not just be looking at the characteristics of the content itself. I mean all aspects of the content, data, metadata, all of it, as part of the analysis of "desirable."
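To make the "limited number of posts/characters per day" idea concrete, here is a minimal sketch of a per-participant daily budget. Everything in it is an assumption for illustration: the `DailyQuota` name, the default limits, and keeping state in memory are not from any real platform. Note that it constrains behavior, not identity, so a bot and a human get the same budget.

```python
from collections import defaultdict
from datetime import date


class DailyQuota:
    """Hypothetical per-participant daily cap on posts and characters."""

    def __init__(self, max_posts=5, max_chars=2000):
        # Illustrative defaults, not taken from any real community.
        self.max_posts = max_posts
        self.max_chars = max_chars
        # (participant, day) -> [posts used, characters used]
        self._used = defaultdict(lambda: [0, 0])

    def try_post(self, participant, text, today=None):
        """Record the post and return True if it fits today's budget."""
        today = today or date.today()
        used = self._used[(participant, today)]
        over_posts = used[0] + 1 > self.max_posts
        over_chars = used[1] + len(text) > self.max_chars
        if over_posts or over_chars:
            return False
        used[0] += 1
        used[1] += len(text)
        return True
```

Usage would be something like `quota.try_post("some_user", comment_text)` before accepting a comment. The budget resets naturally because the day is part of the key.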

If you want a pure-human community, put constraints on the community only humans can meet; heavy-duty, unscalable identity verification may play a role there.

As a bit of a "how do you build communities online" hobbyist, I think another trend we're going to see is communities getting faster on the draw to evict participants (originally wrote "people" here, but it's actually generically "participants"), for reasons beyond mere spam or active antagonism. Historically, I think it's a thing that most communities have done; the American/Western zeitgeist has disfavored that idea for a while in favor of expecting every community to take everyone who wants to join, but regardless of the ethics or philosophy behind that idea, I think that's just going to become simply impossible online. If the standard for participation in some community includes bots that won't be evicted no matter what they do, that community will rapidly become just another bot congregation ground and look like all the rest of them. With people roaming the internet for new communities to infiltrate with their bots, community building will become a subtractive process rather than an additive one. That's going to be a big change, it isn't going to be smooth or all good.

> If you want a pure-human community, put constraints on the community only humans can meet; heavy-duty, unscalable identity verification may play a role there.

I predict that this requirement would only decrease the amount of community and further increase the already high levels of isolation and alienation in society.

But I also predict that conversational AI will inevitably do this anyway, so perhaps we're just doomed.

Bootstrapping will be a big problem. A community that already has some size can potentially start adding an identity-checking step, but if you want to start a new community with confidence that you don't have it full of unaligned bots, it's going to be a lot harder.

Once the community gets going, though, well, we have experience with that. The web used to have a lot of actual communities, where you might know someone for 10 years and perhaps meet up for picnics or something. Larger sites took a huge chunk out of them, and there's actually some disadvantage to the Internet being completely geography-agnostic... it's hard to meet up with my community of 50 people spread more-or-less evenly across the world, or even the US. But they have existed before and they may exist again. I said it won't be all good in my original post, but it won't be all bad either. Some of what is going to be excluded in the botpocalypse is the worst of what exists today. Of course, there's going to be all kinds of incentives to create new pathologies, so who knows which way it will go in the end.

I’m not 100% sure what problem we’re trying to solve. If it is having authentic discussions with real humans… I don’t think there’s any alternative to just meeting with them in real life. Maybe we can exchange hand-written letters.

If the goal is to use the internet to produce interesting discussions and arguments, IMO it would be neat to try embracing the fact that bots are going to exist and get in the dataset. If bots produce outputs, and we pick the “good” output, that output can be smarter than the model, and go back to train the model, right?

Altman's shitcoin won't solve a thing. The "real human" user could just be acting as a front for a spambot.
Indeed, if it's a one-time-only, account-creation or long-duration authentication system, spam bots reusing said account afterwards would be an issue.

I guess that "always on" verification or short-duration authentication could make this strategy less useful.
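A rough sketch of what short-duration authentication could look like: the session is only trusted for a fixed window after the most recent successful human check, so a verified account handed to a bot goes stale quickly. The class name, the five-minute default, and the injectable clock are all assumptions, and the human check itself (the hard part) is deliberately left out.

```python
import time


class ShortLivedSession:
    """Hypothetical session that is valid only for ttl_seconds after
    the most recent successful human check."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        # Injectable clock so the expiry logic can be tested without sleeping.
        self._clock = clock
        self._verified_at = None

    def record_verification(self):
        # Call this only after whatever human check you trust has passed;
        # the check itself is out of scope for this sketch.
        self._verified_at = self._clock()

    def is_valid(self):
        """True only if a check has passed within the last ttl_seconds."""
        if self._verified_at is None:
            return False
        return self._clock() - self._verified_at < self.ttl_seconds
```

This only shrinks the window for reuse. It does nothing about a human who keeps passing the check on behalf of a bot.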

People go to where the desirable content is, and some "humanweb" with a high barrier of entry inevitably has a chicken-and-egg problem, where it's not worth going there until the thing you need is there, and so people who might create that thing won't go there and will create it elsewhere.

All the best non-commercial content will be created somewhere where creators don't need to rely on "hardware terminals at distributed, centralized locations for interacting with the humanweb that you can't install or control the software on and where a human or a camera is watching you physically type on the keyboard to make sure you aren't just automating the inputs physically with some software->machine->keyboard interface or connecting some virtual keyboard", while on the other hand, commercial content farms will have no problem hiring a thousand minimum-wage employees to spend 8+ hours in those locations creating authentic, verified human-entered astroturfing spam.

I don't know what it means to be human-verified in any useful sense.
Maybe Internet cafes will become more of a thing again. The manager will verify you as a real human using their computers, and the Internet cafe itself gets audited.

Or imagine a Costco Metaverse Verification Center. You can play in a VR metaverse with other verified humans at other Costcos around the world. AR cameras on the headset will ensure you can see your $1.50 hotdog and soda combo so you never have to leave the metaverse. Costco would also provide you a sleep pod at cost if you want to plug back into the matrix right after waking up.

Check this out: https://worldcoin.org/world-id, another project from Sam Altman.
It means meatware was closer to the end of the process of inputting data into the system.
I think I know what "human-generated" means, but I don't know what "human-verified" means.
How would you verify that? Any digital means is out, and nothing else will be able to scale.
It seems incredibly dishonest to not understand why people want to interact with other people.
I understand the desire to restrict your consumption to human generated content. But I don't understand what is meant by "human verified".
It seems incredibly dishonest to not understand why governments want a cryptographic backdoor that only they can use.
It's funny, the whole concept of human-verified vs non-human-verified web I've heard raised before and it sounds a lot like the Blackwall in Cyberpunk:

https://cyberpunk.fandom.com/wiki/Blackwall

Like others, I don't see how we can have a "human-verified" web without bringing in a whole lot of nastiness as a side-effect.

But assuming we could, wouldn't the "human-verified" web just function as a data source to further train LLMs (or whatever)?