Hacker News new | ask | show | jobs
by alach11 720 days ago
There's no doubt that LLMs massively expand the ability of agencies like the NSA to perform large-scale surveillance at a higher quality. I wonder if Anthropic (or other LLM providers) ever push back or restrict these kinds of use cases? Or is that too risky for them?
5 comments

That ship has probably sailed. If Llama3 is performing on par with GPT-3.5, then there is no real benefit for companies to restrict access to slightly better proprietary models.
GPT-4 is “holy shit, this actually works, could be better but it’s so good I almost can’t believe it” while GPT-3.5 is “when it works it’s pretty great, just a pity it almost never does”.

So I would assume that three letter agencies would love to take something like GPT-4 and fine tune it based on all the data they have about existing terrorists.

I'm still dealing with hallucinations nearly every time I use it.
I get maybe one hallucination per twenty chats with gpt4.
I haven't tried more than a handful of queries, but I think I've gotten 100% rate of hallucination or generic useless response to specific question.
Can I try your question? Just curious.
Do you mean 3.5? While I still face issues with GPT 4, I can't even remember the last time it hallucinated. I'm not saying it can't. But, yeah, that's crazy that they're specifically targeting your IP address like that.
NSA should be training their own GPT-4 or better model as we speak and should have been doing it for a long while now. Anything else is borderline incompetence.
NSA can't hire the right talent capable of producing that product for the same reason they have trouble finding white-hat security people to hire: You can't work for the government and do drugs in your personal time. Enough of the pie of elite researchers are in to wacky mind-bending that it's a real recruitment problem.
Also, imagine the public shitstorm when people see headlines that NSA has overturned their policy against microdosing. They're not gonna understand what tf is going on and trying to explain it away sure af isn't gonna happen because they'll always believe that all drugs are bad and defending zero tolerance policies are the hallmark of being one of "the good guys".
You don't think compensation is the bigger issue?
And given the volume of data they likely sift through, I'd also expect them to want very small, high-throughput models for identifying targets for larger models to examine.

On the flip side, LLMs must give the NSA a new challenge: a flood of garbage text generated by no-one in particular. Perhaps there will be more effort to put surveillance directly on-device as tapping networks yields more noise.

I’d expect they’re using huge models to train many small ones, one for each threat actor. Those small models could decide whether their actor is detected, or it’s time to slot in a different one.
On the grasping side, they are probably in the best position to train a GPT-5, given the amount and type of data they're presumed to have.
Will it really though? So far I've seen most of the "revolutionise" claims to be mainly hot air and marketing.

It's possible that LLMs will suddenly make a leap in reliability and usability (e.g. much higher context window without corresponding massive increases in memory usage). But I have yet to see it.

So far it's great at some specific usecases. Interacting with humans, rewriting or making up text. Summarising. A hit & miss at everything else.

Don't get me wrong, I love AI tech and I'm heavily experimenting with it (both at work and at home with local models). But as with most hyped technologies I find the benefits far overblown in marketing stories.

Our leadership jumped on Microsoft Copilot (the one for Office 365 because they have tens of different copilots :) ) like a pack of hungry wolves afraid to miss the boat. And the result was.... kinda meh. It's kinda promising and impresses with simple play school stuff ("make me a presentation about home safety") and totally and utterly fails when you try to do anything serious work related. Sooo many times I get "Sorry I can't do this right now", "Sorry I need more training for this", "I can't do this for you but this is how you can do it yourself!" or it does something but like totally wrong.

Meanwhile we have a bunch of MS training people running around evangelising and telling us how great everything is and making excuses for everything that goes wrong :) You can almost see them breathe a sigh of relief every time something works as it should. That's not what we were promised.

Maybe it will get there, but I don't see it happening tomorrow to be honest. LLMs were an impressive leap but their achilles heels have become clear and it's proving difficult to overcome them.

I'm really enjoying surfing the knife's edge of technology (as I was and still am with metaverse) but I don't yet see this as a game changer except in a few specific industries. People editing text for a living certainly have a need to worry.

I also wonder what will happen with future AI training. Now that more and more websites are filled with AI-generated content that is often at best "mediocre", and considering future AI models will be trained on that, will they be able to improve their accuracy or struggle to maintain it?

I use LLMs extensively in my field to automate all sorts of tasks. Need to classify a million PDF documents for cheap? Write a prompt and submit a batch job. Need to read 30,000 drilling reports to automatically scan for hazards? Done in 60 minutes.

These are tasks that would have taken months of development or millions of dollars in manual effort before. It's not just hype.

Boy, I can’t wait for the foundation of my house to disappear because the LLM mis-classified a drilling report as non-hazardous.

What’s the deal here with liability and accountability? That’s a serious problem when considering using these for anything other than toy problems.

You don't actually think the LLM is reviewing those 30k documents do you? You tell it to write a program (which is easy to audit) to pull the info from the PDFs or whatever. I don't get why this crowd is so goddamn unimaginative with LLMs.
> You tell it to write a program (which is easy to audit) to pull the info from the PDFs

Wherein you discover that unless you ask it to consider the fact that PDFs are ... very hard to parse [1] [2] you get something that misses whole blocks of text or turns them into something they aren't and the rest of the program misses chunks of the document.

[1]: https://news.ycombinator.com/item?id=22473263 [2]: https://web.archive.org/web/20200303102734/https://www.filin...

Why are you expecting they are all very different? They're all likely very similar.
Because I've heard of enough lazy uses of LLMs to be suspicious. Auditing the program means being sure that the info pulled from those documents is reviewed properly. Also, a complete lack of regard for other people's privacy.
No idea where privacy enters in here.
>Boy, I can’t wait for the foundation of my house to disappear because the LLM mis-classified a drilling report as non-hazardous

LMAO! It's so hilarious that people like you forget that the alternative is relying on bureaucracies managed by people that get things wrong more often and are both too lazy and too stubborn to process your application to review your drilling report again.

If using both human-level and AI-level analysis is cheaper and much more accurate (but still imperfect), I'm willing to settle for a better system than oppose all change and die holding out for a perfect system.

What are 'people like me'? It's not like I know nothing about large language models, I just think using them for civil engineering is a bad idea...
One thing I've struggled with while applying LLMs to business problems is how others have dealt with identifying and managing system failures.

Let's say some of your drilling reports contain a pattern that indicates balrog activity, which the LLM misses. The legal or insurance context requires you to monitor and address potential balrog activity. How do you plan for these failures?

In almost every case I've seen, the plan is to not have a plan, which is another way of saying that the data doesn't matter so long as no one complains about the results.

Same way you manage human failures?
The way we manage human failures are with rules, checklists, and accountability. LLMs struggle with all of these, and I get the sense that spending 6mos to develop long lists of rules isn't what the parent comment has in mind with "just write a prompt"
I think that for low-risk classification tasks and similar, something like an LLM is a great tool, and I can absolutely see it being extremely useful for intelligence work where sifting through stuff is very hard. However, I would not at all trust AI to make actually important decisions independently.
A genuine question and not meant as a snipe: as hallucinations are an inherent “feature” of LLMs, how can you be sure of the accuracy of the model’s interpretation of those 30,000 drilling report hazards? Or what is the acceptable level of risk?
You have it write a program to analyze it. I think a lot of people fail to understand that you don't always need the LLM to do the thing, have it write a program to do the thing for you.
That's not very likely to succeed, is it? LLMs can do a lot of things, but writing software that not only parses semi-proprietary file formats but also analyze unstructured data sounds more than little bit far fetched. I'd be impressed if just the first, and by far the easiest, part of that can be accomplished.
It's extremely likely to succeed because there is a documented format. I can't believe how pessimistic this site is about this stuff. Yeah, you're not going to one shot it with a prompt. If that's your expectation, you're confused.
Okay, but you still need to debug the program. If your program must give correct results you still need to check the program output against every case. There's no free lunch there.
Speaking generally: The program doesn't always have to give correct results. The program just needs to reduce 30k documents down to 200 documents for human review.

You're comparing LLMs to a hypothetical alternative where a human reviews all 30k documents in detail. But the real alternative is often just a worse quality sieve where more errors blunder their way through the existing flawed processes. LLMs can improve on that.

You're right. That's why to be sure I don't use software. All paper and pencil. So I can be sure. I have no idea what your point is.
How can you be sure with humans doing the work?
That's where the law comes in. You can prosecute a human for negligence. What about an AI?
Would you trust your LLM to file your taxes for you?
Yes because without an LLM I don't do it.
How did you do your taxes a couple years ago?
I never did. Had to pay fines.
I hope with all the time and money being saved, you're having humans check the results.
Yes but that is one of those niche tasks I meant.

Once again they are selling it like something that's for everyone right now. This is the problem. THe same with the metaverse. It has some really great usecases, but they made it out like next year we would all ditch our phones and work exclusively in a VR headset. Obviously that didn't happen, as the tech was nowhere near that and probably people don't want it either.

Also, if you really need to be sure that those 30.000 drilling reports really didn't contain any hazards, you still have to go through it all yourself. Don't forget LLMs aren't reproducible.

But no, my point was exactly that it's not just hype. There are genuine useful usecases, I totally agree.

As there were for metaverse, and probably even for blockchain (NFT not so sure tho :) I always thought they were really a solution looking for a problem). The key thing about a hype is that they overblow the potential benefits way too much though. I see this happening here once again.

They're pretty clear about being pro safety to the extreme, and mass surveillance to protect american interests and abuse of LLM tech (e.g. open source misuses) are probably within the umbrella of ends justifying the means logic anthropic employs.
When you see the kinds of things that are developed in the name of "defense" it's easy to see how AI "safety" could become a similar sort of doublespeak.
AI safety already is double speak. The primary meaning is "safety" for investors who don't want to be associated with something distasteful. The other meaning is basically a thin cover.
Well you can look forward to worse, give it another decade and Lockheed Martin will be extolling their commitment to AI safety while announcing their new generation of fully autonomous kill drones. For defense, of course.
dumb question. I can understand LLM can be used for disinformation as it can generate text/image at scale. can you explain how it can do large scale surveillance?
LLMs can be fed a conversation and understand the intent of its participants, even if no particular keywords are used. Before this, surveillance was limited by how many human agents you could have sifting through recorded data.

Put another way: most people only get charged with a crime if it's worth a law-enforcement officer's time to catch you, but many small violations are ignored in favor of higher priorities. We may have to contemplate a future where AI is clever enough to notice everything that can be construed as a violation of some law and put on a prosecutor's backlog.

Schneier talks about this as well: https://www.schneier.com/blog/archives/2023/12/ai-and-mass-s...

I wouldn't say that they can be used to do large-scale surveillance, but they can definitely facilitate it, especially with CV integration. I think one can easily imagine the following scenario: you fill a LLM with photos from people (taken from a public camera for instance), it finds the closest matches (via a web search for instance, as Gemini does). From then, you can easily gather the most essential information: first and last name, age, usernames... And then use this information to structure even more precise prompts and find even more potentially interesting data: posts on forums, relatives... And with this data, you can create an exhaustive database with a plethora of information and data about these people.

That's what any good stalker or person experienced with social engineering is able to do right now, but it takes a lot of time and energy. Resorting to LLMs would considerably decrease both. And it gets easier the more people you have information about.

Specifically, vision transformers (ViT) outperforming established CNN.