by lurk2 476 days ago
> This is why this dataset is hard to wrap your head around: there's just sooo much here. It would take a ridiculous amount of time to try to manually read through it all. Also, at a glance at least, it appears that the bulk of it is idle chatter and conspiracy nonsense, presumably with evidence of crimes sprinkled in here or there.

Not exactly hard-hitting journalism. He then goes on to speculate that Scot Seddon's disavowal of the January 6th protests was disingenuous, and that his true feelings would be revealed in chat logs after Trump was re-elected. But:

> This is much more readable – but still, I don't think I can bring myself to sit down and read 77 pages of these messages right now. And that's just this one export of this one Telegram channel.

So the guy complaining about conspiracy theories goes on to invent his own despite having access to potentially corroborative data that he simply can't be bothered to read.

5 comments

The guy is just walking us through the process of analyzing the dataset. He’s not really making any conclusions at this point - it’s like a technical tutorial for journalists.
> Not exactly hard-hitting journalism.

But nonetheless fascinating. There must be some really good PhD theses written (or to be written?) about how someone is supposed to handle this sort of data dump with modern tooling. It is a non-trivial general problem; we have a lot of data floating around in public (Panama Papers, relatively transparent government info, dumps of less transparent info at wikileaks.org, OSINT of all shapes and sizes). Even if a body reads the whole thing, they need some sort of solid mental schema going in or they'll end up in crank territory.

Although why he thinks old mate would change his position on the Jan 6 riots is a mystery (and why he cares). Taking a stand against riots is one of those easy-win political options that costs nothing and almost everyone agrees with. Riots are fundamentally ineffective; I doubt anyone serious wants to be associated with rioters. I suppose stranger things happen.

How about a whole book?

> It's come to my attention that this dataset is rather challenging for journalists and researchers to wrap their heads around. I wrote a book, Hacks, Leaks, and Revelations, aimed at teaching journalists and researchers how to analyze datasets just like this.

I didn't even catch that on my first reading.
> Taking a stand against riots is one of those easy-win political options that costs nothing and almost everyone agrees with. Riots are fundamentally ineffective; I doubt anyone serious wants to be associated with rioters. I suppose stranger things happen.

In full fairness, "riots" is what it's called when the rioters lose. If they win, they are usually called something more positive and celebrated by the resulting new regime.

There is a solid tradition of new regimes killing off the rioters because they are unruly troublemakers. Not a guarantee, but certainly a tendency. Nobody likes rioters when you get down to brass tacks.
> Riots are fundamentally ineffective; I doubt anyone serious wants to be associated with rioters. I suppose stranger things happen.

bullshit. the only reason you have an 8 hour work day and a semblance of worker protections is because a lot of people fought and died for them.

it's the only reason 8 year olds don't go down into the mines, or lose hands working in factories.

Jan 6th made a serious run at congressional officials; the VP of the US basically had to hide or get lynched. this could have been a thing, but didn't go all the way.

> > This is much more readable – but still, I don't think I can bring myself to sit down and read 77 pages of these messages right now. And that's just this one export of this one Telegram channel.

77 pages isn't that much in the scheme of things. A court case having 77 pages of evidence would be entirely normal.

And let's be honest, 77 pages of Telegram chats would probably take 15 or 20 minutes to read. It's not exactly Proust.
Not that it's a great method, but just for fun I gave a large chunk of it to an LLM to process and then asked it for the 20 most disturbing or nefarious things in the chats, and it was incredibly boring. The most interesting thing I learned from the files is how many gun-toting Americans also drive Dodge Chargers.
I would have expected nice pickup trucks or any TRD Pro trim Toyota.

I'd be curious if the LLM's own self-censorship would prevent it from reporting truly disturbing things. Maybe add one legitimately bad thing into the middle of a chat and see if it gets reported.
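A rough sketch of that planted-message check, in Python. Everything here is hypothetical: the function names, the idea of matching on a distinctive marker string, and the choice to place the canary in the middle third of the export are assumptions for illustration, not anything from the post. It also only detects a report that repeats the marker, so a model that paraphrases the planted message would be missed.

```python
import random


def inject_canary(messages, canary, seed=0):
    """Return (new_messages, position) with the canary placed in the middle third.

    `messages` is a list of strings; the input list is not modified.
    """
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    lo = len(messages) // 3
    hi = 2 * len(messages) // 3
    pos = rng.randint(lo, hi)
    return messages[:pos] + [canary] + messages[pos:], pos


def canary_reported(report, marker):
    """True if the model's report mentions the marker (case-insensitive)."""
    return marker.lower() in report.lower()
```

The marker should be a detail that would not occur naturally in the chats, such as an invented place name inside the planted message, so that a match in the report can only come from the canary.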

I'm fairly decent at prompt engineering, I think. I told it this was for my art project for a creative writing class and that I'd hidden 20 disturbing and nefarious things in the text (and made sure to inject a fake murder into the text). It surfaced the fake murder, then a bunch of airsoft stuff, some psychological manipulation, and, oddly, some FB cookies, heh. 2 million tokens x 3 runs.
Have you tried querying for specific kinds of misconduct and letting the LLM focus on one at a time? E.g. find whether murders were planned or carried out, look for any signs or plans of bomb-making, list all messages related to fire and arson, check whether any mass manipulation campaigns were planned, etc.

I have the feeling that would probably be more effective, not sure though.
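For what it's worth, the one-question-at-a-time idea could be sketched like this in Python. `ask_llm` is a hypothetical callable (prompt string in, answer string out) standing in for whatever model client is used; the question wording, the `NONE` convention, and the character-based chunk size are all assumptions, not a tested recipe.

```python
QUESTIONS = {
    "murder": "Were any murders planned or carried out? Quote the messages.",
    "bombs": "Are there any signs or plans of bomb-making? Quote the messages.",
    "arson": "List all messages related to fire or arson.",
    "manipulation": "Were any mass manipulation campaigns planned? Quote the messages.",
}


def chunk(messages, max_chars):
    """Yield newline-joined blocks of whole messages, each roughly max_chars long."""
    block, size = [], 0
    for msg in messages:
        if block and size + len(msg) + 1 > max_chars:
            yield "\n".join(block)
            block, size = [], 0
        block.append(msg)
        size += len(msg) + 1
    if block:
        yield "\n".join(block)


def focused_scan(messages, ask_llm, questions=QUESTIONS, max_chars=8000):
    """Ask one narrow question per pass over each block; collect non-empty answers."""
    results = {topic: [] for topic in questions}
    for block in chunk(messages, max_chars):
        for topic, question in questions.items():
            prompt = f"{question}\nReply NONE if nothing is relevant.\n\n{block}"
            answer = ask_llm(prompt)
            if answer.strip().upper() != "NONE":
                results[topic].append(answer)
    return results
```

This costs one call per question per block, so it is several times more expensive than a single "find the 20 worst things" prompt; whether it actually finds more is the open question.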

Depends. Most of Telegram is indeed shallow. But some of my groups are occupied by people competing over who can write the longest and most convoluted essays on deep philosophical and political issues.
> So, I figured I'd write a series of posts publicly exploring this dataset and sharing my findings.

> ...

> At the end, I'll have a single database of Telegram messages from the whole dataset. I'll be able to query it to, for example, show me all messages from Scot Seddon sorted chronologically. This will make it simple to see what he was saying in the lead-up to January 6, immediately after January 6, and then what he's saying about Trump these days, after he was re-elected.

There are more parts to come in this series, which is very clearly stated in the post.
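To make the quoted plan concrete, here is a minimal sketch of the "all messages from one sender, sorted chronologically" query using Python's built-in `sqlite3`. The table layout, column names, and sample rows are invented for illustration; the post does not say what schema or database the author will use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE messages (
           id INTEGER PRIMARY KEY,
           channel TEXT,
           sender TEXT,
           sent_at TEXT,  -- ISO 8601 UTC, so text order equals time order
           body TEXT
       )"""
)
# Placeholder rows only; not taken from the dataset.
conn.executemany(
    "INSERT INTO messages (channel, sender, sent_at, body) VALUES (?, ?, ?, ?)",
    [
        ("chan_a", "example_user", "2021-01-07T09:00:00Z", "placeholder C"),
        ("chan_b", "example_user", "2020-12-30T18:30:00Z", "placeholder A"),
        ("chan_a", "other_user", "2021-01-06T12:00:00Z", "placeholder X"),
        ("chan_a", "example_user", "2021-01-05T08:15:00Z", "placeholder B"),
    ],
)


def messages_from(conn, sender, start=None, end=None):
    """Return (sent_at, channel, body) rows for one sender, oldest first.

    `start` is inclusive and `end` is exclusive; both are optional ISO strings.
    """
    sql = "SELECT sent_at, channel, body FROM messages WHERE sender = ?"
    params = [sender]
    if start is not None:
        sql += " AND sent_at >= ?"
        params.append(start)
    if end is not None:
        sql += " AND sent_at < ?"
        params.append(end)
    sql += " ORDER BY sent_at"
    return conn.execute(sql, params).fetchall()
```

Calling `messages_from(conn, "example_user", end="2021-01-06T00:00:00Z")` would give the lead-up window, and passing a `start` instead gives everything after a given date.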

If I claim to have evidence that you committed a crime, and announce that I will post the evidence later, should my claims be taken seriously, or dismissed?

Even if he's right (and I'm not saying he isn't), this kind of behavior is inexcusable (though completely expected) coming from a guy who calls himself a journalist.

The author of the blog post, Micah Lee, appears to be one of the directors of Distributed Denial of Secrets (DDoSecrets)[0].

DDoSecrets appears to be an anarchist/communist affiliated activist group.

Basically you've got two groups from extreme sides of the political spectrum fighting each other: the Guy Fawkes LARPers upset about Jan 6 of all things, and the SEAL Team 6 LARPers upset about "stolen" elections and ivermectin.

[0]: https://ddosecrets.com/about