by generationP 1134 days ago
TL;DR anyone? The filing takes an ungodly amount of time to load and the server might croak under HN pressure.

Without irony, here's ChatGPT's summary:

This legal complaint alleges that defendants operating a non-profit entity for the benefit of humanity have committed massive fraud on donors, beneficiaries, and the public. The complaint raises concerns about OpenAI's operation, including its dual structure as a non-profit and a for-profit entity, potential insider dealings, and the exclusion of the general public from its benefits. It claims that OpenAI has used deceptive advertising, unfair competition, and fraud to develop its valuable resource for personal gain.

The complaint highlights OpenAI's mission of benefiting humanity and points out that a narrow group of stakeholders have received commercially invaluable early access to its technology. It also argues that OpenAI's for-profit operations might infringe on copyright and fair use laws, as the technology is built on large datasets, much of which is copyrighted. It accuses OpenAI of breaching trust and fiduciary duties, disrupting legal frameworks, and potentially engaging in willful and wanton negligence by increasing existential risks related to AI.

Finally, the complaint alleges that OpenAI might have engaged in banned political activities, specifically suggesting that the technology may have been used to influence the 2020 US presidential election in favor of the Democratic party.

Wow, that's the most useful ChatGPT summary I've ever come across.

I've never found it particularly useful for most articles, which are easy enough to read or skim (the first and last two paragraphs will usually tell you what you need), but long, complicated legal documents are a whole other matter. This is great.

Works really well for legislation, too. GPT-4 can pick up on deltas between bills using markup like <strike>, if that markup is kept in the document, and summarize changes to the bill as it moves through the legislative process.

The only challenge is chunking the larger bills and synthesizing an overall summary without losing possible nuances. California's SB423, for example, is more than twice the 8K token limit, and that's not even a large bill.
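One way to handle the chunking step, sketched in shell (the same language as the Ghostscript command elsewhere in this thread). It assumes the common rule of thumb that a token is roughly three quarters of an English word, so an 8K-token window holds about 6,000 words, and it splits a plain-text bill into overlapping word-count chunks so that text near a boundary lands in two adjacent chunks. The function name `chunk_words`, the `chunk_NNN.txt` file names, and the example budget are hypothetical; real token counts depend on the model's tokenizer, and the per-chunk summarizing and final merge are not shown.

```shell
#!/usr/bin/env bash
# Split a plain-text document into overlapping chunks, sized by word count.
# Assumption: ~0.75 English words per token (rough, tokenizer-dependent heuristic),
# so pick MAX_WORDS well below (context tokens * 0.75) to leave room for the prompt.

# Usage: chunk_words INPUT MAX_WORDS OVERLAP OUTDIR   (hypothetical helper)
# Writes OUTDIR/chunk_001.txt, OUTDIR/chunk_002.txt, ...
chunk_words() {
  local input=$1 max=$2 overlap=$3 outdir=$4
  if [ "$overlap" -ge "$max" ]; then
    echo "chunk_words: OVERLAP must be smaller than MAX_WORDS" >&2
    return 1
  fi
  mkdir -p "$outdir"
  # Put one word per line, then emit windows of $max words that advance by
  # ($max - $overlap) words, so consecutive chunks share $overlap words.
  tr -s '[:space:]' '\n' < "$input" |
    awk -v max="$max" -v ov="$overlap" -v dir="$outdir" '
      NF { w[++n] = $0 }              # skip the empty line left by leading whitespace
      END {
        step = max - ov
        c = 0
        for (start = 1; start <= n; start += step) {
          stop = start + max - 1
          if (stop > n) stop = n
          file = sprintf("%s/chunk_%03d.txt", dir, ++c)
          line = w[start]
          for (i = start + 1; i <= stop; i++) line = line " " w[i]
          print line > file
          close(file)
          if (stop == n) break        # this chunk reached the end; avoid an overlap-only tail
        }
      }'
}
```

For example, `chunk_words sb423.txt 4000 200 chunks` (hypothetical file name) would cap each chunk at roughly 5,300 estimated tokens, leaving room in an 8K window for the instructions and the model's summary.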

Unfortunately, things like the US Code or Code of Federal Regulations are in the range of 100s of millions of tokens.

Assuming it’s accurate.
ChatGPT picks up 80% of the meaning and rewrites it in beautiful prose. Or maybe another language, in the style of Shakespeare.

On the other hand, if you're in a field where there's an adversarial use of text and the uncomprehended 20% might be used to nullify, contradict or make loopholes in the main body, then relying on ChatGPT is similar to using Tesla Full Self-Driving in a construction zone, near firetrucks, during a snowstorm.

Has ChatGPT been caught hallucinating on summarization tasks?

My impression was that hallucination happened when it simply didn't have facts in the first place, had conflicting facts, etc.

I thought summarization was generally fairly reliable, but I'd be happy to know if this is not the case.

Every summarization is a choice of salience: what to include, what to leave out, and how to express something in a different way.

The failure foolishly and misleadingly called “hallucination” is only one manifestation of an attribution error. If your summarizer leaves out something very important because it doesn’t understand it, the result will be quite misleading.

For your average web text, which these days is 90% filler and not important anyway, this is no big deal. This particular lawsuit appears to be the same. But for anything important, I wouldn’t trust it.

In my experience it’s generally accurate when summarizing content provided in the prompt context. Where it can run into trouble is “recalling” (if you can call it that) content that it was trained on.
It's accurate. (I read the whole filing.)
How would ChatGPT summarize something that just happened? It's not real time?
You can copy/paste part of the summary of the complaint. 4k tokens is a lot to work with.
You just copy/paste the text of the complaint. (This is why you hear many people complaining about maximum context size: for some use cases you want to feed in a lot of text.)
Multiple claims, all related to 'fraud'. Claims that OpenAI is committing fraud because they are a for-profit disguised as a non-profit, for example.
There's some pretty kooky claims and generally bizarre statements buried in the filing, such as:

* "Altman and the other parties to this suit are increasing the risk of global human extinction or actual world domination by a small set of individuals for a chance to personally gain extended lifespans. It is the reasonable explanation for taking such massive risks with this technology and flaunting the law so obviously."

* "OpenAI and at least one of its partners most likely filled social media like Twitter, Facebook, and Reddit with politically charged commentary designed to push votes towards the Democrat party." (And no, the filing doesn't provide any substantial evidence for this assertion.)

* "Y Combinator is, according to ChatGPT, the most notable Tech Accelerator in the world. A screenshot of ChatGPT-3.5 stating this is included as Exhibit L."

* "Open-source and closed-source, not-for-profit and for-profit, are binary choices, or Booleans. Booleans are a form of data with only two possible values, which are typically opposites. When defendants drastically changed the Boolean values that structure [OpenAI]... the founding mission went from ‘true’ to ‘false.’"

I was going to give you a hard time for the pejorative "kooky", but it turns out you were being generous and charitable.
> There's some pretty kooky claims and generally bizarre statements buried in the filing,

Sure, but the kooky claims are mostly tangential to the identified causes of action except the fourth, and they aren’t the sole basis for that one, so they have marginal bearing on the overall suit.

Would these people even have standing for that claim?
I compressed it and uploaded it here: https://filebin.net/jznrmizmhtrsdwen

6.4MB

How did you compress it?

Gzipping the 52 MB PDF only shrinks it to 51 MB. You got it down to 6.4 MB and kept it a PDF.
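A likely reason gzip barely helps: most of the bytes in a PDF like this are already compressed, since page content and fonts are typically stored as FlateDecode streams (the same DEFLATE algorithm gzip uses) and embedded images often as DCTDecode (JPEG). The shell sketch below uses generated text, not the actual filing, to show the general effect: the first gzip pass shrinks text-like data a lot, while a second pass over the already-compressed output gains little or nothing. The sample text and temporary file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Show that gzip-compressing gzip output gains little, using generated sample text
# (not the actual PDF from the thread).
tmp=$(mktemp -d)

# Text-like input with a random number on each line, so the first pass
# leaves little structure for a second pass to exploit.
awk 'BEGIN {
  srand(42)
  for (i = 1; i <= 20000; i++)
    printf "Exhibit %d: article %d about OpenAI, ChatGPT, and other startups.\n", i, int(rand() * 1000000)
}' > "$tmp/text.txt"

gzip -c < "$tmp/text.txt" > "$tmp/once.gz"   # first pass: DEFLATE on plain text
gzip -c < "$tmp/once.gz" > "$tmp/twice.gz"   # second pass: DEFLATE on DEFLATE output

bytes() { wc -c < "$1" | tr -d ' '; }
echo "original:   $(bytes "$tmp/text.txt") bytes"
echo "gzip once:  $(bytes "$tmp/once.gz") bytes"
echo "gzip twice: $(bytes "$tmp/twice.gz") bytes"
```

Ghostscript can do much better on a PDF because it rewrites the file instead of wrapping it, for example by re-encoding and downsampling the embedded images.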

Ghostscript!

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

I'd upload the PDF of the complaint, but it's 55 MB for some reason, which will kill the bandwidth of any free file-sharing service.
> it's 55 MB for some reason

Because it has 280 pages of newspaper articles and white papers about OpenAI, ChatGPT, and other startups attached to it as exhibits, many of which are only tangentially related to the case.

Even if it's mostly rich text, it's still text.

PDF embedding is funny.

I was looking for a place to upload it as well.

Edit: I would mirror it on one of my sites, but all my sites are pay-per-GB for bandwidth.