GPT-4V(ision) system card [pdf]

	GPT-4V(ision) system card [pdf] (cdn.openai.com)
	46 points by juunge 999 days ago

5 comments

simonw 999 days ago

Genuine question: why is this only published as a PDF?

OpenAI have the resources to also publish this as HTML. They chose not to.

They're not alone in this - most of the academic and research world, plus the whole concept of a "whitepaper", seems predicated on the idea of publishing PDFs.

Is this some stupid thing where human beings are expected to attach more prestige to information published in this way?

PDFs are a terrible way of publishing information in 2023:

- they render poorly on mobile devices, where many (most?) people do their reading

- they're hard to copy and paste information out of

- you can't link to headings within them (like HTML fragment links)

- you can't easily run them through translation tools like the one built into Chrome

The benefits of PDF I can see are:

1. Easier to print and get the exact expected output

2. You can save one file offline

3. Easier to author

I'm not arguing to replace PDFs with HTML (though I wouldn't miss them personally) - I'm saying publish documents as both!

Provide an HTML version and a PDF alternative for people who want it.

Am I missing something here? Why does the academic and research world stubbornly stick to such a hostile way of publishing their results?

lwneal 999 days ago

I think it's about citation. Traditionally, a pdf is a complete and finished work, analogous to a published journal article or book. It is static content and will not change, unlike HTML which might be "under construction".

This isn't necessarily still true: HTML content can stay up on the web forever and a pdf can change, but people still prefer to cite something that looks like a paper document.

Since a whitepaper is often meant to be cited, it's published as a pdf to take advantage of this preference.

The best approach is to publish a PDF for citation along with a public HTML demo, like https://jonbarron.info/mipnerf360/

civilitty 999 days ago

It's also feasible to track changes this way. Download the PDF and compare the md5/crc/sha hash to an older pdf file - if they're the same, then there haven't been any changes.

With web pages, you have to download all the linked files and turn them into a deterministic archive and hope that the Javascript included doesn't pull any dynamic content (which isn't really practical to begin with).
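The hash comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming both copies of the PDF are already saved locally; the helper names `file_sha256` and `pdf_unchanged` are made up for this example and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large PDFs are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pdf_unchanged(old_path, new_path):
    """True if the two files are byte-for-byte identical (same digest)."""
    return file_sha256(old_path) == file_sha256(new_path)
```

Matching digests mean the two files are byte-identical. Differing digests only mean that some bytes changed, which could be embedded metadata rather than visible content.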

simonw 999 days ago

This is a really convincing answer, thank you.

solveit 999 days ago

I would guess OpenAI uses LaTeX as their default choice because they want to write equations.

behnamoh 999 days ago

I’ve been using Word Equations and have never needed to use LaTeX. I think most people are still under the wrong impression that only LaTeX can handle math expressions. I use my custom shortcuts in Word to speed it up tho.

yberreby 999 days ago

Using LaTeX does not necessarily imply rendering to PDF.

https://ar5iv.labs.arxiv.org/

behnamoh 999 days ago

On top of the other comments, using PDF also makes it harder to crawl data to train or fine-tune language models. I absolutely hate the PDF format for the reasons you mentioned. I went to great lengths to find PDF viewers with dark mode and Vim key bindings just to make my PDF experience better.

Tijdreiziger 999 days ago

> they render poorly on mobile devices, where many (most?) people do their reading

Acrobat Reader solves this with their ‘liquid mode’. But yeah, it would be nice if there was a FOSS renderer to do the same.

simonw 999 days ago

Apparently https://www.zotero.org/ is an open source tool that can render PDFs in that way - I haven't tried it myself yet.

tmaly 999 days ago

Looking at this, it gave me this other idea.

I was looking over older state building codes from the early 90s for a homeowners association issue.

Most of these older codes are scanned pictures of the text.

It would be interesting if there were some type of OCR extension for ChatGPT where you could upload images of the pages and it could OCR them and work with the text.

The same thing happens with city council agendas today. They publish 300-page PDF documents made up entirely of scanned images of the text. It is really hard to search them and figure out what is going on.

dhalp 999 days ago

There are a few companies who are actively tackling these types of challenges. The space is called "Intelligent Document Processing".

Check out aihub.instabse.com or docsumo.com

hsdropout 999 days ago

In this PDF there is an example of a controversial output in response to an image for job applicants. The "solution" was to decline to answer that category of question. This doesn't feel like a reasonable approach, as it will become a game of whack-a-mole.

This also seems to acknowledge that the model has deep bias-related flaws, and that instead of treating the causes, they are going after the symptoms.

stoicbatman 999 days ago

An interesting perspective on the use of PDFs in the academic and research world. What I find striking is how PDFs have remained so prevalent despite the rapid digital transformation in recent years. While the static nature of PDFs lends itself to easy citation, it's time we reconsidered the emphasis on format over functionality.

swyx 999 days ago

my notes:

- ramped up to 16k BeMyEyes + 1k developer alpha testers over 6 months

- reduced frequency and severity of hallucinations

- improved OCR and quality of descriptions

- great demand for describing people without affecting privacy/bias - intentionally refusing person identification 98% of the time and lowering accuracy to 0%. also declining a whole lot of problematic queries, per fig 8

- converting known jailbreaks to images to defend against multimodal jailbreaks. ironic how jailbreak collection websites probably made it a lot easier to break the jailbreaks

- interesting descriptions of mitigation process in 2.4.2.

discussion linked https://twitter.com/swyx/status/1706359912283152556
