by anileated 1226 days ago
It is not breaking the ad-based model—it’s breaking open information sharing culture as we know it.

Yesterday: 1) You do research, you publish a book, you write some posts. 2) People discover your work and you personally, they visit your posts and subscribe to you. 3) You have an opportunity to upsell your book and make money on ads to sustain your future work; more importantly, you get to see traffic stats and see what is in demand, you get thank-you emails and feel valued.

Tomorrow: 1) you do research, write posts, publish a book, 2) it is all consumed by an LLM operated for profit. 3) People ask the LLM for answers, and have no reason or even opportunity to buy your book or know you exist.

What exactly are the incentives to publish information openly in that world?

(Will they even believe you if you say you’re the one who did the niche research powering some specific ChatGPT answer, in a world where everyone knows you can just ask an LLM?)


Why would someone only ask an LLM questions when they were in the market to buy a book? Most people I know don't buy books in order to look up the answer to a question. Sure, some people buy reference books and use them, but that's not really what we think of when talking about authors and books. If I'm in the market for a book, I'm looking to read a book, not query something or someone for answers. I think your example should go like this:

Tomorrow: 1) you do research, write posts, publish a book, 2) it is all consumed by a for-profit operated LLM. 3) People ask LLM to get answers to some related question or interest 4) They ask the LLM for a list of recent books that go in depth on the topic or are in the genre etc. 5) Your name comes up in the list 6) Go to step 2 of Yesterday

> 4) They ask the LLM for a list of recent books that go in depth on the topic or are in the genre etc. 5) Your name comes up in the list

My belief is that ChatGPT is actually not quite capable of that, after seeing examples of how it manufactures nonexistent references. Besides, if it were capable of that, why would it not already show your name as part of the answer now?

The cynic in me thinks it’s not capable of that primarily because it is not a priority for OpenAI and training data strips attribution, with an explicit purpose: if the public knows that ChatGPT can trace back the source, OpenAI would be on the hook for paying all the countless non-consenting content providers whose work it makes money from.

We should treat OpenAI as we treat Google and Microsoft. It has great talent and charismatic people working for it, but ultimately it’s a for-profit tech company and the name they chose ought to make us all the more suspicious (akin to Google’s “don’t be evil”).

> Why would someone only ask an LLM questions when they were in the market to buy a book?

Why would you be in the market for a book when you can learn the same and more by asking an LLM that has already consumed said book? And therefore why would the author spend effort writing and publishing a book knowing it’d sell exactly one copy (to the LLM operator)?

It's very much in their interest to be able to show sources: if the information their models provide is impossible to verify, that severely limits its uses. You essentially can't use it as a source for anything that requires any type of citation or reliability. That's a huge handicap for selling it to businesses and researchers. Determining which training data produced a given output is an open problem in ML, and one that is being very actively worked on, since solving it would greatly further the field.

You believe correctly that ChatGPT is not capable of showing sources; that's currently impossible, but we were discussing Tomorrow, so I included it as a possibility. You could potentially hack it in now using traditional search or nearest neighbours, but it wouldn't be 100% accurate, probably not even 50%. It would just show a bag of similar texts, so it's not really worth doing.
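To make the "traditional search or nearest neighbours" hack concrete, here is a minimal sketch, assuming a small in-memory corpus of source passages and plain TF-IDF cosine similarity (a swapped-in illustration, not anything OpenAI is known to use). The names `nearest_sources`, `tfidf_vectors`, and the sample corpus are hypothetical. It ranks passages by how textually similar they are to a generated answer, which is exactly the "bag of similar texts" caveat: similarity is not evidence that the model learned from that passage.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Build a TF-IDF vector per document and return them with the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so terms present in every document still get a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def nearest_sources(answer: str, sources: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Hypothetical helper: rank source passages by similarity to a generated answer.

    This only finds textually similar passages; it says nothing about which
    passages the model actually learned from.
    """
    names = list(sources)
    vectors, idf = tfidf_vectors([sources[name] for name in names])
    query_tf = Counter(tokenize(answer))
    # Terms unseen in the corpus are dropped, since they cannot match anything.
    query = {t: c * idf[t] for t, c in query_tf.items() if t in idf}
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical corpus, for illustration only.
sources = {
    "charcoal_blog": "charcoal drawing practice daily sketching exercises",
    "llm_post": "language models trained on scraped web text",
    "book_ch3": "niche research on rare mosses of the northern forest",
}
print(nearest_sources("how do language models use scraped text", sources, k=1))
```

Here `llm_post` ranks first only because it shares words with the answer. A real system would more likely use embedding vectors and an approximate nearest-neighbour index, but the limitation is the same: you get passages that look alike, not a provenance trail.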

I'd still be in the market for a book even if we had a perfect LLM that could answer every question I had with impeccable accuracy. I read books because I want to find out about things I don't know that I don't know. It's pretty hard to find those things if you just do question and response. It's like a graph: if you start at one node, it may take you a very long time to traverse the graph to another node, but if you have some outside source that gives you the address of a new node, you can just jump straight to it.

Exactly. As a professional artist, I am expected to have a public online portfolio and publicly available imagery of shows and exhibits. Saying that I'm forfeiting my stake in my art because I'm showing it publicly is a really great way to kill art and culture.

AI is not learning to make, draw, or use mediums in a skilled manner. AI is scraping my public images and plottlining them with the input of humans who label them, tag them and apply stylistic qualities to them. The massive amounts of data that dilute any one artist's influence don't change the fact that the computer is still simply doing what a human tells it to do with imagery created by humans. If you took away the human input, labeling and tagging, you would find that the computer has not learned anything. I can look at 'AI' art and pick out the artists in the collated imagery. Unlike 'AI', I can't spit out the imagery by photocopying/plottlining/tracing it. I have to learn the skills of each artist involved to recreate what I see. Motor skills require practice and effort. 'AI' is not learning motor skills, which are the basis of creating art. It is mapping and applying statistical algorithms to amalgamate data from preexisting sources for those who want 'Art' without investing the time or skill to produce it.

At this very moment, 'AI' art is being used to sell merchandise with zero credit or money going to the people who used their human motor skills to create the backbone of this art. Sadly, this only aggravates the ways copyright already restricts human art. Imagine if we lived in a world where people valued artists and respected their craft?

I once had someone ask me how long it took me to draw a charcoal drawing. The short answer is half an hour. The long answer is that I was doing daily sketching practice and investing many hours a week in charcoal exercises. I am currently out of practice with charcoal, and as it is a medium with no erasing or margin of error, I doubt I could recreate my own drawing without 'getting my hand back in'.

It is obvious to me that this 'AI' tool is being used by humans, with the industry of humans, to exploit humans for the gratification of end-user humans. I suppose humans could stop making art to feed the monster...

That’s my main fear. Not the fairness / unfairness, but that people might be less willing to share info and a lot becomes inaccessible / secret.

I am also anxious about the web becoming fragmented and secretive. If one must gain access to the right circles to start learning, that hinders learning in general. For me and many people I know, it would basically mean we wouldn’t be doing what we’re doing, had it been the case when we were younger.

Exactly. We’re in the ChatGPT honeymoon, but the incentives to share info moving forward are unclear. I could see big model owners paying for exclusive access to content / data, hindering the free distribution of information and becoming like the old publishers and gatekeepers.