| HN Mirror

by johnnyanmac 189 days ago

It's not training on books, but it will answer questions about the book you're reading. Doesn't pass the sniff test.

>My device, my content

I don't think you own the Kindle store and the servers used to train the AI.

3 comments

terafo 189 days ago

There are LLMs that can process a 1-million-token context window. Amazon Nova 2, for one, even though it's definitely not the highest-quality model. You just put the whole book in context and have the LLM answer questions about it. And given that the domain is pretty limited, you can just store the KV cache for the most popular books on SSD, eliminating quite a bit of cost.
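A rough sketch of the caching idea, not anything Amazon has described: compute the expensive per-book state once, key it by a hash of the book text, and reload it from disk on later requests. `fake_prefill`, `load_or_compute_prefix`, and `CACHE_DIR` are hypothetical names; a real system would store the model's KV tensors rather than the small dict used as a stand-in here.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location; a real deployment would point this at fast SSD storage.
CACHE_DIR = Path(tempfile.gettempdir()) / "book_prefix_cache"


def fake_prefill(book_text: str) -> dict:
    """Stand-in for the costly prefill pass over the whole book.

    A real system would return the model's KV cache; this only records
    enough to show that the state depends on the book text.
    """
    return {
        "n_chars": len(book_text),
        "digest": hashlib.sha256(book_text.encode("utf-8")).hexdigest(),
    }


def load_or_compute_prefix(book_text: str, cache_dir: Path = CACHE_DIR) -> tuple[dict, bool]:
    """Return (state, cache_hit); the state is computed once per distinct book text."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(book_text.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.pkl"
    if path.exists():
        # Cache hit: skip the prefill and load the stored state from disk.
        with path.open("rb") as f:
            return pickle.load(f), True
    # Cache miss: pay the prefill cost once, then persist the result.
    state = fake_prefill(book_text)
    with path.open("wb") as f:
        pickle.dump(state, f)
    return state, False
```

Only the first question about a given book pays the prefill cost; every later question about the same text starts from the stored state.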

DennisP 189 days ago

You could also fill the context with just the book portion that you've read. That'd be a sure-fire way to fulfill Amazon's "spoiler-free" promise.
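As a minimal sketch of that idea, assuming the reader's position is available as a character offset: cut the book at that offset before building the prompt, so later chapters never reach the model. `build_spoiler_free_prompt` and its prompt wording are made up for illustration.

```python
def build_spoiler_free_prompt(book_text: str, chars_read: int, question: str) -> str:
    """Build a prompt containing only the part of the book already read."""
    # Clamp the offset so a bad position can never expose extra text.
    chars_read = max(0, min(chars_read, len(book_text)))
    visible = book_text[:chars_read]
    return (
        "Answer using only the excerpt below.\n\n"
        f"--- EXCERPT ---\n{visible}\n--- END EXCERPT ---\n\n"
        f"Question: {question}"
    )


book = "Chapter 1: Alice arrives. Chapter 2: The butler did it."
position = book.index("Chapter 2")  # reader has finished chapter 1 only
prompt = build_spoiler_free_prompt(book, position, "Who has arrived so far?")
```

Because the unread text is never in the context, the model cannot quote it, though it could still know the plot of a famous book from elsewhere.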

catgary 189 days ago

Are you implying that an LLM needs to be trained on a specific piece of text to answer questions about it?

johnnyanmac 189 days ago

If you want proper answers, yes. If you want to rely on whatever Reddit or TikTok says about the book, then I guess at that point you're fine with hallucinations and others doing the thinking for you anyway. Hence the issues brought up in the article.

I wouldn't trust an LLM for anything more than the most basic questions if it didn't actually have the text to cite.

catgary 189 days ago

Luckily, the LLM has the text to cite: it can be passed in at inference time, which is legally distinct from training on the data.

terafo 189 days ago

Having access to the text and being trained on the text are two different things.

tshaddox 189 days ago

> It's not training on books, but it will answer questions about the book you're reading. Doesn't pass the sniff test.

What do you mean? Presumably the implication is that it will essentially read the book (or search through it) in order to answer questions about it. An LLM can of course summarize text that's not in its training set.
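For the "search through it" variant, a toy sketch could rank the book's paragraphs by word overlap with the question and pass only the best matches to the model. `top_passages` is a hypothetical helper, and real systems would more likely use embeddings or a proper search index than this naive scoring.

```python
import re


def top_passages(book_text: str, question: str, k: int = 2) -> list[str]:
    """Return up to k paragraphs sharing the most words with the question."""
    paragraphs = [p.strip() for p in book_text.split("\n\n") if p.strip()]
    question_words = set(re.findall(r"[a-z']+", question.lower()))

    def score(paragraph: str) -> int:
        # Count distinct question words that also appear in the paragraph.
        return len(question_words & set(re.findall(r"[a-z']+", paragraph.lower())))

    ranked = sorted(paragraphs, key=score, reverse=True)
    # Drop paragraphs with no overlap at all.
    return [p for p in ranked[:k] if score(p) > 0]


book = (
    "The whale was white.\n\n"
    "Ahab lost his leg to the whale.\n\n"
    "The ship sailed from Nantucket."
)
passages = top_passages(book, "Where did the ship sail from?", k=1)
```

The selected passages are then placed in the prompt at inference time; nothing about this step involves updating model weights.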

johnnyanmac 189 days ago

"Reads the book" is the issue, yes. It's possible they aren't training. But to be frank, we're long past giving tech companies the benefit of the doubt that they won't attempt to train on every little thing fed into their servers.

Happy to be proven wrong, though.