| HN Mirror

	by adebayoj 116 days ago
	OP here. I mostly agree with your comment! However, our model does more than this. For any chunk the model generates, it can answer: which concept, in the model's representations, was responsible for those tokens? In fact, we can also answer the question: what training data caused that chunk to be generated? We enforce this as a constraint in both the architecture and the loss function used to train the model. You can also get the high-level reasons for a model's answer on complex problems.
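
To make "which concept was responsible for that token" concrete, here is a minimal toy sketch. It assumes a linear concept layer, where each token's logit is a weighted sum of concept activations, so the per-concept contributions add up exactly to the logit. This is an illustrative assumption and not necessarily how the actual model is built; the concept names, weights, and function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical concept names for the toy example.
CONCEPTS: List[str] = ["geography", "chemistry", "sports"]

# Hypothetical weights: one list per candidate token, one entry per concept.
TOKEN_WEIGHTS: Dict[str, List[float]] = {
    "Paris": [2.0, 0.1, 0.3],
    "oxygen": [0.2, 1.5, 0.0],
}


def token_logit(token: str, activations: List[float]) -> float:
    """Logit of a token as a weighted sum of concept activations."""
    return sum(w * a for w, a in zip(TOKEN_WEIGHTS[token], activations))


def concept_contributions(token: str, activations: List[float]) -> Dict[str, float]:
    """Split the token's logit into one additive term per concept."""
    weights = TOKEN_WEIGHTS[token]
    return {name: w * a for name, w, a in zip(CONCEPTS, weights, activations)}


def top_concept(token: str, activations: List[float]) -> str:
    """Concept with the largest contribution to the token's logit."""
    contribs = concept_contributions(token, activations)
    return max(contribs, key=lambda name: contribs[name])


if __name__ == "__main__":
    acts = [0.9, 0.2, 0.1]  # hypothetical concept activations for one position
    for tok in TOKEN_WEIGHTS:
        print(tok, top_concept(tok, acts), concept_contributions(tok, acts))
```

Because the layer is linear in this sketch, the attribution is exact by construction: nothing about the token's logit is left unexplained by the concept terms.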

2 comments

codeflo 116 days ago

All of the examples on the linked page seem to be "good" outputs. Attribution sounds most useful to me in cases where an LLM produces the typical kind of garbage response: wrong information in the training data, hallucinations, sycophancy, over-eagerly pattern matching to unasked but similar, well-known questions. Can you give an example of a bad output, and show what the attribution tells us?

adebayoj 116 days ago

You got it exactly right. Guilty as charged. Over the coming weeks, we will be showcasing exactly how you can debug all of these examples.

I agree that attribution is most useful for debugging and auditing. This is a prime use case for us. We have a post with exciting results lined up to show this. It should be out in a week; we just wanted to get the initial model out first :)

Grimblewald 116 days ago

What I am reading here is that when the model is wrong, it still (at least sometimes) confidently attributes the answer to some knowledge base. Is that correct? If so, how is this different from simply predicting the vibe of a given corpus and assigning provenance to it? That is much less impressive imo, and something most models can do without explicit training. All precision, no recall, as it were.

gchamonlive 116 days ago

I think this was answered before, with the constraints of the architecture of the model. You can't expect something fundamentally different from an LLM, because that's how they work. It's different from other models because they were not designed for this. Maybe you were expecting more, but that's not OP's fault or demerit.

Grimblewald 116 days ago

What you're saying fits my understanding/expectations. However, the post and the user I am replying to seem to imply otherwise. This makes me wonder: is my understanding incomplete, or is this post marketing hype dressed up as insight? So I am asking for transparency.

adebayoj 116 days ago

It is not hype. You can try the model on huggingface yourself to see its capabilities. My reply here was clarifying that the examples we showed were ones where the model didn't make a mistake. This is intentional, because over the next few weeks we will show how the concepts and attribution we enable let you fix these mistakes more easily. All the claims in the post are supported by evidence, no marketing here.

gchamonlive 116 days ago

We are probably at the point where hype and insight aren't really distinguishable except by what bears fruit in the future, but I agree with you.

IshKebab 115 days ago

> what training data

The demo just says "Wikipedia" or "ArXiV". That's pretty broad and maybe not that useful. Can it get more specific than that, like the actual pages?
