by refulgentis 537 days ago
> I wonder if other people use the same heuristics as me when judging a random arxiv link.

My prior after the header was the same as yours. The hard and interesting part is the work past the initial reaction.

I.e., if I go with my first-order, least-effort reaction, your comment leaves the reader with a brief, shocked laugh at what looks like performance art: a seemingly bland assessment and an overly broad question, only to conclude with "Has anyone else read the paper? Do you like it?"

But that's not what you meant. You're genuinely curious whether it's a long-tail, inappropriate reaction to form that initial assessment from pattern matching. And you didn't mean "did anyone else read it", you meant "Humbly, I'm admitting I skimmed, but I wasn't blown away for reasons X, Y, and Z. What do you all think? :)"

The paper is superb and one of the best I recall reading in recent memory.

It's a much whiter box than Sparse Autoencoders. Handwaving what a bag of floats might do in general is much less interesting or helpful than being able to statistically quantify the behavior of the systems we're building.

The author is a PhD candidate at the Carnegie Mellon School of Business, and I was quite taken with their ability to hop across fields to get a rather simple and important way to systematically and statistically review the systems we're building.

1 comments

This paper is doing exactly that though, handwaving with a couple of floats. The paper is just a collection of observations about what their implementation of Shapley value analysis gives for a few variations of a prompt.
You have an excellent point. Bear with me.

I realized when writing this up that saying SAE isn't helpful but this is might come across as playing devil's advocate. But it came to me in a stream of consciousness while writing, so I had to take a step back and think it through before editing it out.

Here is that thinking:

If I had a model completely mapped using SAE, at most, I can say "we believe altering this neuron will make it 'think' about the Golden Gate Bridge more when it talks". That's really cool for mutating behavior, don't get me wrong, it's what my mind is drawn to as an engineer.

However, as a developer of LLMs, through writing the comment, I realized SAE isn't helpful for qualifying my outputs.

For context's sake, I've been laboring on an LLM client for a year with a doctor cofounder. I'm picking these examples because it feels natural, not to make them sound fancy or important.

Anyways, let's say he texts me one day with "I noticed something weird...every time I say 'the patient presents with these symptoms:' it writes more accurate analyses"

With this technique, I can quantify that observation. I can pull 20 USMLE questions and see how it changes under the two prompts.
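
To make "quantify" concrete, here's roughly the kind of check I have in mind. It's a minimal sketch, not the paper's Shapley method: a plain exact McNemar test on paired right/wrong outcomes for the same questions under the two prompts. The function name and the 0/1 lists are hypothetical, and the numbers are made-up placeholders, not real results.

```python
from math import comb

def mcnemar_exact(baseline_correct, variant_correct):
    """Two-sided exact McNemar test on paired right/wrong outcomes.

    Each list has one entry per question (1 = answered correctly, 0 = not),
    in the same question order for both prompts.
    Returns (only_variant, only_baseline, p_value).
    """
    if len(baseline_correct) != len(variant_correct):
        raise ValueError("paired lists must have equal length")
    pairs = list(zip(baseline_correct, variant_correct))
    # Discordant pairs: questions where exactly one prompt got it right.
    only_variant = sum(1 for b, v in pairs if v and not b)
    only_baseline = sum(1 for b, v in pairs if b and not v)
    n = only_variant + only_baseline
    if n == 0:
        return only_variant, only_baseline, 1.0
    k = min(only_variant, only_baseline)
    # Under the null hypothesis each discordant pair is a fair coin flip.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_variant, only_baseline, min(1.0, 2 * tail)

# Placeholder outcomes for 20 questions (invented for illustration only).
baseline = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
variant  = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1]

gained, lost, p = mcnemar_exact(baseline, variant)
print(f"fixed by new prompt: {gained}, broken by new prompt: {lost}, p = {p:.3f}")
```

With only 20 questions the test has little power, so I'd treat a result like this as a reason to pull more questions rather than as a verdict either way.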

With SAE, I don't really have anything at all.

There's a trivial interpretation of that: e.g., professionals are using paid LLMs, and we can't get SAE maps.

But there's a stronger interpretation too: if I waved a magic wand and my cofounder was running Llama-7-2000B on their phone, and I had a complete SAE map of the model, I still wouldn't be able to make any particular statement at all about the system under test, other than "that phrase seems to activate these neurons" -- which would sound useless / off-topic / engineer masturbatory to my cofounder.

But to my engineering mind, SAE is more appealing because it reveals how it works fundamentally. However, I am overlooking that it still doesn't say how it works; it only gives an unquantifiable correlation between words in a prompt and what floats get used. To my users, the output is how it works.