You can't unit test for taste | HN Mirror

	You can't unit test for taste (dev.karltryggvason.com)
	67 points by kalli 1 day ago

9 comments

trjordan 55 minutes ago

You can't unit test for taste if you haven't written down what you mean by taste. If you can externalize it, then you can.

Follow this line of thinking, and the AI-friendly answer is easy: we just have to externalize everything we know, so Claude can implement what I want.

Except that I can't fully externalize myself. Debugging a system takes more resources than running the system. If I could write down everything I know and hand it to a machine, I'd do that, but it's impossible.

People aren't books or hashmaps. If you want to build something, you need to use the tools, not teach the tools to use you.

delichon 10 minutes ago

You may be able to effectively externalize taste by "hot or not" style pair testing. Enough comparisons and I'd expect ML to be able to mimic human taste by latching on to features we're not well aware of influencing us.
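One way to picture the "hot or not" idea is an Elo-style rating built from pairwise votes. This is only a sketch under assumptions: the function name `elo_ratings`, the `k` factor, and the starting rating are illustrative choices, not something proposed in the thread, and a real system would learn from features rather than item IDs.

```python
from collections import defaultdict


def elo_ratings(comparisons, k=32.0, start=1000.0):
    """Turn (winner, loser) votes into a rating per item.

    comparisons: iterable of (winner, loser) pairs picked by human judges.
    k and start are conventional Elo defaults, chosen here for illustration.
    """
    ratings = defaultdict(lambda: start)
    for winner, loser in comparisons:
        # Probability the winner was expected to win, given current ratings.
        gap = (ratings[loser] - ratings[winner]) / 400.0
        expected_win = 1.0 / (1.0 + 10 ** gap)
        # Surprising wins move ratings more than expected ones.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)
```

Sorting items by the resulting rating gives a ranking that reflects the judges' choices without anyone having to state the rule behind them.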

trjordan 4 minutes ago

This is RL, right? Like, this is exactly why models have mostly converged around obvious style, because we train them literally on thumbs-up/thumbs-down data of what good behavior and good code looks like.

And that's why it's so hard to get a model to reproduce the specific taste of a person or an organization. My taste is different from yours, so if we dump our aggregate preferences into RL, it averages out to nothing interesting.

For the code-writing case, this means you end up reviewing every line of code, looking for places where you'd thumbs-down the code. Not every line of code contains a real decision, though, so it feels like a waste of time.

bonzini 51 minutes ago

It can't be written down as code, that's the point.

I am more familiar with taste in coding and it can at best be described—that the resulting code is too subtly different from something else in the codebase, that you're masking a different bug, that you're not following what the code tells you. The good part is that while this cannot be unit tested, you can write documentation and code comments about it that tell people what they need to know.

But for taste of the kind described in the article there's not even a definition. The logic ended up being "trust a bunch of opaque weights the most".

Chris2048 26 minutes ago

Technically, AI is code, just very complex code.

I'd say there are "simple" things you can do though, like taking automated screenshots and detecting colours to flag jarring colour schemes.
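A rough sketch of what such a colour check might look like, assuming the dominant colours have already been extracted from a screenshot as RGB tuples. The heuristic (count how many distinct vivid hues appear) and every threshold below are arbitrary assumptions for illustration, not an established definition of "jarring".

```python
import colorsys


def vivid_hue_buckets(colours, min_sat=0.6, min_val=0.5, buckets=12):
    """Return the set of hue buckets occupied by vivid colours.

    colours: iterable of (r, g, b) tuples with channels in 0..255.
    A colour counts as vivid if both saturation and brightness are high.
    """
    seen = set()
    for r, g, b in colours:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s >= min_sat and v >= min_val:
            seen.add(int(h * buckets) % buckets)
    return seen


def looks_jarring(colours, max_vivid_hues=2):
    """Flag palettes that use more distinct vivid hues than allowed."""
    return len(vivid_hue_buckets(colours)) > max_vivid_hues
```

A check like this can run in CI against each screenshot's palette, but it only catches the crude cases; it says nothing about whether a calm palette is actually pleasant.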

sigbottle 7 minutes ago

Exactly. Every single philosophical statement in history runs up against the issue where you can just say, "yeah, it's pretty much this. You just need to do <arbitrarily hard unspecified thing that is basically unfalsifiability>". (Including this one)

And maybe that's just our limits with philosophy, modeling, assumptions, whatever. The danger is not realizing when we're in that zone.

(Fwiw I think unfalsifiability is a limit with any system - "you didn't compile in my syntax/semantics" is a gotcha that's actually valid and useful, but nobody can really determine the hard line)

timroman 7 minutes ago

https://pureinference.com/insights/taste-is-the-new-skill

I wrote about this a few months back. Rick Rubin is famous for this. I do think it is something that can be trained though, it just needs a lot more context. Taste builds over time through lots of unit tests, through lots of content writing, through an accumulation of product decisions. It’s hard to put it in the individual spec, but it can be teased out of 100 project specs. And when you get to that scale the AI starts to do it pretty well.

Gosper 18 minutes ago

Language count is a decent notoriety signal, though pretty coarse. The OP/author should take a look at QRank: https://qrank.toolforge.org/

> QRank is a ranking signal for Wikidata entities. It gets computed by aggregating page view statistics for Wikipedia, Wikitravel, Wikibooks, Wikispecies and other Wikimedia projects

from https://github.com/brawer/wikidata-qrank/blob/main/doc/desig...
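The aggregation idea in that quote can be sketched in a few lines. This is not QRank's actual pipeline (the linked design doc describes that); the function name, the tuple layout, and the entity IDs in the usage line are made up for illustration.

```python
from collections import Counter


def rank_by_pageviews(view_logs):
    """Rank entities by total page views summed across projects.

    view_logs: iterable of (project, entity_id, views) tuples.
    Returns entity IDs ordered from most to least viewed.
    """
    totals = Counter()
    for _project, entity_id, views in view_logs:
        totals[entity_id] += views
    return [entity for entity, _ in totals.most_common()]


# Hypothetical usage with placeholder entity IDs and made-up counts:
# rank_by_pageviews([("wikipedia", "E1", 10), ("wikivoyage", "E1", 5)])
```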

TimXare 31 minutes ago

Taste is mostly the part of the spec you forgot to write down, plus the part you couldn't write down even if you tried.

chantepierre 57 minutes ago

It makes me smile when runners use "X is a marathon, not a sprint" to hint at an effort that accumulates over time and an optimal use of energy.

I do it too because it's a common expression, and a marathon is of course longer than a sprint, but both have in common that properly raced, they are absolutely brutal efforts that leave you without a single additional drop at the end. The effort length and instantaneous power output changes, of course. Maybe "it's a marathon build, not the race" would be more precise at the loss of nearly all its expressive power (but with a lot more pedanticism points) :-p .

Nice project !

another-dave 53 minutes ago

"The effort length and instantaneous power output changes, of course."

but that's what the phrase is meant to convey, right?

Don't run through consumable X (energy/money/etc) like there's no tomorrow - even though there's <some big important milestone> now, we've got dozens more of those that we need to meet, so you're better off getting this one done at 75% than committing 100% to it and failing on all the others.

chantepierre 39 minutes ago

Yeah you're right, I hear it more like "this is a week-long hike, not a sprint", as if a marathon included rest. In any length of racing there's no tomorrow. But I'm doing tongue-in-cheek pedanticness here and will stop that right now !

dasil003 6 minutes ago

I'd wager that if a manager says that, they want you to take it more like a real marathon and less like a long hike.

boredumb 47 minutes ago

Don't work 12-hour days to get milestone X out: there are dozens more milestones, so don't burn out trying to get this one out yesterday. It would probably be more like: don't use 200% to get this out and then quit, or burn yourself down to 0% or a few percent within a year, when we want you to extend and maintain this stuff.

a_c 34 minutes ago

I like to think of testing as making sure things are not wrong, rather than making them right.

Working, useful, delightful, in that order. Testing can make things more likely to work, that's it.

carra 9 minutes ago

So now we need a framework for unit tastes

esafak 25 minutes ago

We can encode taste -- generative AI depends on it. In the worst case, ask people to compare two examples and pick the one with better taste. You can even ask them to rate multiple subjective criteria at once. Use that to learn a scoring function based on the rating labels, and raw features.
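As a minimal sketch of learning a scoring function from comparison labels and raw features, here is a tiny logistic model trained on feature differences (a Bradley-Terry-style setup). The function names, learning rate, and epoch count are assumptions for illustration; a real system would use a proper ML library and far more data.

```python
import math


def train_preference_scorer(pairs, n_features, lr=0.1, epochs=200):
    """Learn weights w so that score(preferred) > score(rejected).

    pairs: list of (features_of_preferred, features_of_rejected),
    each a list of n_features floats.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            diff = [a - b for a, b in zip(preferred, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Model's current probability that the preferred item wins.
            p_win = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on the log-likelihood of the observed choice.
            for i in range(n_features):
                w[i] += lr * (1.0 - p_win) * diff[i]
    return w


def score(w, features):
    """Linear taste score for a single item."""
    return sum(wi * xi for wi, xi in zip(w, features))
```

Once trained, `score` can rank unseen examples, which is the piece that turns a pile of human comparisons into something a test could assert against.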

throw93949444 1 hour ago

> For example, my native Iceland had a nice mix of nature, historical sites and populated places.

You absolutely can unit test for taste: just put an agent into a loop and write what you like into the prompt. Then do scoring...
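A bare-bones sketch of that generate-and-score loop, assuming two hypothetical callables supplied by the caller: `generate(prompt, feedback)` wrapping whatever agent is used, and `score(candidate)` wrapping whatever judge encodes the stated preferences. Neither corresponds to a real library API.

```python
def refine(generate, score, prompt, rounds=5):
    """Run a generate -> score loop and keep the best candidate.

    generate(prompt, feedback) -> candidate   (hypothetical agent call)
    score(candidate) -> number                (hypothetical taste judge)
    """
    best, best_score = None, float("-inf")
    feedback = ""
    for _ in range(rounds):
        candidate = generate(prompt, feedback)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
        # Pass the result back so the next attempt can react to it.
        feedback = f"Previous attempt scored {s:.2f}; best so far {best_score:.2f}."
    return best, best_score
```

The loop itself is trivial; all of the difficulty the thread is arguing about lives inside `score`.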

Iceland is a really bad example: it basically has one populated site (the capital) and a circular road that goes around the island.