HN Mirror


	by lou1306 72 days ago
	They're searching for multiple substrings in a single pass; regexes are the optimal solution for that.

2 comments

noosphr 72 days ago

The issue isn't that regexes are a way to find a substring. The issue is that you shouldn't be looking for substrings in the first place.

This has buttbuttin energy. Welcome to the 80s I guess.

lou1306 72 days ago

> The issue is that you shouldn't be looking for substrings in the first place.

Why? They clearly just want to log conversations that are likely to display extreme user frustration with minimal overhead. They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.

noosphr 72 days ago

>Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

The only time to use a regex is when searching with a human in the loop. All other uses are better handled some other way.

>They could do a full-blown NLP-driven sentiment analysis on every prompt but I reckon it would not be as cost-effective as this.

Every conversation is sent to an LLM at least a thousand times the size of GPT-2, which could one-shot this nearly a decade ago.

lou1306 72 days ago

> Every conversation is sent to an LLM at least a thousand times the size of GPT-2, which could one-shot this nearly a decade ago.

Yes, but that is _what the product does_. What we are talking about is _telemetry_.

8cvor6j844qw_d6 72 days ago

Very likely vibe coded.

I've seen Claude Code go with a regex approach for a similar sentiment-related task.

mr_00ff00 72 days ago

My understanding of vibe coding is when someone doesn’t look at the code and just uses prompts until the app “looks and acts” correct.

I doubt you are writing a regex and not looking at it, even if it was AI generated.

Clbuttic!

It's fast, but it'll miss a ton of cases. This feels like it would be better served by a prompt instruction, or an additional tiny neural network.

And some of the entries are too short and will create false positives. It'll match the word "offset" ("ffs"), for example. EDIT: no it won't, I missed the \b. Still sounds weird to me.

hk__2 72 days ago

It’s fast and it matches 80% of the cases. There’s no point in overengineering it.

NitpickLawyer 72 days ago

> There’s no point in overengineering it.

I swear this whole thread about regexes is just fake rage at something, and I bet it'd be reversed had they used something heavier (omg, look they're using an LLM call where a simple regex would have worked, lul)...

vharuck 72 days ago

The pattern only matches if both ends are word boundaries. So "diffs" won't match, but "Oh, ffs!" will. It's also why they had to use the pattern "shit(ty|tiest)" instead of just "shit".
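For anyone who wants to check the word-boundary behaviour, here is a minimal sketch in Python. The pattern is a hypothetical, shortened stand-in built only from the two fragments quoted in this thread ("ffs" and "shit(ty|tiest)"); the real list is longer and is not reproduced here. The names `FRUSTRATION` and `is_frustrated` are made up for illustration.

```python
import re

# Hypothetical, shortened pattern: only the two fragments quoted in the thread.
# \b on both ends means each alternative must match a whole word.
FRUSTRATION = re.compile(r"\b(ffs|shit(ty|tiest))\b", re.IGNORECASE)


def is_frustrated(text: str) -> bool:
    """Return True if any listed term appears as a whole word in text."""
    return FRUSTRATION.search(text) is not None


# A bare "shit" wrapped in \b...\b does not match inside longer words,
# which is why the variants have to be spelled out explicitly.
BARE = re.compile(r"\bshit\b", re.IGNORECASE)

if __name__ == "__main__":
    for sample in ["Oh, ffs!", "check the diffs", "wrong offset", "this is shitty"]:
        print(repr(sample), "->", is_frustrated(sample))
    print("bare pattern on 'shitty':", BARE.search("shitty") is not None)
```

All alternatives are tried in one scan of the input, so adding more terms to the alternation does not require extra passes over the text.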

BoppreH 72 days ago

You're right, I missed the \b's. Thanks for the correction.
