HN Mirror

Hacker News


	by colordrops 1448 days ago
	I don't trust regular expressions that I wrote, let alone ones from some doped-up, parameter-sniffing AI.

4 comments

memorable 1448 days ago

You can always use tools like Regex101[0] to verify whether they actually work. I have tried a few generated by the AI, and they seem to do the job most of the time.

[0]: https://regex101.com/

croes 1448 days ago

You could still have edge cases that match when you don't want them to, or fail to match when you do.

chii 1448 days ago

If you have an edge case you know you want, you could add its description to the AI's input.

If you are afraid of unintended matches, that's a different problem, and one you could just as easily run into when writing the regex yourself!

The solution, I reckon, is to generate (maybe even with the same AI?) a large list of matches and then look through it manually for unintended ones.
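
One way to get such a list without relying on the AI is to enumerate every short string over a small alphabet and keep the ones the regex accepts. The following is a minimal Python sketch of that approach. It assumes `re.fullmatch` semantics and assumes that a handful of characters is enough to surface surprises. The helper `enumerate_matches` and the example pattern are both hypothetical.

```python
import itertools
import re


def enumerate_matches(pattern: str, alphabet: str, max_len: int) -> list[str]:
    """Return every string over `alphabet` of length <= max_len that fully matches `pattern`."""
    regex = re.compile(pattern)
    matches = []
    for length in range(max_len + 1):
        # itertools.product yields every ordered sequence of `length` characters.
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if regex.fullmatch(candidate):
                matches.append(candidate)
    return matches


# Hypothetical pattern meant to accept "a number with an optional decimal part".
for s in enumerate_matches(r"\d+\.?\d*", alphabet="1.", max_len=3):
    print(repr(s))
```

This prints `'1'`, `'11'`, `'1.'`, `'111'`, `'11.'` and `'1.1'`. A reviewer might not have intended the trailing-dot matches `'1.'` and `'11.'`, and this list makes them easy to spot. The number of candidates grows exponentially with `max_len`, so the technique only scales to short inputs.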

carvking 1448 days ago

Or just write an if clause. Why spend an hour chasing down a regex to save yourself from writing three lines of code?
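
Here is a sketch of that trade-off in Python, using a hypothetical file-name rule. Both the regex and the plain `if`-style check are meant to accept names that start with `app-` and end with `.log`.

```python
import re

# Hypothetical rule: accept file names like "app-2022-01-01.log".
LOG_NAME = re.compile(r"app-.*\.log")


def is_log_name_regex(name: str) -> bool:
    return LOG_NAME.fullmatch(name) is not None


def is_log_name_plain(name: str) -> bool:
    # The "three lines of code" version: explicit prefix and suffix checks.
    return name.startswith("app-") and name.endswith(".log")


for name in ["app-2022-01-01.log", "app-.log", "app.log", "app-notes.txt"]:
    print(name, is_log_name_regex(name), is_log_name_plain(name))
```

The two versions agree on ordinary file names. They disagree only when the input contains a newline, because `.` does not match `\n` unless you pass `re.DOTALL`. Divergences this subtle are part of why the choice between the two styles is debatable.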

eurasiantiger 1448 days ago

Because application performance matters more than developer performance.

carvking 1448 days ago

This varies a lot.

Reading a 100 MB file 10 times a day vs. 100 times a day is perfectly acceptable in many scenarios.

episteme 1448 days ago

I thought regexes were typically slower than plain string operations?
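
The answer depends on the pattern, the regex engine and the input, so measuring is more reliable than assuming. The following Python sketch uses the standard `timeit` module to compare a literal substring check done with the `in` operator against the same check done with `re.search`. The haystack and iteration count are arbitrary choices. No timings are shown because they vary from machine to machine.

```python
import re
import timeit

haystack = "x" * 10_000 + "needle"
pattern = re.compile("needle")

# The same question asked two ways: is the substring present?
plain = timeit.timeit(lambda: "needle" in haystack, number=10_000)
regex = timeit.timeit(lambda: pattern.search(haystack) is not None, number=10_000)

print(f"str 'in':    {plain:.4f} s")
print(f"re.search(): {regex:.4f} s")
```

A literal substring is close to the best case for plain string methods. With anchors, alternations or character classes, the hand-written equivalent grows longer and the comparison can turn out differently.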

wowokay 1448 days ago

That's true, but I think it's important to take the encryption approach: I vaguely understand how it works and hope I don't get burned.

mysterydip 1448 days ago

I think the worst case here is it writing a regex that mostly works but fails for some edge cases that you don't think to test but will encounter in production.

treis 1448 days ago

It'd be cool if it spat out a bunch of test cases/examples so you could see what happens in edge cases.
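
Until a tool does this for you, you can write a small table of edge cases and their expected results by hand. The following is a minimal Python sketch. The 24-hour `HH:MM` regex stands in for something an AI tool might produce, and the `check` helper is hypothetical.

```python
import re

# Hypothetical regex for a 24-hour "HH:MM" time.
TIME_24H = re.compile(r"([01]\d|2[0-3]):[0-5]\d")

# Each edge case is listed next to the expected answer.
CASES = [
    ("00:00", True),
    ("23:59", True),
    ("24:00", False),   # hour out of range
    ("7:30", False),    # missing leading zero
    ("12:60", False),   # minute out of range
    ("12:30 ", False),  # trailing space
]


def check(regex: re.Pattern, cases) -> list[str]:
    """Describe every case where the regex disagrees with the expected result."""
    failures = []
    for text, expected in cases:
        actual = regex.fullmatch(text) is not None
        if actual != expected:
            failures.append(f"{text!r}: expected {expected}, got {actual}")
    return failures


print(check(TIME_24H, CASES) or "all cases pass")
```

Running the same table against a looser pattern such as `\d\d:\d\d` reports `'24:00'` and `'12:60'` as failures. That is the kind of edge-case visibility asked for here.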

celticninja 1448 days ago

That's always an issue, regardless of who wrote it.

mkoryak 1448 days ago

I disagree. If I write a regexp, I have to think about what I am writing.

If I press a button and one is magically made for me, I can skip that step.

Skipping that step is bad.

funstuff007 1448 days ago

Do you feel the same way about linked lists, hash functions, garbage collectors, etc.?

Retr0id 1448 days ago

Implementations autogenerated by an AI? Absolutely; there are so many edge cases to consider.

However, I'm much more likely to trust a popular library implementation.

funstuff007 1448 days ago

OK, but back to the regex. That's just pattern matching, and ML/AI has been shown to be amazing at pattern matching (albeit underwhelming at most other tasks). I would trust an AI/ML-generated regex, but only because such structures are easily testable. This tool from the University of Trieste has probably been around for 10+ years, and it probably has only ~1e3 parameters, not 1e10.

http://regex.inginf.units.it/

nerdponx 1448 days ago

It's a form of defense in depth. Ideally your application has specific test cases, property-based testing, linting, and whatever other forms of static and dynamic analysis you can think of. But if your code is obfuscated and/or you don't have a clear mental model of what it does, that adds a layer of uncertainty and could hurt debuggability.
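
Property-based testing libraries such as Hypothesis do this far more thoroughly. As a standard-library-only stand-in, the following sketch fuzzes a regex against an independent plain-Python definition of the same rule and reports the first input where the two disagree. The helper names and the signed-integer rule are hypothetical.

```python
import random
import re


def find_counterexample(pattern, reference, alphabet, trials=1000, max_len=6, seed=0):
    """Compare `pattern` with a plain-Python `reference` predicate on random strings.

    Returns the first string on which they disagree, or None.
    """
    regex = re.compile(pattern)
    rng = random.Random(seed)  # seeded so any failure is reproducible
    for _ in range(trials):
        length = rng.randint(0, max_len)
        s = "".join(rng.choice(alphabet) for _ in range(length))
        if (regex.fullmatch(s) is not None) != reference(s):
            return s
    return None


def is_signed_int(s: str) -> bool:
    # Reference definition: an optional sign followed by one or more ASCII digits.
    digits = s[1:] if s[:1] in ("+", "-") else s
    return digits != "" and all(c in "0123456789" for c in digits)


ALPHABET = "+-0123456789x"
print(find_counterexample(r"[+-]?\d+", is_signed_int, ALPHABET))  # intended pattern
print(find_counterexample(r"[+-]?\d*", is_signed_int, ALPHABET))  # buggy: \d* allows no digits
```

The first call finds no disagreement. The second returns a string such as `""`, `"+"` or `"-"`, which the buggy pattern wrongly accepts. The reference definition only helps if it is written independently of the regex, which is where the "clear mental model" comes back in.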

colordrops 1448 days ago

Linked lists, hash functions, etc. are mostly solved problems with clearly defined interfaces, built and tested for edge cases by humans. Each regex is a special snowflake.

wpietri 1448 days ago

Totally! And this is one of the worst kinds of code to generate with AI, given how often regexes end up as write-only code. Personally, for any important regex I'm going to have good unit tests, an extended-mode regex with comments, or both. I'm sure this AI isn't going to provide either.

So to me this mainly looks like a way for people who don't understand something to put that ignorance into the codebase, setting traps for colleagues down the road. That's not a new experience for me, but this does seem likely to make that easier and more fun, two things I don't think dangerous code needs.
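
For readers who haven't seen extended mode, here is one way to write a commented regex in Python. With the `re.VERBOSE` flag (the equivalent of Perl's `/x`), whitespace in the pattern is ignored and `#` starts a comment. The ISO-style date pattern is a hypothetical example.

```python
import re

# Extended ("verbose") mode: whitespace in the pattern is ignored and
# "#" starts a comment, so each piece can be documented in place.
ISO_DATE = re.compile(
    r"""
    (?P<year>\d{4})                # four-digit year
    -
    (?P<month>0[1-9]|1[0-2])       # month 01-12
    -
    (?P<day>0[1-9]|[12]\d|3[01])   # day 01-31 (no per-month check)
    """,
    re.VERBOSE,
)

m = ISO_DATE.fullmatch("2022-03-14")
print(m.group("year"), m.group("month"), m.group("day"))  # 2022 03 14
```

The comments also record what the pattern deliberately does not check. For example, `2022-02-31` still matches, which is the kind of limitation that is easy to lose in a one-line version.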

funstuff007 1448 days ago

Fair enough, but few things are easier to test than a regex.

russfink 1448 days ago

Few things are easier to miss than a string that breaks your (nontrivial) regex.

I’d like to see a mathematical estimate of the number of test strings I should generate given some input regex.
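
There is no single formula that answers this. As a rough sense of scale (an illustration, not a test-count estimate for any particular regex), exhaustively testing every string of length at most $n$ over an alphabet $\Sigma$ requires the following number of strings:

```latex
N(n) = \sum_{k=0}^{n} |\Sigma|^{k} = \frac{|\Sigma|^{n+1} - 1}{|\Sigma| - 1}, \qquad |\Sigma| \ge 2
```

For the 95 printable ASCII characters and $n = 4$, this gives $N(4) = 82{,}317{,}121$, or about $8.2 \times 10^{7}$ strings. That is why practical test suites pick representative cases, such as boundaries and near-misses, rather than enumerating everything.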

funstuff007 1447 days ago

If you cannot enumerate the test cases, then the problem is too complex for a single regex. It's sort of self limiting.