Hacker News
by colordrops 1448 days ago
I don't trust regular expressions that I wrote, let alone some doped up parameter sniffing AI.

You can always use tools like Regex101[0] to verify whether they actually work. I have tried a few generated by the AI, and they seem to do the job most of the time.

[0]: https://regex101.com/

You could still end up matching edge cases you don't want, or missing ones you do.
If you have an edge case you know you want, you could add its description to the AI's input.

If you are afraid of unintended matches, that's a different problem, which you might also get writing the regex yourself!

The solution, I reckon, is to generate (maybe even via the same AI?) a large list of matching strings and manually look through them for unintended matches.
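A minimal stdlib-only sketch of that idea, assuming you supply a few seed strings yourself instead of asking an AI for them: it produces simple one-edit variants of each seed (a deleted, duplicated, or case-swapped character), keeps every candidate the pattern fully matches, and returns the list for you to eyeball. The pattern, the seeds, and the helper names `mutate` and `review_matches` are all hypothetical.

```python
import re


def mutate(seed: str) -> set[str]:
    """Return one-edit variants of seed: a deleted, duplicated, or case-swapped character."""
    variants = set()
    for i in range(len(seed)):
        variants.add(seed[:i] + seed[i + 1:])                      # delete char i
        variants.add(seed[:i] + seed[i] * 2 + seed[i + 1:])        # duplicate char i
        variants.add(seed[:i] + seed[i].swapcase() + seed[i + 1:])  # swap case of char i
    return variants


def review_matches(pattern: str, seeds: list[str]) -> list[str]:
    """Return every seed or variant the pattern fully matches, sorted for manual review."""
    regex = re.compile(pattern)
    candidates = set(seeds)
    for seed in seeds:
        candidates |= mutate(seed)
    return sorted(s for s in candidates if regex.fullmatch(s))


if __name__ == "__main__":
    # Hypothetical rule: a lowercase username of 3 to 8 letters
    for s in review_matches(r"[a-z]{3,8}", ["alice", "bob"]):
        print(s)
```

Seeing "boob" or "aalice" in the output is exactly the kind of "did I really mean to allow that?" moment the manual pass is for. A dedicated generator (for example Hypothesis, which can draw strings from a regex) would explore much more of the space than these one-edit mutations.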

Or just write an if statement: why spend an hour chasing down a regex to save yourself from writing three lines of code?
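To make the trade-off concrete, here is a sketch of the same rule written both ways, using a hypothetical check for a `#RRGGBB` hex color. The plain version is the "three lines of code" alternative; `fullmatch` is used on the regex side so that trailing junk is rejected without having to remember anchors.

```python
import re
import string

# Hypothetical rule: "#" followed by exactly six hex digits
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def is_hex_color_regex(s: str) -> bool:
    # fullmatch requires the whole string to match, so "#ffffffx" is rejected
    return HEX_COLOR.fullmatch(s) is not None


def is_hex_color_plain(s: str) -> bool:
    # The if-statement version: length check, prefix check, digit check
    if len(s) != 7 or not s.startswith("#"):
        return False
    return all(c in string.hexdigits for c in s[1:])
```

Neither is obviously better; the plain version spells out each rule separately, which some readers find easier to audit than a character class and a quantifier.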
Because application performance matters more than developer performance.
This varies a lot.

Reading a 100 MB file 10 times a day vs. 100 times a day: either is perfectly acceptable in many scenarios.

I thought regexes were typically less performant than plain string operations?
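Rather than assuming either way, it is cheap to measure. A rough sketch with the standard `timeit` module, comparing a literal substring test against a precompiled regex search on a hypothetical input; the text, the pattern, and the run count are arbitrary, and the numbers will depend on your interpreter, machine, and data.

```python
import re
import timeit

# Hypothetical haystack with the needle at the very end
TEXT = "lorem ipsum " * 10_000 + "needle"
PATTERN = re.compile(r"needle")


def find_plain(text: str) -> bool:
    return "needle" in text


def find_regex(text: str) -> bool:
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    # Both approaches must agree before their timings mean anything
    assert find_plain(TEXT) == find_regex(TEXT)
    for fn in (find_plain, find_regex):
        seconds = timeit.timeit(lambda: fn(TEXT), number=1_000)
        print(f"{fn.__name__}: {seconds:.4f} s for 1000 runs")
```

For a fixed literal like this, the plain `in` check skips the regex engine entirely; once the rule needs alternation, repetition, or character classes, the comparison is no longer this simple.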
That's true, but I think it's important to take the encryption approach: I vaguely understand how it works and hope I don't get burned.
I think the worst case here is it writing a regex that mostly works but fails for some edge cases that you don't think to test but will encounter in production.
It'd be cool if it spat out a bunch of test cases/examples so you could see what happens in edge cases.
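Even without the tool doing it for you, a small table of (input, expected) pairs makes the "mostly works" failure mode visible. A sketch with a hypothetical date pattern of the kind a generator might plausibly suggest: it handles the obvious cases but happily accepts an impossible month and day.

```python
import re

# Hypothetical generated pattern for YYYY-MM-DD dates
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

# (input, expected) pairs; the last one is the edge case that is easy to forget
CASES = [
    ("2024-01-31", True),
    ("2024-1-31", False),
    ("24-01-31", False),
    ("2024-01-31x", False),
    ("2024-13-45", False),  # month 13 and day 45 are not real dates
]


def failing_cases(pattern, cases):
    """Return the inputs whose fullmatch result differs from the expected value."""
    return [s for s, expected in cases if (pattern.fullmatch(s) is not None) != expected]


if __name__ == "__main__":
    for s in failing_cases(DATE, CASES):
        print("unexpected result for:", s)
```

Running it flags `2024-13-45`, because `\d{2}` only checks the shape of the field, not its value.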
That's always an issue, regardless of who wrote it.
I disagree. If I write a regexp, I have to think about what I am writing.

If I press a button and one is magically made for me, I can skip that step.

Skipping that step is bad.

Do you feel the same way about linked lists, hash functions, garbage collectors, etc?
Implementations autogenerated by an AI? Absolutely, there are so many edge cases to consider.

However, I'm much more likely to trust a popular library implementation.

OK, but back to the regex. That's just pattern matching, and ML/AI has been shown to be amazing at pattern matching (albeit underwhelming at most other tasks). I would trust an AI/ML-generated regex, but only because such structures are easily testable. This tool from the University of Trieste has probably been around for 10+ years, and it probably has only 1e3 parameters, not 1e10.

http://regex.inginf.units.it/

It's a form of defense in depth. Ideally your application has specific test cases, property-based testing, linting, and whatever other forms of static and dynamic analysis you can think of. But if your code is obfuscated and/or you don't have a clear mental model of what it does, that adds a layer of uncertainty and could hurt debuggability.
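One cheap form of the property-based testing mentioned above is a differential test: write the same rule a second time in plain code, throw random strings at both, and report any disagreement. The sketch below is hand-rolled with the standard `random` module rather than a real property-based testing library such as Hypothesis; the decimal-number rule, the small alphabet, and the function names are all hypothetical.

```python
import random
import re
import string

# Hypothetical regex under test: digits, optionally followed by "." and more digits
DECIMAL = re.compile(r"\d+(\.\d+)?")


def is_decimal_reference(s: str) -> bool:
    """Plain-code reference implementation of the same rule, written independently."""
    whole, dot, frac = s.partition(".")
    if not whole or not all(c in string.digits for c in whole):
        return False
    if dot:
        return bool(frac) and all(c in string.digits for c in frac)
    return True


def random_string(rng: random.Random, max_len: int = 6) -> str:
    # A tiny alphabet makes awkward inputs ("", "1.", ".5", "1.2.3") come up often
    return "".join(rng.choice("01.a") for _ in range(rng.randint(0, max_len)))


def find_disagreements(trials: int = 10_000, seed: int = 0) -> list[str]:
    """Return sampled inputs where the regex and the reference implementation disagree."""
    rng = random.Random(seed)
    samples = (random_string(rng) for _ in range(trials))
    return sorted({
        s for s in samples
        if (DECIMAL.fullmatch(s) is not None) != is_decimal_reference(s)
    })
```

An empty result only means the two versions agreed on the strings that were sampled; it is evidence, not proof, which is why it sits alongside the other layers rather than replacing them.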
Linked lists, hash functions, etc. are mostly solved problems with clearly defined interfaces, built and tested for edge cases by humans. Each regex is a special snowflake.
Totally! And this is one of the worst kinds of code to generate with AI given how often regexes are write-only code. Personally, for any important regex I'm either going to have good unit tests, an extended-mode regex with comments, or both. Which I'm sure this AI is not going to do.
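For reference, this is roughly what an extended-mode regex with comments looks like in Python, using the standard `re.VERBOSE` flag, which ignores unescaped whitespace in the pattern and treats `#` as the start of a comment. The date rule and the `parse_date` helper are hypothetical; note that the comments also record what the pattern deliberately does not check.

```python
import re

# re.VERBOSE lets the pattern be spread over lines and annotated.
DATE_VERBOSE = re.compile(
    r"""
    (?P<year>\d{4})                 # four-digit year
    -
    (?P<month>0[1-9]|1[0-2])        # month 01 to 12
    -
    (?P<day>0[1-9]|[12]\d|3[01])    # day 01 to 31 (does not check month length)
    """,
    re.VERBOSE,
)


def parse_date(s: str):
    """Return the named groups as a dict, or None if the whole string does not match."""
    m = DATE_VERBOSE.fullmatch(s)
    return m.groupdict() if m else None
```

Pairing this with a handful of unit tests, e.g. asserting that `parse_date("2024-13-01")` returns `None`, covers both halves of the "unit tests, comments, or both" approach.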

So to me this mainly looks like a way for people who don't understand something to put that ignorance into the codebase, setting traps for colleagues down the road. That's not a new experience for me, but this does seem likely to make that easier and more fun, two things I don't think dangerous code needs.

Fair enough, but few things are easier to test than a regex.
Few things are easier to miss than a string that breaks your (nontrivial) regex.

I’d like to see a mathematical estimate of the number of test strings I should generate given some input regex.

If you cannot enumerate the test cases, then the problem is too complex for a single regex. It's sort of self-limiting.