Hacker News
by janstice 1422 days ago
I spent a decade parsing text-with-angle-brackets with regexes, and it sucks. It’s always tempting to try an HTML parser, but if the code is written by a human (or worse, by a mixture of human and machine, especially if the machine involves MS Word), it just doesn’t work.

I’d suggest that rather than attempting big regexes that capture a bunch of stuff in one call, you break the job down into a bunch of smaller, more targeted calls: one call to capture the text of the whole record, another with 3 variants to get the title, another with 2 variants to pick up a tag line, etc.
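To make that concrete, here is a rough Python sketch of the "many small calls" approach. The markup is made up: the `<record>` wrapper, the tags tried for the title and tag line, and the helper names `first_match` and `parse_records` are all assumptions for illustration, so swap in whatever your real input looks like.

```python
import re

FLAGS = re.DOTALL | re.IGNORECASE

# Step 1: one broad pattern that only isolates the text of a whole record.
# The <record> tag is a hypothetical wrapper, not from any real data set.
RECORD_RE = re.compile(r"<record\b[^>]*>(.*?)</record>", FLAGS)

# Step 2: three small variants for the title, tried in order (assumed tags).
TITLE_VARIANTS = [
    re.compile(r"<h1\b[^>]*>(.*?)</h1>", FLAGS),
    re.compile(r"<title\b[^>]*>(.*?)</title>", FLAGS),
    re.compile(r"<b\b[^>]*>(.*?)</b>", FLAGS),
]

# Step 3: two small variants for the tag line (assumed tags).
TAGLINE_VARIANTS = [
    re.compile(r'<p\b[^>]*class="tagline"[^>]*>(.*?)</p>', FLAGS),
    re.compile(r"<i\b[^>]*>(.*?)</i>", FLAGS),
]


def first_match(patterns, text):
    """Return the stripped first group of the first pattern that matches, else None."""
    for pattern in patterns:
        match = pattern.search(text)
        if match:
            return match.group(1).strip()
    return None


def parse_records(text):
    """Split text into records, then run the targeted patterns on each record body."""
    records = []
    for match in RECORD_RE.finditer(text):
        body = match.group(1)
        records.append({
            "title": first_match(TITLE_VARIANTS, body),
            "tagline": first_match(TAGLINE_VARIANTS, body),
        })
    return records
```

Each pattern stays short enough to read and test on its own, and when a new flavor of title shows up you add one variant to a list instead of reworking a giant expression. A field that none of the variants match comes back as `None`, so a miss is visible rather than silently breaking the whole record.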

1 comment

Essentially, this is what I do: first match with a broader regex ruleset, then work down to the next one, and so on. But more code complexity brings more breakage down the line. I went into full maze mode yesterday and questioned everything after that, so this is what my sanity looked like this morning.

Regex isn't really the problem, though (even though it technically shouldn't be the solution in this case either, but I cannot dictate the tech stack). It was just the last straw on top of my frustration with the situation and with myself for not being able to do what my colleague does, even though I want to. I felt the need for help, and I got it. Awesome community around here.