Hacker News
by boppo1 1678 days ago
I think it's case-by-case. I once scraped a list of names and dates from individual Wikipedia pages. The dates came in lots of formats, like "1900-1950", "1900 - 1950", "(1900 - 1950)", "(1900 to 1950)", "1900 to cf. 1950", and so on. These were arbitrarily nested in the first couple of sentences of each page.

My thought was "Oh, I think this is a job for that regex thing," and 35 minutes of googling syntax plus a handful of passes later, I had all the dates in a workable table. I have no idea how much code that would have taken otherwise. Admittedly, I am a novice programmer.
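For illustration, here is a minimal Python sketch of the kind of pattern that could cover the formats listed above. The regex I actually used isn't reproduced here, so the pattern, the en dash alternative, and the name `extract_year_ranges` are all assumptions, and it only handles four-digit years.

```python
import re

# Hypothetical pattern, not the original one: matches a four-digit year range
# written with a hyphen, an en dash, or the word "to", optionally wrapped in
# parentheses and optionally with "cf." before the end year.
YEAR_RANGE = re.compile(
    r"""
    \(?                 # optional opening parenthesis
    \b(\d{4})           # start year
    \s*(?:-|–|to)\s*    # separator, with optional surrounding spaces
    (?:cf\.\s*)?        # optional "cf." qualifier
    (\d{4})\b           # end year
    \)?                 # optional closing parenthesis
    """,
    re.VERBOSE,
)


def extract_year_ranges(text):
    """Return every (start_year, end_year) pair found in text, as integers."""
    return [
        (int(m.group(1)), int(m.group(2)))
        for m in YEAR_RANGE.finditer(text)
    ]


if __name__ == "__main__":
    samples = [
        "1900-1950",
        "1900 - 1950",
        "(1900 - 1950)",
        "(1900 to 1950)",
        "1900 to cf. 1950",
    ]
    for sample in samples:
        print(sample, extract_year_ranges(sample))
```

A real scrape would likely need a few more passes than this (single years, "c." dates, missing end years), which matches the "handful of passes" experience.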