by neilkod 5347 days ago
You are all 100% correct about the regex. Before converting the product-matching code to Python, I did it in bash using grep -iw to match whole words.

for i in newton macintosh macbook ibook iie mac iphone ipod imac ipad II+ iigs LaserWriter osx 'apple ?tv' itunes '\]\[' imovie
do
  echo "$i: `egrep -wi "${i}s?" $INPUTFILE|wc -l`"
done

But this was difficult to maintain. I wanted the ability to print a 'friendly' looking product name (the dict's key) and maintain the counts in a variable.
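For illustration, here is a minimal sketch of what such a dict-driven counter might look like. The PRODUCTS mapping, the patterns, and the count_products helper are all assumptions made up for this example, not the actual code; the compiled regex only roughly mimics what grep -wi "${i}s?" does.

```python
import re
from collections import Counter

# Hypothetical mapping: friendly product name -> pattern (illustrative only).
PRODUCTS = {
    "Macintosh": r"macintosh|mac",
    "iPhone": r"iphone",
    "Apple TV": r"apple ?tv",
}

# Roughly mimic `grep -wi "${i}s?"`: whole word, case-insensitive, optional plural "s".
COMPILED = {
    name: re.compile(rf"\b(?:{pattern})s?\b", re.IGNORECASE)
    for name, pattern in PRODUCTS.items()
}


def count_products(lines):
    """Count the lines that mention each product, like `grep ... | wc -l`."""
    counts = Counter()
    for line in lines:
        for name, regex in COMPILED.items():
            if regex.search(line):
                counts[name] += 1
    return counts


sample = [
    "My first computer was a Mac.",
    "I typed this on my iPhone",
    "Two Macs and an AppleTV at home",
    "A stomach ache is not a product",
]
counts = count_products(sample)
for name in PRODUCTS:
    print(f"{name}: {counts[name]}")
```

Keeping the friendly name as the key means the output label and the pattern can differ, and the Counter holds the totals for later use instead of printing them straight from a pipeline.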

When I moved from bash to Python, I knew there would be some overlap between matches when I pushed this code (in the name of shipping!). The fix is to split the sentences into proper tokens and then check each token for a product match. I'm already splitting each sentence into tokens for part-of-speech tagging, so it shouldn't be difficult to do.
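A rough sketch of the token-based approach described above, under some assumptions: the tokenize function is a simple regex stand-in for whatever tokenizer feeds the part-of-speech tagger, and TOKEN_TO_PRODUCT is a hypothetical lookup table invented for this example.

```python
import re

# Hypothetical lookup: lowercase token -> friendly product name.
TOKEN_TO_PRODUCT = {
    "mac": "Macintosh",
    "macs": "Macintosh",
    "macintosh": "Macintosh",
    "imac": "iMac",
    "ipod": "iPod",
}


def tokenize(sentence):
    """Stand-in tokenizer: runs of letters, digits, or plus signs."""
    return re.findall(r"[A-Za-z0-9+]+", sentence)


def products_in(sentence):
    """Return the set of friendly product names whose token appears in the sentence."""
    found = set()
    for token in tokenize(sentence):
        product = TOKEN_TO_PRODUCT.get(token.lower())
        if product:
            found.add(product)
    return found


print(products_in("I replaced my iMac with a Macintosh clone"))
print(products_in("My stomach hurts"))
```

Because each token is compared as a whole, "mac" can no longer match inside "imac" or "stomach". Multi-word products such as "apple tv" would need extra handling (for example, checking adjacent token pairs), which this sketch leaves out.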

tl;dr: known issue with the Mac regex. I needed to publish it and get back to work!

1 comment

You can use '\b' which matches "word boundaries", so the regex would be something like "\bmac\b".
Thanks, I'm going to update the code and re-run the numbers.
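To make the '\b' suggestion concrete, here is a small sketch comparing an unanchored pattern with the word-boundary version. The sample sentence is made up for illustration.

```python
import re

text = "My iMac and my Macintosh sit next to a Mac mini."

# Unanchored: also matches "Mac" inside "iMac" and "Macintosh".
loose = re.findall(r"mac", text, re.IGNORECASE)

# Word boundaries on both sides: only the standalone word "Mac" matches.
strict = re.findall(r"\bmac\b", text, re.IGNORECASE)

print(len(loose), loose)
print(len(strict), strict)
```

Note the raw string (the r prefix): without it, Python would read "\b" as a backspace character instead of passing a word-boundary assertion to the regex engine.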