

by neilkod 5394 days ago

You are all 100% correct about the regex. Before converting the product-matching code to Python, I did it in bash, using grep -iw to match whole words.

    for i in newton macintosh macbook ibook iie mac iphone ipod imac ipad II+ iigs LaserWriter osx 'apple ?tv' itunes '\]\[' imovie
    do
      # -w matches whole words, -i ignores case; "s?" also allows a plural
      echo "$i: $(egrep -wi "${i}s?" "$INPUTFILE" | wc -l)"
    done

But this was difficult to maintain. I wanted the ability to print a 'friendly' looking product name (the dict's key) and maintain the counts in a variable.

When I made the move from bash to Python, I knew that there would be some overlap when I pushed this code (in the name of shipping!). I need to split the sentences into proper tokens and then check each token for a product match. I'm already splitting each sentence into tokens for part-of-speech tagging, so it shouldn't be difficult to do.
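A rough sketch of what that token-based matching could look like, keeping the friendly product name as the dict key. The PRODUCTS table, the count_products helper, and the simple regex tokenizer are all made up for illustration; the real code would reuse the tokens already produced for part-of-speech tagging.

```python
import re
from collections import Counter

# Hypothetical table: friendly product name -> lowercase tokens that count as a match.
PRODUCTS = {
    "Mac": {"mac", "macs"},
    "Macintosh": {"macintosh", "macintoshes"},
    "iPhone": {"iphone", "iphones"},
}

def count_products(sentences):
    """Count product mentions by comparing whole tokens, not substrings."""
    counts = Counter()
    for sentence in sentences:
        # Stand-in tokenizer: lowercase runs of letters and digits.
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        for token in tokens:
            for name, variants in PRODUCTS.items():
                if token in variants:
                    counts[name] += 1
    return counts

sample = ["My first Mac was a Macintosh Plus.", "The machine and my iPhones."]
print(count_products(sample))
```

Because each token is compared as a whole, "machine" and "Macintosh" no longer count toward "Mac". Note that this counts matching tokens, while the bash loop counted matching lines, so the numbers may not line up exactly.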

tl;dr known issue on the Mac regex, I needed to publish it and get back to work!

1 comments

ot 5394 days ago

You can use '\b', which matches a word boundary, so the regex would be something like "\bmac\b".
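For example, something along these lines in Python (the sample text and the MAC name are just for illustration, and the optional "s" mirrors the "s?" in the bash loop). The pattern needs to be a raw string, since in a normal Python string "\b" is a backspace character rather than a word boundary:

```python
import re

# Raw string so that \b reaches the regex engine as a word boundary.
MAC = re.compile(r"\bmacs?\b", re.IGNORECASE)

text = "I bought a Mac, then a Macintosh, then two Macs for the machine room."
matches = MAC.findall(text)
print(matches)  # ['Mac', 'Macs']
```

"Macintosh" and "machine" are skipped because the character after "mac" is still a word character, so there is no boundary there.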


neilkod 5394 days ago

Thanks, I'm going to update the code and re-run the numbers.
