by neilkod
5347 days ago
You are all 100% correct about the regex. Before converting the product-matching code to Python, I did it in bash, using grep -iw to match whole words:

  for i in newton macintosh macbook ibook iie mac iphone ipod imac ipad II+ iigs LaserWriter osx 'apple ?tv' itunes '\]\[' imovie
  do
    # stevejobs_tribute.txt |wc -l`"
    echo "$i: `egrep -wi "${i}s?" "$INPUTFILE"|wc -l`"
  done

But this was difficult to maintain. I wanted to be able to print a 'friendly' looking product name (the dict's key) and keep the counts in a variable. When I made the move from bash to Python, I knew there would be some overlap when I pushed this code (in the name of shipping!).

The fix is to split the sentences into proper tokens and then check each token for a product match. I'm already splitting each sentence into tokens for part-of-speech tagging, so it shouldn't be difficult to do.

tl;dr: known issue with the Mac regex. I needed to publish it and get back to work!
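For what it's worth, here is a minimal sketch of what that token-based matching could look like. It is not the code from the project: the `PRODUCTS` dict, its entries, and the `tokenize` and `count_products` helpers are all hypothetical names made up for illustration, and the regex tokenizer is a rough stand-in for whatever tokenizer the part-of-speech tagging step already uses.

```python
import re
from collections import Counter

# Hypothetical mapping: friendly product name -> lowercase token variants.
# The entries are illustrative only, not the real product list.
PRODUCTS = {
    "Macintosh": {"mac", "macs", "macintosh", "macintoshes"},
    "MacBook": {"macbook", "macbooks"},
    "iPhone": {"iphone", "iphones"},
    "iPod": {"ipod", "ipods"},
}

# Invert the dict so each token maps straight to its friendly name.
TOKEN_TO_PRODUCT = {
    token: name for name, tokens in PRODUCTS.items() for token in tokens
}


def tokenize(sentence):
    """Rough tokenizer (an assumption): lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def count_products(sentences):
    """Count product mentions by comparing whole tokens, not substrings."""
    counts = Counter()
    for sentence in sentences:
        for token in tokenize(sentence):
            name = TOKEN_TO_PRODUCT.get(token)
            if name is not None:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "My first Mac was a Macintosh SE.",
        "I love my MacBook and my iPhone.",
    ]
    for name, n in count_products(sample).most_common():
        print(f"{name}: {n}")
```

Because each token is compared as a whole, "macbook" can never be counted as a "mac" hit, which is exactly the overlap the regex version suffers from. Multi-word products like 'apple ?tv' would need extra handling (for example, checking pairs of adjacent tokens), which this sketch leaves out.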