If you start processing web articles on the scale of millions you'll be surprised by how creative people can be. Not talking about tweets, just news and blog articles.
Not surprised, just not relevant. The criteria here is “you can get pretty good results”, not “you must be able to process millions of articles without failure”.