Rather than parsing HTML, it might be easier to look for news organizations that publish an RSS feed and transform that into something that is more accessible and internationalized.
I appreciate the reply. I actually do parse the RSS feed linked from the main page if one exists. However, not all sites have RSS feeds, and the feeds that do exist lack other interesting information (author, etc.). I definitely had that thought early on, but my focus was on other things.
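
For anyone curious, here is a rough sketch of the "use RSS if the main page advertises it" idea. It is not the project's actual code. It assumes the site announces its feed with a `<link rel="alternate">` tag in the page head, and the names `FeedLinkFinder`, `find_feed_urls`, and `parse_rss_items` are made up for illustration. It uses only the Python standard library and handles plain RSS 2.0 items.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# MIME types commonly used to advertise a feed in <link rel="alternate">.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
# Dublin Core creator element, which some feeds use instead of <author>.
DC_CREATOR = "{http://purl.org/dc/elements/1.1/}creator"


class FeedLinkFinder(HTMLParser):
    """Hypothetical helper: collects hrefs of feed <link> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and a.get("href"):
            self.feeds.append(a["href"])


def find_feed_urls(html, base_url):
    """Return absolute feed URLs advertised in the page, or [] if none."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return [urljoin(base_url, href) for href in finder.feeds]


def parse_rss_items(rss_xml):
    """Extract title, link, and author (None when the feed omits it)."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "author": item.findtext("author") or item.findtext(DC_CREATOR),
        })
    return items
```

When `find_feed_urls` returns an empty list, or when `author` comes back as `None`, that is the point where falling back to parsing the article HTML would still be needed.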