In the Tagalog file, } is near the top but { is over 8,000 lines down. Is there a reason they have such different frequencies? ( and ) are right next to each other.
And yes, I realize this is a really odd question :)
Oh true. I tried to clean up Wiki markup for ML years ago and it was a huge pain. Next time I think I'll parse the HTML version and pull out the text from the tags explicitly.
This is a much better way to do it. It's easier, cleaner, and it also captures the text generated by templates, of which there is a surprising amount (you get weird artifacts from templates otherwise).
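For anyone who wants to try the HTML route, here is a rough sketch using only Python's standard-library `html.parser`. The names `TextExtractor` and `html_to_text` are made up for illustration, and the set of skipped tags is an assumption; real Wikipedia HTML will likely need more filtering (navigation boxes, reference lists, edit links and so on).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    # Assumption: these are the only tags whose contents we want to drop.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into & etc.
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        # Simplification: chunks are joined with single spaces, so a word
        # split by inline markup would come out with a space inside it.
        return " ".join(self._chunks)


def html_to_text(html):
    """Hypothetical helper: return the plain text found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<p>Hello <b>world</b></p><style>p { color: red }</style>"
    print(html_to_text(sample))
```

Because the rendered HTML already has the templates expanded, their text comes through as ordinary text nodes, with no template syntax left to clean up.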