by exgrv 3034 days ago
This is probably due to our preprocessing of Wikipedia that did not get rid of all the '}' from the markup.

Oh true. I tried to clean up Wiki markup for ML years ago and it was a huge pain. Next time I think I'll parse the HTML version and pull out the text from the tags explicitly.
This is a much better way to do it. It's easier and cleaner, and it also captures the text generated by templates, of which there is a surprising amount (if you strip the markup instead, you get weird artifacts where the templates were).
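
For anyone who wants to try the HTML route, here is a minimal sketch using only Python's standard-library `html.parser`. It assumes you already have the rendered HTML of a page as a string, and that body text lives in `<p>` elements with closing tags. The class name `ParagraphText`, the helper `extract_paragraphs`, and the set of skipped tags are illustrative choices of mine, not anything from the preprocessing discussed above.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the visible text of <p> elements, ignoring a few noisy tags."""

    # Assumption: these tags hold non-prose content (scripts, citation
    # markers, infobox tables). Adjust the set for your own data.
    SKIP = {"script", "style", "sup", "table"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []   # finished paragraphs, in document order
        self._buf = []         # text pieces of the paragraph being read
        self._in_p = 0         # depth of open <p> tags
        self._skip = 0         # depth of open SKIP tags

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
        elif tag == "p":
            self._in_p += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip = max(0, self._skip - 1)
        elif tag == "p" and self._in_p:
            self._in_p -= 1
            if self._in_p == 0:
                # Collapse runs of whitespace and drop empty paragraphs.
                text = " ".join("".join(self._buf).split())
                if text:
                    self.paragraphs.append(text)
                self._buf = []

    def handle_data(self, data):
        # Keep text only when inside a paragraph and outside skipped tags.
        if self._in_p and not self._skip:
            self._buf.append(data)


def extract_paragraphs(html):
    """Return a list of paragraph strings found in an HTML document."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Because the input is the rendered page, template output arrives as ordinary text inside the tags, so there is no leftover `}` or `}}` to clean up. A real pipeline would probably want a more forgiving parser for malformed HTML; this only shows the idea.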