by yareally
4841 days ago
> because the wiki syntax is extremely convoluted and there is no formal spec

I ran into that when parsing pages with Python for an app I am working on. Parsing by conditions leads to a lot of conditions for edge cases, which, as one might expect, come up more often the more obscure the topic gets, since those articles are not updated or improved to bring them in line with the formatting of heavily trafficked articles. If you are looking for something in particular, ranking the elements on a page helps up to a point, provided the elements you want are the ones that occur most often or close to it.

Aside from the more obscure, less trafficked articles, I noticed that many of the non-English wiki articles are also formatted in awkward ways and appear far less up to date than their English counterparts. I thought I had most edge cases covered until I started parsing wiki markup for other languages.
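A minimal sketch of the ranking idea in Python, assuming "elements" means template names such as `{{cite web}}`. The sample markup, the regex, and the `rank_templates` helper are all hypothetical illustrations, not code from the app mentioned above, and a regex like this will miss many of the edge cases described.

```python
import re
from collections import Counter

# Hypothetical sample markup; a real page would come from a dump or an API call.
SAMPLE = """{{Infobox person
| name = Ada
| birth_date = {{birth date|1815|12|10}}
}}
Ada was a mathematician.{{cite web |url=http://example.org |title=A}}
She wrote notes.{{Cite web|url=http://example.org/2|title=B}}
{{reflist}}
"""

# Capture the template name: the text after "{{" up to the first "|" or "}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}\n]+?)\s*(?=[|}])")


def rank_templates(wikitext):
    """Return (template name, count) pairs, most frequent first."""
    # Lowercase the names so "Cite web" and "cite web" are counted together.
    names = (m.group(1).lower() for m in TEMPLATE_RE.finditer(wikitext))
    return Counter(names).most_common()


if __name__ == "__main__":
    for name, count in rank_templates(SAMPLE):
        print(count, name)
```

Ranking like this only surfaces what is frequent on a given page, so it helps when the element you want is among the most common ones and does little for rare or oddly formatted markup.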