
by ga6840 3727 days ago

Thank you for answering my first two questions; I now understand the problem with NTCIR.

At first I tried to compare my results (MAP, recall, precision) with those of the NTCIR participants. It took a lot of effort just to get the dataset, and then I found that I could not convert MathML back into TeX with any confidence. More importantly, the tree structure my parser generates is fine-tuned for, and very dependent on, TeX input. I cannot take the MathML tree structure directly; it would take far more effort than just importing an existing XML parser. For these reasons I cannot compare my results with mainstream NTCIR researchers. I tried very hard, but sadly I gave up. If NTCIR someday provides TeX data for the competition (even if it has to be requested), I would be willing and able to compare my results with NTCIR participants in order to "prove" them.
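For reference, the three metrics named above can be sketched as follows. This is a minimal illustration that assumes binary relevance judgments and a ranked list with no duplicate documents; the function names are made up for this example, and NTCIR's actual evaluation protocol may differ.

```python
def precision_recall(ranked, relevant):
    """Set-based precision and recall for one query (binary relevance assumed)."""
    hits = sum(1 for doc in ranked if doc in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant doc appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over (ranked_list, relevant_set) pairs, one pair per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

With a ranking `["d1", "d2", "d3", "d4"]` and relevant set `{"d1", "d3"}`, the relevant documents appear at ranks 1 and 3, so average precision is (1/1 + 2/3) / 2.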

Writing a TeX parser just for math search is not that difficult. I have written one, and it parses most user-created documents on math.stackexchange.com. Although I cannot convince you that I get better results, I can argue that parsing the subset of TeX relevant to search takes little effort (if you only care about math-related TeX), and I have even open-sourced my search engine's TeX parser. Again, the problem is that it is not easy to grab an XML parser and reuse it in my project. I believe a good math-aware search engine needs a tree structure very different from the one MathML represents. You get a tree by reusing the MWS parser, so what? That tree is not the tree I want, and converting it would take a lot of effort. The easy way for me would be to convert MathML back into TeX (since I already have a parser that works from TeX), but sadly that turned out to be too complicated to be worth a shot.
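To illustrate the kind of tree meant here, below is a minimal sketch of a recursive-descent parser for a tiny math-only TeX subset. It is not the open-sourced parser mentioned above; the grammar, the node labels (`add`, `times`, `sup`, `frac`, `neg`), and the `canonical` helper are assumptions made for this example. It builds an operator tree rather than a presentation-style layout tree, and sorts the children of commutative operators so that `a+b` and `b+a` compare equal.

```python
import re

# Commands (\frac, \alpha, ...), numbers, single letters, and a few symbols.
# Anything else (whitespace, '=', ...) is silently skipped in this sketch.
TOKEN = re.compile(r"\\[a-zA-Z]+|[0-9]+|[a-zA-Z]|[-+^{}()]")


class Parser:
    def __init__(self, tex):
        self.toks = TOKEN.findall(tex)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*   (leading unary minus not handled)
        terms = [self.term()]
        while self.peek() in ("+", "-"):
            op = self.take()
            t = self.term()
            terms.append(("neg", t) if op == "-" else t)
        return terms[0] if len(terms) == 1 else ("add", *terms)

    def term(self):
        # term := factor factor*   (implicit product; \cdot and \times skipped)
        factors = [self.factor()]
        while self.peek() is not None and self.peek() not in ("+", "-", "}", ")"):
            if self.peek() in ("\\cdot", "\\times"):
                self.take()
            factors.append(self.factor())
        return factors[0] if len(factors) == 1 else ("times", *factors)

    def factor(self):
        # factor := atom ('^' atom)?
        base = self.atom()
        if self.peek() == "^":
            self.take()
            return ("sup", base, self.atom())
        return base

    def atom(self):
        tok = self.take()
        if tok in ("{", "("):
            inner = self.expr()
            self.take("}" if tok == "{" else ")")
            return inner
        if tok == "\\frac":
            return ("frac", self.atom(), self.atom())
        if tok in ("+", "-", "^", "}", ")"):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok  # a number, a letter, or an unknown command kept as a leaf


def parse(tex):
    parser = Parser(tex)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at {parser.peek()!r}")
    return tree


def canonical(tree):
    """Sort children of commutative operators so operand order is ignored."""
    if isinstance(tree, str):
        return tree
    op, *kids = tree
    kids = [canonical(k) for k in kids]
    if op in ("add", "times"):
        kids.sort(key=repr)
    return (op, *kids)
```

For example, `parse(r"\frac{a+b}{2}")` yields `("frac", ("add", "a", "b"), "2")`, and `canonical(parse("b+a")) == canonical(parse("a+b"))` holds. A presentation MathML tree for the same formula would instead be organized around layout elements such as rows and identifiers, which is why it cannot simply be dropped in.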


ga6840 3727 days ago

Lastly, I am not just being childish in complaining about NTCIR and refusing to submit a paper. I gave up spending unworthy, duplicated effort on implementing a MathML parser that generates the expression tree I need (this step is the hardest part, not the XML parsing itself) and instead focused on finding another conference to publish my work. It turned out that my paper (a demo) was accepted at ECIR 2016, so I am glad I did not waste too much time on NTCIR; otherwise I would have missed ECIR.
