by bhickey 1040 days ago
Both papers use the phrase "regular expressions" and there the resemblance ends. The linked manuscript uses regular expressions to realize a grammar and then memoizes logit masks. I want to know why FlashText failed to cite:

Baeza-Yates, Ricardo A., and Gaston H. Gonnet. "Fast text searching for regular expressions or automaton searching on tries." Journal of the ACM (JACM) 43.6 (1996): 915-936.

Eltabakh, Mohamed Y., Ramy Eltarras, and Walid G. Aref. "To trie or not to trie? realizing space-partitioning trees inside postgresql: Challenges, experiences and performance." (2005).

Zhang, Yijun, and Lizhen Xu. "An algorithm for url routing based on trie structure." 2015 12th Web Information System and Application Conference (WISA). IEEE, 2015.

Your comment here doesn’t feel like it’s in good faith, but there’s a good chance I’m misreading it.

I'm serious that the similarities between the papers are superficial.

I don't think it's fair of you to criticize the authors for not citing some obscure preprint, when that manuscript itself neglected to cite decades of prior, relevant work.

I have some other comment on this thread where I point out why I don’t think it’s superficial. Would love to get your feedback on that if you feel like spending more time on this thread.

But it’s not obscure? FlashText was a somewhat popular paper at the time (2017) with a popular repo (https://github.com/vi3k6i5/flashtext). Their paper was pretty derivative of Aho-Corasick, which they cited. If you think they genuinely fucked up, leave an issue on their repo (I’m, maybe to your surprise lol, not the author).

Anyway, I’m not a fan of the whataboutery here. I don’t think OG’s paper is up to snuff on its lit review - do you?

> I don’t think OG’s paper is up to snuff on its lit review - do you?

Not in the slightest. Caching the logit masks and applying the right one based on where you are in your grammar is obvious. This is what I'd expect some bright undergrads to come up with for a class project. This manuscript could've been a blog post.
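
To make "cache the masks, pick one by grammar state" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the hand-written DFA (for the regex `[0-9]+\.[0-9]+`), the toy vocabulary, and the names `walk`, `mask_for`, and `constrained_argmax` are hypothetical and not taken from either paper.

```python
import math
from functools import lru_cache

# Hypothetical DFA for the regex [0-9]+\.[0-9]+ (a decimal number).
# Maps state -> {character -> next state}; state 0 is the start state.
DIGITS = "0123456789"
TRANSITIONS = {
    0: {d: 1 for d in DIGITS},
    1: {**{d: 1 for d in DIGITS}, ".": 2},
    2: {d: 3 for d in DIGITS},
    3: {d: 3 for d in DIGITS},
}
ACCEPTING = {3}

# Toy vocabulary of multi-character tokens (an assumption, not a real tokenizer).
VOCAB = ["1", "23", ".", ".5", "a", "4.", "<eos>"]


def walk(state, token):
    """Return the DFA state after consuming `token`, or None if it is rejected."""
    for ch in token:
        state = TRANSITIONS.get(state, {}).get(ch)
        if state is None:
            return None
    return state


@lru_cache(maxsize=None)
def mask_for(state):
    """One boolean per vocabulary entry; computed once per DFA state, then cached."""
    return tuple(
        (state in ACCEPTING) if tok == "<eos>" else walk(state, tok) is not None
        for tok in VOCAB
    )


def constrained_argmax(logits, state):
    """Pick the highest-scoring token that the grammar allows in `state`."""
    mask = mask_for(state)
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    return max(range(len(VOCAB)), key=masked.__getitem__)
```

At decode time you would call `constrained_argmax(logits, state)`, then advance with `state = walk(state, VOCAB[idx])`. The per-state vocabulary scan happens once; every later step in that state is a cache hit.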

Although arXiv is displacing some traditional publishing, I think it's a little silly to try to hold it to the same standards.

I saw your argument for why you think it's relevant and I think you're overstating the case. There are a _heap_ of papers they could've cited.

As an aside, when can we stop citing _Attention is All You Need_?