by jankovicsandras 507 days ago
This is a good intro to text search. Shameless plug: If you throw in a bit more, ca. 250 SLOC, you can have BM25 search: https://github.com/jankovicsandras/bm25opt
2 comments

You can probably have phrase matching in a hundred lines more, maybe less.
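
For what it's worth, here is a rough sketch of how phrase matching might look with a positional index. It is not taken from any of the linked projects. The names `tokenize`, `build_positional_index` and `phrase_search` are made up for illustration, and the lowercase whitespace tokenizer is a simplifying assumption.

```python
from collections import defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def build_positional_index(docs):
    # Maps term -> {doc_id -> [positions where the term occurs]}.
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    # Returns sorted ids of docs containing the phrase as consecutive tokens.
    terms = tokenize(phrase)
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents containing every term can contain the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        later = [set(index[t][doc_id]) for t in terms[1:]]
        # The phrase matches if some occurrence of the first term is followed
        # by the remaining terms at consecutive positions.
        if any(
            all(start + i + 1 in positions for i, positions in enumerate(later))
            for start in index[terms[0]][doc_id]
        ):
            hits.append(doc_id)
    return hits
```

Calling `phrase_search(build_positional_index(docs), "quick brown")` returns the ids of the documents where the two words appear next to each other in that order. Storing positions costs more memory than a plain inverted index, which is the usual trade-off for exact phrase queries.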

Most of the difficulty in search is dealing with the sheer volume of data. The algorithms themselves are pretty trivial for the most part.
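
To give a sense of how small the core scoring can be, here is a minimal BM25 sketch. It is not the code from either linked implementation. The function name `bm25_scores` is hypothetical, the defaults `k1=1.5` and `b=0.75` are just commonly used values, the tokenizer is a plain lowercase split, and the IDF uses the non-negative `log(1 + ...)` variant, which is one of several in circulation.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Returns one BM25 score per document for the given query.
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption; others exist).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

This recomputes everything on each call, which is fine for a few thousand documents. The hard part mentioned above starts when the collection no longer fits in memory and the statistics have to live in an index.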

Mine: a BM25 implementation I use for teaching (sorry for the French example)

https://gist.github.com/benob/69d48421f88f5dcc2b26a204d3251d...

Thanks! But is there any need to apologize for it? The example is in French, that's all!
This is an English-speaking forum, so yes, you are expected to write in English and link to material in English (or provide a translation). That doesn't mean French (or any other language) is a bad thing; it just isn't the language this community uses.
Sure... but for the record, the code that the parent posted is in Python, one of the global tech economy's pre-eminent programming languages for more than a decade. The parent's example contains some variables with Unicode values that are French words. It strikes me as odd that the parent would say sorry at all. Maybe the parent has a better understanding than I do of linguistic sensitivities.