by perlgeek
4279 days ago
What I'd really like to see is code that takes a body of text and extracts the parts that are written in another language. That's quite common, e.g. in mixed-language IRC channels, or in quotes from English documents inside documents mostly written in another language. Stemming and indexing such a document for full-text search usually gives crappy results. (Bonus points for detecting programming code samples, so that those parts aren't stemmed at all.)