Hacker News
by perlgeek 4279 days ago
What I'd really like to see is code that takes a body of text and extracts parts that are written in another language.

That's quite common: mixed-language IRC channels, quotes from English documents inside documents mostly written in another language, and so on.

Stemming and indexing such a document for full-text search usually gives crappy results, because a stemmer built for one language mangles the words from the other.

(Bonus points for detecting programming code samples, so that those parts aren't stemmed at all.)
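
To make the idea concrete, here is a rough sketch of what I have in mind, not a real solution. It labels each line as code, as one of a few languages, or as unknown, and then groups consecutive lines with the same label so each segment could go to the right stemmer (or to none). The stopword lists, the `CODE_HINT` regex, and the function names are all made up for illustration; a usable version would need a trained language identifier and something finer-grained than whole lines.

```python
import re

# Tiny illustrative stopword lists (assumption); far too small for real use.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it", "this", "for"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "zu", "mit"},
}

# Crude hint that a line is source code rather than prose (assumption):
# deep indentation, a trailing brace or semicolon, or a leading keyword.
CODE_HINT = re.compile(
    r"^\s{4,}\S|[{};]\s*$|^\s*(def|class|import|return|#include)\b"
)


def classify_line(line):
    """Return 'code', a language key from STOPWORDS, or 'unknown'."""
    if CODE_HINT.search(line):
        return "code"
    # Extract runs of letters only (no digits or underscores).
    words = re.findall(r"[^\W\d_]+", line.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def segment(text):
    """Group consecutive non-empty lines sharing a label into (label, text) pairs."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue
        label = classify_line(line)
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1] + "\n" + line)
        else:
            segments.append((label, line))
    return segments
```

An indexer could then stem the "en" and "de" segments with the matching stemmer and index "code" and "unknown" segments verbatim.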

1 comment

That would be awesome :)