| HN Mirror


by tadkar 1964 days ago

There is a similar, great project here [1] that uses the Hungarian Wikipedia corpus. It is a good workout for non-English and possibly non-ASCII operations.

The performance of Java there is super impressive. It should port relatively quickly to this file too...

1 comment

Great, it would be nice if Java were included by the OP.