by olsgaarddk
2492 days ago
That was my initial goal, but I had a lot of trouble with vanilla MeCab not understanding much of the text. This was before NEologd, though, so I think it would work better now. I don’t have the source code on me, but I scraped the data from a website that publishes subtitles. The scraping was easy, the cleaning was not, and I believe this spreadsheet was generated from my first attempt at cleaning. A lot of sources in Japanese NLP and linguistics have a bad habit of changing URLs often, so links bitrot easily. Sorry.