by SteveJS
157 days ago
https://github.com/KnowSeams/KnowSeams gets those cases right. On a beefy machine it reaches 1 TB/s throughput, including all I/O and the mapping of positions back to their locations in the original text. I used it to split Project Gutenberg novels: it does 20k+ novels in about 7 seconds. Note that it keeps all dialog together, which may not be what others want but was what I wanted.
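For anyone curious what "keeps all dialog together" and "position mapping back to the original text" mean in practice, here is a minimal sketch. It is not KnowSeams's code or API. The function name `split_sentences` and its rules are hypothetical. It assumes a sentence ends at `.`, `!` or `?` followed by whitespace or end of text, but only when the terminator is outside a quoted span. It also returns `(start, end)` offsets into the untouched input, so every sentence maps back to its original location for free. It ignores abbreviations like "Mr.", nested or single quotes, and everything else a real splitter has to handle.

```python
def split_sentences(text: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of sentences in `text` (hypothetical sketch).

    A terminator ('.', '!', '?') ends a sentence only when it is followed by
    whitespace or end of text AND we are not inside a double-quoted span,
    so quoted dialog is never split. Offsets index the original string.
    """
    spans: list[tuple[int, int]] = []
    start = 0
    in_quote = False
    n = len(text)
    for i, ch in enumerate(text):
        if ch == '"':
            in_quote = not in_quote          # straight quotes toggle
        elif ch == "\u201c":
            in_quote = True                  # curly opening quote
        elif ch == "\u201d":
            in_quote = False                 # curly closing quote
        elif ch in ".!?" and not in_quote:
            nxt = i + 1
            if nxt == n or text[nxt].isspace():
                spans.append((start, nxt))
                start = nxt
    if start < n:
        spans.append((start, n))             # trailing text with no terminator

    # Trim leading whitespace by moving the start offset; drop empty spans.
    result: list[tuple[int, int]] = []
    for s, e in spans:
        while s < e and text[s].isspace():
            s += 1
        if s < e:
            result.append((s, e))
    return result


text = 'He said, "Wait. Do not go." Then he left. The end.'
for s, e in split_sentences(text):
    print(s, e, repr(text[s:e]))
```

Because the splitter only records offsets and never copies or normalizes the text, slicing the original string with each pair recovers the sentence exactly. That is the cheap way to get position mapping.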