by simonw
582 days ago
Would it make sense for this to offer a chunking strategy that doesn't need a tokenizer at all? I love the goal of keeping it small, but "tokenizers" is still a pretty huge dependency (and one that isn't currently compatible with Python 3.13). I've been hoping to find an ultra-lightweight chunking library that can do things like very simple regex-based sentence/paragraph/markdown-aware chunking with minimal additional dependencies.
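To make the tokenizer-free idea concrete, here is a rough sketch of what regex-based sentence/paragraph chunking could look like using only the standard library. The names `split_units` and `chunk_text` and the `max_chars` budget are made up for illustration and are not part of any existing library; the sentence regex is deliberately naive (it will mis-split on abbreviations like "e.g.") and markdown awareness is left out.

```python
import re

# Hypothetical, stdlib-only sketch: no tokenizer, sizes measured in characters.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")      # blank line(s) between paragraphs
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")   # whitespace after ., ! or ?


def split_units(text):
    """Split text into paragraphs, then each paragraph into naive sentences."""
    units = []
    for para in PARAGRAPH_BREAK.split(text):
        para = para.strip()
        if not para:
            continue
        units.extend(s for s in SENTENCE_END.split(para) if s)
    return units


def chunk_text(text, max_chars=200):
    """Greedily join sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    chunks, current = [], ""
    for unit in split_units(text):
        candidate = f"{current} {unit}" if current else unit
        if current and len(candidate) > max_chars:
            chunks.append(current)   # budget exceeded: close the current chunk
            current = unit
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Something like `chunk_text(document, max_chars=500)` would then return a list of strings. Counting characters instead of tokens is the trade-off here: it is only a loose proxy for model context size, but it needs no extra dependencies.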
The more complicated part is what is effectively a bin-packing problem, which emerges depending on how many different contextual sources you have.
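As a rough illustration of that bin-packing angle, here is a sketch of the classic first-fit decreasing heuristic, treating each context window as a bin with a fixed capacity. The function name `pack_first_fit_decreasing`, the source names, and the size units are all hypothetical; this is one common heuristic, not necessarily what any particular chunking library does.

```python
def pack_first_fit_decreasing(sizes, capacity):
    """Group named sources into bins whose total size stays within capacity.

    sizes: dict mapping a source name to its size (characters, tokens, ...).
    Returns a list of bins, each a list of source names.
    """
    bins, loads = [], []
    # Place the largest sources first; this usually wastes less space.
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{name!r} ({size}) exceeds capacity {capacity}")
        for i, load in enumerate(loads):
            if load + size <= capacity:
                bins[i].append(name)   # first bin with enough room wins
                loads[i] += size
                break
        else:
            bins.append([name])        # nothing fits: open a new bin
            loads.append(size)
    return bins


# Hypothetical sources competing for a context window of size 10.
sources = {"readme": 6, "changelog": 5, "issue": 4, "comment": 3}
print(pack_first_fit_decreasing(sources, capacity=10))
# → [['readme', 'issue'], ['changelog', 'comment']]
```

First-fit decreasing is not optimal in general, and it ignores ordering or relevance between sources, which is likely where the real complexity comes from.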