by magundu 839 days ago
What is content aware chunking?
1 comments

Using content-appropriate delimiters to create the chunks. If it's a Python document, split it at the proper delimiters. Same if it's an HTML document, JSON, etc. We wouldn't want to split a chunk in the middle of a function or paragraph. Or maybe we would. But that is what OP is referring to, I believe.
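To make that concrete, here is a rough sketch of what I mean, not anyone's actual implementation. It assumes Python source gets split at top-level statements via the standard `ast` module and everything else gets split on blank lines. The function names and the `kind` label are made up for illustration.

```python
import ast
import re


def chunk_python(source: str) -> list[str]:
    """Split Python source so each top-level statement (function, class, import) stays whole."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        # Decorators sit above node.lineno, so start from the earliest one if present.
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators])
        chunks.append("\n".join(lines[start - 1 : node.end_lineno]))
    # Note: comments and blank lines between top-level statements are dropped in this sketch.
    return chunks


def chunk_paragraphs(text: str) -> list[str]:
    """Split prose on blank lines so no paragraph is cut in half."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def chunk(content: str, kind: str) -> list[str]:
    """Pick a splitter by content type; `kind` is a hypothetical label, not a real API."""
    if kind == "python":
        return chunk_python(content)
    return chunk_paragraphs(content)
```

A real pipeline would probably also merge small chunks or cap large ones by size, which this leaves out.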
That's a good point, and different document types need different chunking techniques too. You don't want to split a Word file the same way you split an Excel...