by Kelamir
1104 days ago
> We start by parsing documents into chunks. A sensible default is to chunk documents by token length, typically 1,500 to 3,000 tokens per chunk. However, I found that this didn’t work very well. A better approach might be to chunk by paragraphs (e.g., split on \n\n).

Hmm, good insight there. I've experimented with chunking by length before, and it's been pretty troublesome because chunks end up missing context.
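
For anyone who wants to try the paragraph idea, here's a rough sketch of how I'd picture it, not anything taken from the article. It splits on blank lines and then packs whole paragraphs into a chunk until a budget is hit. The function name `chunk_by_paragraphs`, the `max_tokens` parameter, and the word-count "token" estimate are all my own assumptions; a real tokenizer would give different counts.

```python
import re


def chunk_by_paragraphs(text, max_tokens=1500):
    """Split text on blank lines, then pack whole paragraphs into chunks.

    Hypothetical helper: "tokens" are approximated by whitespace-separated
    words, which is only a crude stand-in for a real tokenizer.
    """
    # Treat one or more blank lines as a paragraph boundary.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks = []
    current = []      # paragraphs collected for the chunk being built
    current_len = 0   # estimated token count of `current`

    for para in paragraphs:
        n = len(para.split())
        # Close the current chunk if adding this paragraph would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n

    if current:
        chunks.append("\n\n".join(current))
    return chunks


doc = "one two three\n\nfour five\n\nsix seven eight nine"
chunks = chunk_by_paragraphs(doc, max_tokens=5)
```

Note that a paragraph is never cut in half here, so a single paragraph longer than `max_tokens` ends up as its own oversized chunk. That trade-off keeps context intact at the cost of uneven chunk sizes.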