by neuralkoi
511 days ago
Thank you! I was looking for how to do this. The example in the issue above shows how to increase the context size in ollama:

$ ollama run llama3.2
>>> /set parameter num_ctx 32768
Set parameter 'num_ctx' to '32768'
>>> /save llama3.2-32k
Created new model 'llama3.2-32k'
>>> /bye
$ ollama run llama3.2-32k "Summarize this file: $(cat README.md)"
...
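
If you'd rather not save a separate model, the same parameter can be passed per request through the local HTTP API. A minimal sketch, assuming a default install listening on localhost:11434 and using only the standard library; build_request and generate are names I made up for illustration, not part of ollama:

```python
import json
import urllib.request

# Assumption: ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, num_ctx=32768):
    """Build the JSON body for /api/generate with a custom context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # ask for a single JSON reply
        "options": {"num_ctx": num_ctx},  # same setting as /set parameter num_ctx
    }


def generate(model, prompt, num_ctx=32768):
    """Send the request and return the generated text (needs a running server)."""
    body = json.dumps(build_request(model, prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (not run here):
#   text = open("README.md", encoding="utf-8").read()
#   print(generate("llama3.2", "Summarize this file: " + text))
```

This only changes the context size for that one request, so nothing is written to the model list.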
The table in the Reddit post above also shows context size vs memory requirements for 01-ai/Yi-34B-200K:

Model: 01-ai/Yi-34B-200K
Params: 34.395B
Mode: infer

Sequence Length vs Bit Precision Memory Requirements
 SL / BP |      4 |      6 |      8 |      16
---------------------------------------------
     256 | 16.0GB | 24.0GB | 32.1GB |  64.1GB
     512 | 16.0GB | 24.1GB | 32.1GB |  64.2GB
    1024 | 16.1GB | 24.1GB | 32.2GB |  64.3GB
    2048 | 16.1GB | 24.2GB | 32.3GB |  64.5GB
    4096 | 16.3GB | 24.4GB | 32.5GB |  65.0GB
    8192 | 16.5GB | 24.7GB | 33.0GB |  65.9GB
   16384 | 17.0GB | 25.4GB | 33.9GB |  67.8GB
   32768 | 17.9GB | 26.8GB | 35.8GB |  71.6GB
   65536 | 19.8GB | 29.6GB | 39.5GB |  79.1GB
  131072 | 23.5GB | 35.3GB | 47.0GB |  94.1GB
* 200000 | 27.5GB | 41.2GB | 54.9GB | 109.8GB

* Model Max Context Size
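
For a rough idea of where numbers like these come from, here is a back-of-the-envelope sketch: weights plus KV cache, both stored at the given bit precision and reported in GiB. This is my own estimate, not the code from the gist. The layer count, KV head count and head dimension are assumptions based on what I believe the Yi-34B config contains (60 layers, 8 KV heads, head dimension 128); only the parameter count comes from the table header. With those values it lands within rounding of the rows I spot-checked.

```python
GIB = 1024 ** 3

N_PARAMS = 34.395e9  # from the table header above
# Assumed architecture values (not stated in the post):
N_LAYERS = 60
N_KV_HEADS = 8
HEAD_DIM = 128


def estimate_gib(seq_len, bits):
    """Estimate inference memory in GiB: weights + KV cache at `bits` precision."""
    bytes_per_value = bits / 8
    weights = N_PARAMS * bytes_per_value
    # One key and one value vector per layer, per KV head, per token.
    kv_cache = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * seq_len * bytes_per_value
    return (weights + kv_cache) / GIB


for seq_len in (4096, 32768, 200000):
    row = " | ".join(f"{estimate_gib(seq_len, b):.1f}GB" for b in (4, 6, 8, 16))
    print(f"{seq_len:>7} | {row}")
```

It ignores activations and runtime overhead, so treat it as a lower bound rather than what a given runtime will actually allocate.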
Code: https://gist.github.com/lapp0/d28931ebc9f59838800faa7c73e3a0...