by jmorgan
517 days ago
Phi-4's architecture changed slightly from Phi-3.5: it no longer uses a sliding window of 2,048 tokens [1]. That changed the hyperparameters, and because the same architecture name/identifier was reused between the two models, some GGUF files published on Hugging Face ultimately fail with an error at inference time.

For the Phi-4 uploaded to Ollama, the hyperparameters were set to avoid the error. For imported GGUF files, the error should stop occurring in the next version of Ollama [2].

In retrospect, an entirely new architecture name should probably have been used instead of reusing "phi3".

[1] https://arxiv.org/html/2412.08905v1
[2] https://github.com/ollama/ollama/releases/tag/v0.5.5
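To make "sliding window" concrete, here is a minimal Python sketch of which earlier tokens a given token may attend to under causal attention, with and without a window. It assumes the common convention that a window of W lets a token see itself plus the W-1 tokens before it; exact conventions differ between implementations, and the function names are hypothetical, not taken from Ollama or llama.cpp.

```python
from typing import List, Optional


def can_attend(query_pos: int, key_pos: int, sliding_window: Optional[int]) -> bool:
    """Return True if the token at query_pos may attend to the token at key_pos.

    Assumed convention: causal attention never looks at future tokens; with a
    sliding window of size W, only the last W positions (including the query
    itself) are visible. sliding_window=None means full causal attention.
    """
    if key_pos > query_pos:
        return False  # causal: no attending to future tokens
    if sliding_window is None:
        return True  # no window: the whole prefix is visible
    return query_pos - key_pos < sliding_window


def visible_keys(query_pos: int, sliding_window: Optional[int]) -> List[int]:
    """List every key position the token at query_pos can attend to."""
    return [k for k in range(query_pos + 1) if can_attend(query_pos, k, sliding_window)]


if __name__ == "__main__":
    # Token at position 3000: windowed vs. full causal attention.
    print(len(visible_keys(3000, 2048)))  # 2048 positions (953..3000)
    print(len(visible_keys(3000, None)))  # 3001 positions (0..3000)
```

Under these assumptions, the token at position 3,000 sees only positions 953 to 3,000 with a 2,048-token window, but all 3,001 positions without one. The window size is therefore a per-model setting that a runtime has to read correctly; it is not something it can infer from a shared architecture name.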