by zackangelo
778 days ago
The table at the bottom says they initialized the 65K version from "LLaMA-3 7B" (assuming the 7B is a typo and they meant 8B). Each successive version with a larger window was then initialized from the previous, smaller one (65K -> 262K -> 524K -> 1048K).