|
|
|
|
|
by AutoJanitor
116 days ago
|
|
Honest Limitations 819K parameters. Responses are short and sometimes odd. That's expected at this scale with a small training corpus. The achievement is that it runs at all on this hardware.
Context window is 64 tokens. Prompt + response must fit in 64 bytes.
No memory between dialogs. The KV cache resets each conversation.
Byte-level vocabulary. The model generates one ASCII character at a time.
Future DirectionsThese are things we're working toward — not current functionality: RSP microcode acceleration — the N64's RSP has 8-lane SIMD (VMULF/VMADH); offloading matmul would give an estimated 4–8× speedup over scalar VR4300
Larger model — with the Expansion Pak (8MB total), a 6-layer model fits in RAM
Richer training data — more diverse corpus = more coherent responses
Real cartridge deployment — EverDrive compatibility, real hardware video coming
Why This Is RealThe VR4300 was designed for game physics, not transformer inference. Getting Q8.7 fixed-point attention, FFN, and softmax running stably at 93MHz required: Custom fixed-point softmax (bit-shift exponential to avoid overflow)
Q8.7 accumulator arithmetic with saturation guards
Soft-float compilation flag for float16 block scale decode
Alignment-safe weight pointer arithmetic for the ROM DFS filesystem
The inference code is in nano_gpt.c. The training script is train_sophia_v5.py. Build it yourself and verify. |
|