| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by AutoJanitor 116 days ago

Honest Limitations

    819K parameters. Responses are short and sometimes odd. That's expected at this scale with a small training corpus. The achievement is that it runs at all on this hardware.
    Context window is 64 tokens. Prompt + response must fit in 64 bytes.
    No memory between dialogs. The KV cache resets each conversation.
    Byte-level vocabulary. The model generates one ASCII character at a time.

Future Directions

These are things we're working toward — not current functionality:

    RSP microcode acceleration — the N64's RSP has 8-lane SIMD (VMULF/VMADH); offloading matmul would give an estimated 4–8× speedup over scalar VR4300
    Larger model — with the Expansion Pak (8MB total), a 6-layer model fits in RAM
    Richer training data — more diverse corpus = more coherent responses
    Real cartridge deployment — EverDrive compatibility, real hardware video coming

Why This Is Real

The VR4300 was designed for game physics, not transformer inference. Getting Q8.7 fixed-point attention, FFN, and softmax running stably at 93MHz required:

    Custom fixed-point softmax (bit-shift exponential to avoid overflow)
    Q8.7 accumulator arithmetic with saturation guards
    Soft-float compilation flag for float16 block scale decode
    Alignment-safe weight pointer arithmetic for the ROM DFS filesystem

The inference code is in nano_gpt.c. The training script is train_sophia_v5.py. Build it yourself and verify.