Show HN: RetNPhi – Phi-3.5 as a Byte-Level LM with RetNet-Inspired Efficiency

2 points by JosefAlbers 697 days ago

I've been experimenting with transforming Microsoft's Phi-3.5 into a byte-level language model with RetNet-inspired elements. The result is RetNPhi, a hybrid model that combines the strengths of Phi-3.5 with the efficiency of RetNet.

Key features:

- Byte-level processing for universal file type handling

- RetNet's multi-scale exponential decay and group normalization for efficient long-range dependency modeling

- Recurrent inference mode with constant memory usage, regardless of sequence length

- Minimal fine-tuning: only post-layer norms, first token embedding layer, and LoRA on self-attention output projections (o_proj) are adjusted

- Surprisingly coherent output after training on just 64 lines of Tiny Shakespeare
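For readers unfamiliar with the LoRA part of the fine-tuning recipe, here is a minimal NumPy sketch of a low-rank adapter wrapped around a frozen projection matrix such as o_proj. The class name `LoRALinear`, the rank, the `alpha` scaling, and the initialization are illustrative assumptions, not taken from the RetNPhi code.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (hypothetical sketch)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen pretrained weight
        self.a = rng.standard_normal((rank, in_dim)) * 0.01   # trainable down-projection
        self.b = np.zeros((out_dim, rank))                    # trainable up-projection, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.a.T) @ self.b.T

    def num_trainable(self):
        return self.a.size + self.b.size


rng = np.random.default_rng(1)
weight = rng.standard_normal((16, 16))   # stand-in for an o_proj weight
layer = LoRALinear(weight)
x = rng.standard_normal((3, 16))

# With b initialized to zero, the adapter leaves the pretrained output unchanged.
assert np.allclose(layer(x), x @ weight.T)
print(layer.num_trainable(), "trainable vs", weight.size, "frozen parameters")
```

Because the up-projection starts at zero, training begins from the pretrained behavior and only the two small matrices receive updates.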

Technical details:

- Based on Microsoft's Phi-3.5 architecture

- Implements RetNet's retention mechanism

- Uses LoRA for efficient adaptation of pretrained weights

- Dual-mode processing: parallel for training, recurrent for inference
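The dual-mode idea can be sketched in a few lines of NumPy: the parallel form computes retention for a whole sequence at once, while the recurrent form carries a fixed-size state from token to token and yields the same outputs. This is a simplified single-head version based on the formulation in the RetNet paper; it omits group normalization, positional rotation, and scaling, and the function names are illustrative rather than RetNPhi's own.

```python
import numpy as np

def head_decays(num_heads):
    # Multi-scale decay as given in the RetNet paper: one gamma per head.
    return 1 - 2.0 ** (-5 - np.arange(num_heads))

def decay_mask(seq_len, gamma):
    # D[n, m] = gamma**(n - m) when n >= m, otherwise 0 (causal mask with decay).
    idx = np.arange(seq_len)
    diff = idx[:, None] - idx[None, :]
    return np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)

def retention_parallel(q, k, v, gamma):
    # Training-style form: (Q K^T * D) V over the full sequence.
    return ((q @ k.T) * decay_mask(q.shape[0], gamma)) @ v

def retention_recurrent(q, k, v, gamma):
    # Inference-style form: a (d_k, d_v) state whose size is independent of sequence length.
    state = np.zeros((k.shape[1], v.shape[1]))
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        state = gamma * state + np.outer(k_n, v_n)
        outputs.append(q_n @ state)
    return np.stack(outputs)


rng = np.random.default_rng(0)
seq_len, d_k, d_v = 6, 4, 4
q = rng.standard_normal((seq_len, d_k))
k = rng.standard_normal((seq_len, d_k))
v = rng.standard_normal((seq_len, d_v))
gamma = head_decays(2)[0]

out_par = retention_parallel(q, k, v, gamma)
out_rec = retention_recurrent(q, k, v, gamma)
assert np.allclose(out_par, out_rec)
```

The recurrent loop only ever stores `state`, which is why memory stays constant no matter how many tokens have been processed.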

Sample output (input: "first citi"):

zen: you are all resolved rather to die than to fam
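To make the byte-level framing concrete, here is a small sketch of how the prompt above could map to model inputs, assuming raw UTF-8 bytes are used directly as token IDs (a vocabulary of 256). RetNPhi's actual preprocessing may differ; the helper names are hypothetical.

```python
def encode(data):
    # Accept text or raw bytes; every token ID is a byte value in range(256).
    raw = data.encode("utf-8") if isinstance(data, str) else bytes(data)
    return list(raw)

def decode(ids):
    # Invalid UTF-8 sequences are replaced instead of raising an error.
    return bytes(ids).decode("utf-8", errors="replace")


prompt_ids = encode("first citi")
print(prompt_ids)          # one ID per byte of the prompt
print(decode(prompt_ids))  # round-trips back to the prompt

# Non-text input needs no special handling, e.g. the first four bytes of a PNG file.
png_ids = encode(b"\x89PNG")
assert all(0 <= i < 256 for i in prompt_ids + png_ids)
```

No tokenizer vocabulary file is needed in this scheme, at the cost of longer sequences than subword tokenization would produce.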

This approach could lead to more efficient, locally runnable language models. Byte-level processing lets a single model read any data type without a format-specific tokenizer, and the recurrent inference mode keeps memory usage constant as the sequence grows, which could make these models practical on consumer-grade hardware.

I'm particularly interested in feedback on:

1. Potential applications for a byte-level LM with efficient long-context handling

2. Thoughts on the hybridization of Transformer-based models (like Phi) with RetNet concepts

3. Ideas for further optimizing the model for local deployment

GitHub: https://github.com/JosefAlbers/Phi-3-Vision-MLX/blob/main/as...