

by lukebechtel 378 days ago

> H-Net demonstrates three important results on language modeling:

> 1. H-Nets scale better with data than state-of-the-art Transformers with BPE tokenization, while learning directly from raw bytes. This improved scaling is even more pronounced on domains without natural tokenization boundaries, like Chinese, code, and DNA.

> 2. H-Nets can be stacked together to learn from deeper hierarchies, which further improves performance.

> 3. H-Nets are significantly more robust to small perturbations in input data like casing, showing an avenue for creating models that are more robust and aligned with human reasoning.

1 comment

lukebechtel 378 days ago

https://arxiv.org/pdf/2507.07955

paper
