

by liteclient 193 days ago

it makes sense architecturally

they replace dot-product attention with topology-based scalar distances derived from a laplacian embedding - that effectively reduces attention scoring to a 1D energy comparison which can save memory and compute
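To make the idea concrete, here is a minimal sketch of scoring attention from one scalar per token instead of a dot product between vectors. Everything in it is an assumption for illustration, not the proposal's actual method: the function names are hypothetical, the token graph is a toy adjacency matrix, the Fiedler vector of the graph Laplacian stands in for the "laplacian embedding", and the score is simply the negative absolute difference between two scalars.

```python
import numpy as np

def laplacian_scalar_embedding(adjacency):
    """Hypothetical helper: one scalar per node, taken from the Fiedler vector."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency  # unnormalized graph Laplacian L = D - A
    # eigh returns eigenvalues in ascending order for a symmetric matrix,
    # so column 1 is the eigenvector of the second-smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvectors[:, 1]

def scalar_distance_attention(energies, values):
    """Hypothetical scoring: score(i, j) = -|e_i - e_j|, then a row-wise softmax."""
    scores = -np.abs(energies[:, None] - energies[None, :])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

# Toy example: 4 tokens connected as a path graph 0-1-2-3 (assumed structure).
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
values = np.arange(8, dtype=float).reshape(4, 2)

energies = laplacian_scalar_embedding(adjacency)
output, weights = scalar_distance_attention(energies, values)
```

In this sketch the scoring side only has to keep n scalars rather than n key vectors, which is where a memory saving could come from. The n-by-n weight matrix is still materialized here, so the sketch shows the scoring idea only, not any compute reduction.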

that said, i’d take the results with a grain of salt given there is no peer review, and benchmarks are only on a 30M-parameter model so far

2 comments

reactordev 193 days ago

Yup, keyword here is “under the right conditions”.

This may work well for their use case but fail horribly in others without further peer review and testing.


tuned 192 days ago

no, from my point of view it is about being more domain-focused instead of going full-orthogonal.


tuned 192 days ago

right. this is a proposal that needs to be tested. I started testing it on a 30M-parameter model, then I will move to a 100M one and evaluate the generation on domain-specific assistance tasks
