by thethirdone 62 days ago

> the ratio remains approximately 914x over TurboQuant, with compression improving rather than degrading as context length grows.

This line from the abstract made me really suspicious. Obviously a compression scheme that incorporates the entire sequence shouldn't get worse relative to a per-element one as the length increases.
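
To make that concrete, here is a toy cost model (not the paper's scheme and not TurboQuant's). It assumes a per-element quantizer pays a constant number of bits per token, while a sequence-level scheme pays a fixed overhead plus a smaller marginal cost per token. The function name and all the numbers are hypothetical, chosen only for illustration.

```python
def compression_ratio(n_tokens,
                      bits_per_element=4.0,        # hypothetical per-element cost
                      fixed_overhead_bits=10_000.0,  # hypothetical one-time cost
                      marginal_bits=0.5):          # hypothetical per-token cost
    """Ratio of per-element cost to sequence-level cost in a toy model."""
    per_element_cost = n_tokens * bits_per_element
    sequence_cost = fixed_overhead_bits + n_tokens * marginal_bits
    return per_element_cost / sequence_cost


lengths = (1_000, 10_000, 100_000, 1_000_000)
ratios = [compression_ratio(n) for n in lengths]

for n, r in zip(lengths, ratios):
    print(f"{n:>9} tokens: ratio {r:.3f}")
```

Under these assumptions the ratio rises monotonically toward bits_per_element / marginal_bits as the fixed overhead is amortized, so "improving with context length" follows from the setup itself rather than being evidence that the scheme works.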

It is important to note that this paper is PURELY theoretical. I couldn't find much meat on the bone from a quick skim.

The single author, Gregory Magarshak, has only published one paper on arXiv before and appears to be a professor of business / music. I don't plan to read it more closely in the hope of finding something of value.

3 comments

stingraycharles 62 days ago

Me neither. There are no actual experiments / data, no peer reviews, and the innovation relies almost entirely on citations from the author’s other paper.

The author is not an ML researcher but rather an AI startup CTO / founder. Previously worked on “social operating systems” for the web, blockchain of course. And now an AI innovator. I’m suspicious. This was part of the author’s reply in another thread:

> When TurboQuant came out, I realized we can also go way below the Shannon limit in the same way, and take advantage of PLT. In fact, I'm working on publishing a paper that generalizes this to robotics (which needs to do cheap fast on-board inference "in the field"). I also believe this is how animals actually learn. In other words, over time they learn overall "sequences" of actions and then can check whether they are "good enough" to solve the problem, or whether to switch to a full analysis -- this corresponds to System 1 and 2 of Daniel Kahneman's "Thinking Fast and Slow".

Which doesn’t exactly inspire confidence and makes me wonder who they think their audience is: ML researchers or LinkedIn?

EGreg 62 days ago

You're right, I'm not a well-known researcher, simply an entrepreneur who started to publish academic papers.

However, I do have a long history of diving deep into fields and building practical, open-source solutions to major problems I perceive in the fields.

15 years ago I started with social networks and PHP: https://github.com/Qbix http://laweekly.com/restoring-healthy-communities/

8 years ago I got into smart contracts on EVM, which was the SOTA at the time: https://github.com/Intercoin https://intercoin.org/applications

About a year and a half ago I started teaching a course on AI at a university not far from NYU where I studied... and that's what got me into this: https://vimeo.com/1063008765/c7ef3abcc5

I try to document everything on GitHub and popular articles, but only recently started publishing academic papers on arXiv and plan to actually start submitting them for real publications. While I build, I realized that I should start publishing any novel theoretical results that underpin my work.

I plan to publish actual code in a few weeks. To be fair, TurboQuant is also a purely theoretical paper. I just wanted to get this out and share.

thethirdone 62 days ago

> To be fair, TurboQuant is also a purely theoretical paper. I just wanted to get this out and share.

TurboQuant is not a purely theoretical paper. Section 4 "Experiments" (page 15) [0] has a bunch of figures based on actual GPU computations.

[0]: https://arxiv.org/abs/2504.19874

kumarhn 61 days ago

TurboQuant looks like it has very serious research integrity issues.

https://openreview.net/forum?id=tO3ASKZlok

stingraycharles 62 days ago

TurboQuant went through ICLR review, has multiple Google Research co-authors, open-source implementations, CUDA kernels, and LongBench benchmarks.

Contrast that with your paper: no experiments, no implementation, no empirical validation of any kind.

Did you try engaging with LLM researchers and get their feedback on your paper?

mskkm 61 days ago

Went through ICLR review: scores of 4, 4, 6, 10. Seriously? Open-source implementations: where is the official code? CUDA kernels: where?

EGreg 61 days ago

Since yesterday, I put up the source code btw:

https://github.com/Safebots/KV

gaze 62 days ago

The irritating thing about LLM-generated papers like these is that they're wrong, but they are generated by LLMs capable enough to bury the absurd claim pretty deep in there.

stingraycharles 62 days ago

Analyze it using an LLM. Claude was pretty ruthless about this one.

thethirdone 62 days ago

Yeah, for me Claude identified the phrase "this holds with probability 1 over random weight matrices since the null space has dimension"

Treating trained weights as random for the purpose of a proof is immediately discrediting for a paper to me.
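
A minimal sketch of why that distinction matters, in plain Python. A matrix with independent Gaussian entries is full rank with probability 1, yet specific structured matrices sit in the measure-zero exception set. The `matrix_rank` helper, the 6x6 size, and the rank-1 example are all hypothetical illustrations; nothing here is taken from the paper, and it does not claim that trained weights are low rank, only that an "almost all matrices" statement says nothing about one particular non-random matrix.

```python
import random


def matrix_rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]  # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # column is (numerically) dependent on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 6

# Entries drawn i.i.d. from a Gaussian: the "random weight matrix" case.
random_matrix = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]

# An outer product u v^T: a specific, structured matrix of rank 1.
u = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
v = [2.0, -1.0, 0.5, 3.0, 1.0, -2.0]
structured_matrix = [[ui * vj for vj in v] for ui in u]

print("random matrix rank:    ", matrix_rank(random_matrix))
print("structured matrix rank:", matrix_rank(structured_matrix))
```

The random draw comes out full rank, while the outer product has a five-dimensional null space even though such matrices have probability zero under the Gaussian model.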

EGreg 62 days ago

"This holds for almost all matrices" is actually something you'd want to know if we're talking about probabilities, no?

gaze 62 days ago

Sure, but it seems spiritually wrong to use an LLM to debug a slop paper. Who knows, maybe Claude generated it in the first place?

Hard_Space 61 days ago

In fairness, real-world researchers are expert at selective emphasis too.
