by thethirdone
62 days ago
> the ratio remains approximately 914x over TurboQuant, with compression improving rather than degrading as context length grows.

This line from the abstract got me really suspicious. Obviously, a compression scheme that incorporates the entire sequence shouldn't get worse compared to a per-element one as the length increases. It is important to note that this paper is PURELY theoretical, and I couldn't find much meat on the bone from a quick skim. The single author, Gregory Magarshak, has published only one paper on arXiv before and appears to be a professor of business / music. I don't plan to give it a closer read in the hope of finding something of value.
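A toy illustration of why "improves with context length" is the expected behavior and not a notable result: a per-element scheme pays a fixed cost per element, so its size grows linearly, while a whole-sequence compressor can exploit redundancy across the sequence and grow sublinearly. This is only a sketch under loudly stated assumptions: the 4-bit per-element baseline is hypothetical, zlib stands in for "a scheme that sees the whole sequence," and the data is a synthetic, highly repetitive byte pattern. It says nothing about KV caches, TurboQuant, or the paper's actual numbers.

```python
import zlib

# Hypothetical fixed-width per-element quantizer (assumption, not TurboQuant's real setting).
BITS_PER_ELEMENT = 4


def per_element_bytes(n: int) -> float:
    # Every element costs the same, so total size grows linearly with n.
    return n * BITS_PER_ELEMENT / 8


def whole_sequence_bytes(seq: bytes) -> int:
    # zlib is a stand-in for any compressor that sees the whole sequence and
    # can exploit repetition across elements.
    return len(zlib.compress(seq, 9))


def ratio(n: int) -> float:
    # Synthetic, highly redundant data: a 16-byte pattern repeated to length n.
    pattern = bytes(range(16))
    seq = (pattern * (n // len(pattern) + 1))[:n]
    return per_element_bytes(n) / whole_sequence_bytes(seq)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, round(ratio(n), 1))
```

On redundant input the ratio climbs with n simply because one side is linear and the other is not; on incompressible input it would not, which is why a headline ratio like this needs the data and baseline spelled out.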
The author is not an ML researcher but an AI startup CTO / founder. Previously worked on “social operating systems” for the web and, of course, blockchain. Now an AI innovator. I’m suspicious. This was part of the author’s reply in another thread:
> When TurboQuant came out, I realized we can also go way below the Shannon limit in the same way, and take advantage of PLT. In fact, I'm working on publishing a paper that generalizes this to robotics (which needs to do cheap fast on-board inference "in the field"). I also believe this is how animals actually learn. In other words, over time they learn overall "sequences" of actions and then can check whether they are "good enough" to solve the problem, or whether to switch to a full analysis -- this corresponds to System 1 and 2 of Daniel Kahneman's "Thinking Fast and Slow".
That doesn’t exactly inspire confidence, and it makes me wonder who they think their audience is: ML researchers or LinkedIn.