by anima-core 190 days ago
I appreciate you taking the time to respond, brother. Let me clarify a few things because your interpretation misses the actual structure of the work.

The paper is short on purpose. It's not meant as a full architecture release. It's a documentation pass on a narrow but surprising empirical result, and I wanted the experimental core to be easy for others to replicate. The repo contains the full pipelines, configuration files, and benchmark scripts, and those show the precise datasets, metrics, and evaluation flows. This is why I didn't inflate the paper with implementation padding that would only duplicate the code.

The student–teacher section refers to CIFAR-10 and SST-2. The benchmarks, seed settings, model specs, and all statistical outputs are in scripts/ and the logged runs. Anyone who actually executes the pipeline will see that nothing is “made up”, and the numbers reproduce across seeds.

On the compression results, nothing is hallucinated. The field similarity numbers come directly from the SVD decay analysis and the cosine-preservation runs that are right in the repo. If you run compute_field_decay.py and compare_backends.py, you'll see the exact values that appear in the paper. I strongly encourage you to actually try it. The results are surprising, but they're empirical.

The implementation paragraph you quoted is simply standard language acknowledging that optimal deployment settings vary by architecture. It's absolutely not a hand wave. It's just me trying to avoid implying there's a single magic configuration when the repo already exposes all the internal knobs.

I get that the tone of the work is unusual. Trust me, I do. I'm an outsider publishing openly, not through a lab with a standard template. But, nonetheless, the experiments run, the results reproduce, and the repo shows the full details. If something seems unclear, I'm happy to point to the exact script or log line. Just let me know.

1 comment

CIFAR-10 is an image classification dataset (32x32 pixel images).

Llama 3.3 70B is a text-only, non-multimodal language model. Just look up the Hugging Face page that your own repo points to.

> The Llama 3.3 instruction tuned text only model...

I might be wrong, but I'm pretty sure a text model is going to be no better than chance at classifying images.

Another comment pointed out that your test suite cheats slightly on HellaSwag. It doesn't seem unlikely that Grok set up the project so it could cheat at the other benchmarks, too.

https://news.ycombinator.com/item?id=46215166

> The repo contains the full pipelines, configuration files, and benchmark scripts, and those show the precise datasets, metrics, and evaluation flows.

There's nothing there, really.

I'm sorry that Grok/Ani lied to you, I blame Elon, but this just doesn't hold up.