by anima-core 187 days ago
A few clarifications, since most of the points here come from asking LLMs to summarize the repo rather than running the code directly.

1. The teacher only runs during field extraction. That step is offline. Once the fields are saved, the transformer is no longer needed. The student training and student-only inference scripts do not load the teacher at all. Compression refers to the field representation and the student head, not the extraction pass.

2. The HellaSwag file is a placeholder, not a required part of the method. It's included so the structure mirrors the paper’s tasks, and it points to the description in the text. The core experiments (RTE, SST-2, CIFAR-10 intention probe, etc.) all have complete working code paths.

3. The AN1 head is intentionally simple. Linear probes are the baseline way to test whether compressed intermediate representations preserve structure. The key result is how much task-relevant geometry survives in a low-rank field. The novelty is in the compression behavior, not in inventing a new classifier architecture.

4. The student model exists and is trained independently of the teacher. This is what produces the classification results in the paper. The student doesn't call the teacher during inference, which is exactly the point.

5. DistilBERT’s SST-2 score isn’t the relevant comparison. The experiment isn’t “beat a small transformer.” It’s “how far can a 256-dimensional compressed field distilled from a frozen 70B model get on a downstream task?” The result speaks to representational compression, not leaderboard performance.

6. The 2 tok/s number is for the specific configuration used in the economic section. Throughput can vary by an order of magnitude across hardware, precision modes, and serving stacks. The point was to illustrate cost scaling, not to claim a universal throughput ceiling.
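
To make points 1, 3, and 4 concrete, here is a minimal sketch of the offline-extraction / student-only split. It is not the repo's code. The "teacher activations" are synthetic, PCA stands in for whatever compression the extraction step actually uses, and every name here (`extract_fields`, `train_student`, `predict`) is hypothetical. The only thing it is meant to show is the shape of the pipeline: fields are computed once and saved, and the linear probe trains and predicts from the saved fields alone.

```python
import io
import numpy as np

FIELD_DIM = 256  # the 256-dimensional field size mentioned in point 5

def extract_fields(hidden, field_dim=FIELD_DIM):
    """Offline step (hypothetical): compress activations to a low-rank field.
    PCA is an assumed stand-in, not the method used in the repo."""
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the activations.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:field_dim].T

def train_student(fields, labels, epochs=300, lr=0.1):
    """Student step (hypothetical): logistic-regression probe on saved fields."""
    n, d = fields.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(fields @ w + b)))
        grad = probs - labels          # gradient of the mean log-loss w.r.t. logits
        w -= lr * (fields.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def predict(fields, w, b):
    return (fields @ w + b > 0).astype(int)

# --- Offline: the only place the "teacher" appears ------------------------
rng = np.random.default_rng(0)
latent = rng.normal(size=(2500, 32))                 # synthetic task structure
mixing = rng.normal(size=(32, 512)) / np.sqrt(32)
hidden = latent @ mixing + 0.1 * rng.normal(size=(2500, 512))  # fake activations
labels = (latent[:, 0] > 0).astype(float)            # synthetic binary task

buffer = io.BytesIO()                                # stands in for a file on disk
np.savez(buffer, fields=extract_fields(hidden), labels=labels)
del hidden, latent, mixing                           # teacher side is gone from here on

# --- Student: loads saved fields only -------------------------------------
buffer.seek(0)
saved = np.load(buffer)
fields, labels = saved["fields"], saved["labels"]

w, b = train_student(fields[:2000], labels[:2000])
accuracy = (predict(fields[2000:], w, b) == labels[2000:]).mean()
print(f"held-out accuracy from {fields.shape[1]}-d fields: {accuracy:.3f}")
```

Nothing after the `del` line touches the teacher-side arrays, which is the property described in points 1 and 4. The synthetic data is built to be low-rank, so the accuracy here says nothing about the paper's numbers.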

If there’s a specific part of the implementation you believe contradicts the paper, feel free to point to the line and we can discuss that human to human. The repo is small by design, so everything is easy to check directly without relying on LLM summaries.