by miven
291 days ago
The ARC Prize Foundation ran extensive ablations on HRM across their slew of reasoning tasks and found that the "hierarchical" part of the architecture makes little difference compared with a vanilla transformer of the same size with no extra hyperparameter tuning. See https://arcprize.org/blog/hrm-analysis#analyzing-hrms-contri...