by umjunsik132
241 days ago
Hi HN, author here. I built FactorizedAttention, a new attention mechanism based on the GWO framework. Instead of simple QK^T dot products, it uses factorized quadratic forms to model higher-order token interactions.

Testing on GPT-2 small + LoRA fine-tuning:

Math reasoning: 3.4% PPL improvement
Competitive programming: 3.2%
Python code: 1.9%

The bigger gains on reasoning tasks suggest the approach helps with complex relationships. Still early stage (only GPT-2 small), but the results are encouraging.

Happy to answer questions! Code + repro steps in the repo.
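For a rough feel of what "factorized quadratic form" can mean, here is a minimal pure-Python sketch. It is not the code from the repo, and the exact scoring rule is an assumption made for illustration: the usual scaled dot product plus a low-rank term q^T (U^T V) k. The names `U`, `V`, `factorized_scores`, and `attend` are hypothetical.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def project(vec, factors):
    """Project vec onto each row of a (rank x d) factor matrix."""
    return [dot(vec, f) for f in factors]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def factorized_scores(q, keys, U, V):
    """Score each key as (q.k + (U q).(V k)) / sqrt(d).

    ASSUMPTION: this scoring rule is illustrative only. U and V are
    hypothetical (rank x d) factors, so the extra term equals
    q^T (U^T V) k without ever forming the d x d matrix.
    """
    d = len(q)
    uq = project(q, U)  # computed once per query
    return [(dot(q, k) + dot(uq, project(k, V))) / math.sqrt(d) for k in keys]

def attend(q, keys, values, U, V):
    """Weighted sum of values using softmax over the factorized scores."""
    weights = softmax(factorized_scores(q, keys, U, V))
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
```

With `U` and `V` set to all zeros, the score in this sketch reduces to the standard scaled dot product, which gives a simple sanity check against a baseline.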