by arugulum 1180 days ago
>Then you'd have to compute the gradients for the whole network

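You have to do that with LoRA regardless: the LoRA weights in the lowest layers only get gradients if you backpropagate through every layer above them. What LoRA skips is the weight gradients (and optimizer state) for the frozen weights, not the backward pass through the network.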
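
To make that concrete, here is a minimal pure-Python sketch, not any library's implementation. It assumes a toy two-layer linear network where each layer uses W + B·A with rank 1, and a loss of 0.5·||y||². All names (`lora_weight`, `lora_grads_layer1`, the example values) are made up for illustration, and B is nonzero here only so that both adapter gradients are nonzero (LoRA normally initializes B to zero).

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    Nt = transpose(N)
    return [[sum(a * b for a, b in zip(row, col)) for col in Nt] for row in M]

def outer(u, v):
    return [[a * b for b in v] for a in u]

def lora_weight(W, B, A):
    # Effective weight of a LoRA layer: frozen W plus low-rank update B @ A.
    BA = matmul(B, A)
    return [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, BA)]

def loss_fn(x, W1, B1, A1, W2, B2, A2):
    h1 = matvec(lora_weight(W1, B1, A1), x)
    y = matvec(lora_weight(W2, B2, A2), h1)
    return 0.5 * sum(v * v for v in y)

def lora_grads_layer1(x, W1, B1, A1, W2, B2, A2):
    # Forward pass.
    h1 = matvec(lora_weight(W1, B1, A1), x)
    W2_eff = lora_weight(W2, B2, A2)
    y = matvec(W2_eff, h1)
    # Backward pass. dL/dy = y for L = 0.5 * ||y||^2.
    dy = y
    # Activation gradient: this step goes back through layer 2,
    # frozen W2 included. There is no way around it.
    dh1 = matvec(transpose(W2_eff), dy)
    # Adapter gradients for layer 1. No gradient for W1 or W2 is ever formed.
    dB1 = outer(dh1, matvec(A1, x))             # same shape as B1
    dA1 = outer(matvec(transpose(B1), dh1), x)  # same shape as A1
    return dA1, dB1

# Hypothetical example values: width 2, rank 1.
x = [1.0, -2.0]
W1 = [[0.5, -0.3], [0.2, 0.8]]
B1 = [[0.1], [-0.4]]
A1 = [[0.3, 0.7]]
W2 = [[1.0, 0.5], [-0.6, 0.9]]
B2 = [[0.2], [0.3]]
A2 = [[-0.5, 0.4]]

dA1, dB1 = lora_grads_layer1(x, W1, B1, A1, W2, B2, A2)
```

The only thing dropped relative to full fine-tuning is the outer product that would give dL/dW for each frozen matrix; the `dh1` line, the part that walks back through the network, stays.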