Hacker News
by jacek-123 62 days ago
Did you try GradNorm or PCGrad, or was manual task weighting good enough? Also curious about the required-vs-preferred head failing. Was that encoder gradient interference from the other tasks, or a capacity issue in the linear head?