Hacker News new | ask | show | jobs
by fjkdlsjflkds 720 days ago
OP does not (seemingly) claim that tinygrad can't compute hessians, only that a first-order optimization method was the only thing tried.

Also, as a quasi-newton method, L-BFGS does not require explicit (pre-)computation of the hessian (it implicitly iteratively estimates its inverse in an online manner).