by mk_stjames 86 days ago
In case anyone wanted technical details of the NN, I dug into the repo:

It's a transformer followed by a CNN refiner. Specifically, it's a ViT using the Hiera architecture (https://github.com/facebookresearch/hiera)

The Hiera ViT has dual decoder heads, one for the alpha and one for the RGB foreground, followed by a small CNN refiner network that cleans up some artifacts in the Hiera model's output.
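
For anyone wondering why a model would predict both an alpha and a foreground color: my assumption is that the two outputs are meant to be recombined with the standard compositing equation, out = alpha * fg + (1 - alpha) * bg. Here is a minimal pure-Python sketch of that step. The function names are hypothetical and this is not code from the repo, just the textbook blend applied per pixel.

```python
def composite(alpha, fg, bg):
    """Blend one pixel over a background.

    alpha is a float in [0, 1]; fg and bg are (r, g, b) tuples in [0, 1].
    Hypothetical helper for illustration, not taken from the repo.
    """
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def composite_image(alpha_map, fg_map, bg_map):
    """Apply composite() to every pixel of row-major nested lists."""
    return [
        [composite(a, f, b) for a, f, b in zip(a_row, f_row, b_row)]
        for a_row, f_row, b_row in zip(alpha_map, fg_map, bg_map)
    ]


# A half-transparent red pixel placed over a blue background.
print(composite(0.5, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)))  # (0.5, 0.0, 0.5)
```

Predicting the foreground color separately matters at soft edges (hair, motion blur), where the observed pixel is already a mix of subject and old background, so reusing the input color directly would carry the old background into the new composite.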

I'd be very interested to see a long-form tech talk from Niko explaining his process of learning the ML ropes and building this model.