This is more of an article describing their methodology than a full paper. But yes, there's plenty of peer reviewed papers on this topic, scaling sparse autoencoders to produce interpretable features for large models.
There's a ton of peer reviewed papers on SAEs in the past 2 years; some of them are presented at conferences.