If I had to steel-man the idea, I'm pretty sure the integer-based solution has better codegen with many kinds of sparse, comptime-known masks. I think you're right though, StaticBitSet looks better.
For your specific case, even a simple `[9][9]u16` might perform better (where you make use of nine bits in each u16). For each entry, the nine mask bits would be in the same bit positions, so the compiler won't have to do a bunch of shifts to extract/align the bits. CPUs love consistency. I doubt it's worth the additional codegen complexity to save 70 bytes in your data model.