The program bypasses the need to actually check the particular number against 3 and 5 using any kind of arithmetic instructions, instead it's keeping separate values for 3-count and 5-count which will be either 3 or 5 when the number is a multiple of either. It then performs a clear on it. So it has this state when it gets to the checks:
n 3-count 5-count
1 1 1
2 2 2
3 3 3
4 1 4
...
So the check is just a cpi (compare immediate) instruction for whether the particular counter is 3 or 5. Which is probably faster than bit twiddling, especially since they don't have a pop count instruction available in the instruction set. On an instruction set with that, I imagine it would be a useful trick though.
The point of bit-twiddling is to produce a number n that is 1 or -1 if counter is divisible by 3 or 5 and 0 otherwise, and then add either mul(n, counter) or and(n, counter) to the result. It avoids branches which could shave off a few bytes depending on how branch instructions are encoded. Since we're golfing, we're not interested in speed, only program size.
Still not an option on the platform in the article, though. You'd have to make your own pop count and implement it likely with a loop and bit shifting. It's almost certainly going to take more instructions than this and end up with as many branches as a result. The instruction set is very limited. Though quoting myself:
>> On an instruction set with that [pop count], I imagine it would be a useful trick though.