by wwwtyro
605 days ago
Can anyone help me understand how this works without special bitnet precision-specific hardware? Is special hardware unnecessary? Maybe it just doesn't reach the full bitnet potential without it? Or maybe it does, with some fancy tricks? Thanks!
|
The easiest example is XOR: since no bits carry between positions, the same operation can trivially be interpreted as either XORing one large integer or XORing a vector of smaller integers.
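To make that concrete, here's a rough Python sketch (the function names `xor_lanes` and `xor_wide` are made up for illustration, and the byte values are arbitrary). It XORs eight 8-bit lanes one at a time, then XORs the same bytes as a single 64-bit integer, and checks that both give the same result.

```python
def xor_lanes(a: bytes, b: bytes) -> bytes:
    """XOR two byte vectors lane by lane (one 8-bit XOR per lane)."""
    return bytes(x ^ y for x, y in zip(a, b))


def xor_wide(a: bytes, b: bytes) -> bytes:
    """XOR the same bytes packed into one wide integer (a single XOR)."""
    wide = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return wide.to_bytes(len(a), "little")


# Arbitrary example data: eight 8-bit lanes per operand.
a = bytes([0x0F, 0xF0, 0xAA, 0x55, 0x01, 0x80, 0xFF, 0x00])
b = bytes([0xFF, 0x0F, 0x55, 0x55, 0x01, 0x01, 0x0F, 0xF0])

# Both interpretations agree because XOR never carries between bit positions.
assert xor_lanes(a, b) == xor_wide(a, b)
```

On real hardware the wide version is one instruction instead of eight, which is where the speedup comes from.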
Take a look at the SWAR (SIMD within a register) example here [0] for a common, easy case of that technique being useful in the real world.
Dedicated hardware is almost always better, but you can still get major improvements with a little elbow grease.
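For a flavor of what SWAR looks like, here's a sketch of the classic 64-bit popcount written in Python. This is the well-known textbook version, not necessarily the exact code in the linked post, and `popcount_swar` is just a name I picked. Python ints are unbounded, so the sketch masks to 64 bits to mimic a machine register.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # emulate a 64-bit register


def popcount_swar(x: int) -> int:
    """Count set bits in a 64-bit value using only ordinary integer ops."""
    x &= MASK64
    # Treat the word as 32 two-bit lanes; each lane ends up holding its own bit count.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Add neighbouring lanes: 16 four-bit lanes, each holding a count of 0..4.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Add again: 8 byte lanes, each holding a count of 0..8.
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # The multiply sums all byte lanes into the top byte; shift it down.
    return ((x * 0x0101010101010101) & MASK64) >> 56


# Sanity check against a straightforward bit count.
for value in (0, 1, 0xFF, 0xDEADBEEF, MASK64):
    assert popcount_swar(value) == bin(value).count("1")
```

Every step is a plain shift, mask, add or multiply on one wide integer, but the masks make it behave like many small additions running side by side.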
[0] https://nimrod.blog/posts/algorithms-behind-popcount/