Principles
Given is an arbitrary integer variable. How to find the index of the least significant bit (LSB) of the first 1
-bit sequence of length >= n
? Assuming n=4
, let’s consider the following example of a random 32-bit integer value. The index we’re looking for is 10 in this case (indicated by the V
character).
31 24 | 23 16 | 15 V 8 | 7 0
MSB 01000111 | 11111101 | 10111100 | 01101001 LSB
Using a series of bit-wise and and shift-right operations the index of the LSB of the first 1111
sequence in the integer x
can be found with the following trick.
x &= x>>1;
x &= x>>2;
index = __builtin_ffs(x)-1; // use _BitScanForward in Visual C++
After the first statement every 1
in x
indicates the start of a 11
sequence. After the second statement every 1
in x
indicates the start of a 1111
sequence. In the last statement the GCC intrinsic __builtin_ffs()
(use _BitScanForward()
if you’re on Visual C++) returns the bit position of the first set bit, starting from the LSB.
Note, that it doesn’t work to shift by four bits at once because it’s necessary to combine neighboring 1
-bits to make sure that there are no 0
-bits in-between. The following example illustrates how shifting by 3 bits wrongly yields two isolated 1
-bits. In contrast, shifting by 2 bits correctly yields a sequence of 2 bits which can be further reduced into a single 1
-bit indicating the start of the 1111
sequence.
shift by 2 shift by 3
01111010 01111010
& 00011110 & 00001111
= 00011010 = 00001010
ok wrong
Arbitrary sequence lengths
By cleverly choosing the number of bits to shift, it’s even possible to extend this construction to find bit sequences which length is not a power of two. As the order of the and-shift-right operations has no relevance, the following algorithm can be used to compute the number of bits to shift in order to find the n
-bit sequence index. The sum of shifted bits must be equal to n-1
and the number of bits to shift is halved in each iteration. Therefore, the total number of executed iterations is ceil(log2(n))
.
int FindBitSeqIndexLsb(int x, int n)
{
assert(n >= 0 && n <= 32);
while (n > 1)
{
const int shiftBits = n>>1;
x &= (unsigned)x>>shiftBits; // shift in zeros from left
n -= shiftBits;
}
return __builtin_ffs(x)-1; // use _BitScanForward in Visual C++
}
Exact sequence length
The described method finds bit sequences of length >= n
. In case you’re looking for a bit sequence of exactly n
bits, the following statement can be inserted right before the LSB scan is performed. This statement masks out any 1
-bit which has a 1
on its left or right side. All the remaining 1
-bits are isolated and indicate the start of a sequence of exactly n
bits.
mask = (~(x<<1))&(~((unsigned)x>>1)); // shift in zeros from left and right
x &= mask;
Sequence alignment
To account for aligned bit sequences, unaligned 1
-bits can be simply masked out from x
before the LSB scan is performed. For example, to regard only bit sequences starting at nibble boundaries x
can be modified with the operation x &= 0x11111111
(it is 0x...11 = 0b...00010001
) to clear all bits not starting at an index which is a multiple of four.
March 6, 2014 at 2:02 pm
Hey! That’s really interesting!
I would like to know where in game engine programming bit twiddling & scanning techniques are used, though. Can you give some examples?
Thank you!
March 7, 2014 at 1:36 pm
Bit twiddling / bit scanning techniques are mostly used for optimization reasons, because such code is less readable/maintainable than high-level code. As game engines are performance critical, there is a large number of use cases for such low-level optimizations. However, in some code like data compression algorithms or radix sort already a simple implementation requires binary arithmetic. I personally used it in my engine for the following (this list is by no means complete):
– Data compression algorithms
– Scanning bit-arrays for 1-bits (e.g. in frustum culling)
– Encoding/decoding normal maps
– In SIMD code (aligning data, computing shuffles, …)
– Computing hashes and checksums (CRC, parity, …)
– Radix sort (used to sort renderables by material)
– Mapping materials to radix sort keys
– Render state management
– Any cache optimization in terms of data reduction
– Software implementations of caches
I used the trick described above to implement a least-recently-used (LRU) eviction strategy for the disk block cache of a DB index data-structure. It’s a simple 16-way set-associative cache, where I keep one 64-bit integer per set to store the history of how the ways were recently accessed. To search for a way in the LRU integer I use some bit trickery which results in four consecutive 1-bits, which’s LSB index I obtain efficiently with the tricks described above.