# Principles

Given an arbitrary integer variable, how can we find the index of the least significant bit (LSB) of the first `1`-bit sequence of length `>= n`? Assuming `n = 4`, let’s consider the following example of a random 32-bit integer value. The index we’re looking for is 10 in this case (indicated by the `V` character).

```
     31    24 | 23    16 | 15   V 8 | 7      0
MSB  01000111 | 11111101 | 10111100 | 01101001  LSB
```

Using a series of bit-wise *and* and *shift-right* operations, the index of the LSB of the first `1111` sequence in the integer `x` can be found with the following trick.

```
x &= x>>1;
x &= x>>2;
index = __builtin_ffs(x)-1; // use _BitScanForward in Visual C++
```

After the first statement, every `1` in `x` indicates the start of a `11` sequence. After the second statement, every `1` in `x` indicates the start of a `1111` sequence. In the last statement, the GCC intrinsic `__builtin_ffs()` (use `_BitScanForward()` if you’re on Visual C++) returns one plus the bit position of the first set bit, counting from the LSB (hence the `-1`).

Note that it doesn’t work to cover the remaining three bits with a single shift, because it’s necessary to *combine* neighboring `1`-bits to make sure that there are no `0`-bits in between. The following example illustrates how shifting by 3 bits wrongly yields two isolated `1`-bits. In contrast, shifting by 2 bits correctly yields a sequence of two `1`-bits, which a final *and-shift* by one bit reduces into a single `1`-bit indicating the start of the `1111` sequence.

```
  shift by 2        shift by 3
    01111010          01111010
  & 00011110        & 00001111
  = 00011010        = 00001010
    ok                wrong
```

# Arbitrary sequence lengths

By cleverly choosing the number of bits to shift, it’s even possible to extend this construction to find bit sequences whose length is not a power of two. As the order of the *and-shift-right* operations is irrelevant, the following algorithm can be used to compute the number of bits to shift in each step in order to find the `n`-bit sequence index. The sum of the shift amounts must equal `n-1`, and the number of bits to shift is halved in each iteration. Therefore, the total number of executed iterations is `ceil(log2(n))`.

```
int FindBitSeqIndexLsb(int x, int n)
{
    assert(n >= 0 && n <= 32);

    while (n > 1)
    {
        const int shiftBits = n>>1;
        x &= (unsigned)x>>shiftBits; // shift in zeros from left
        n -= shiftBits;
    }

    return __builtin_ffs(x)-1; // use _BitScanForward in Visual C++
}
```

# Exact sequence length

The described method finds bit sequences of length `>= n`. In case you’re looking for a bit sequence of exactly `n` bits, the following statement can be inserted right before the LSB scan is performed. This statement masks out every `1`-bit which has a `1` on its left or right side. All the remaining `1`-bits are isolated and indicate the start of a sequence of exactly `n` bits.

```
mask = (~(x<<1))&(~((unsigned)x>>1)); // shift in zeros from left and right
x &= mask;
```

# Sequence alignment

To account for aligned bit sequences, unaligned `1`-bits can simply be masked out from `x` before the LSB scan is performed. For example, to regard only bit sequences starting at *nibble* boundaries, `x` can be modified with the operation `x &= 0x11111111` (note that `0x11 = 0b00010001`, so every fourth bit is kept) to clear all bits not starting at an index which is a multiple of four.

March 6, 2014 at 2:02 pm

Hey! That’s really interesting!

I would like to know where in game engine programming bit twiddling & scanning techniques are used, though. Can you give some examples?

Thank you!

March 7, 2014 at 1:36 pm

Bit twiddling / bit scanning techniques are mostly used for optimization purposes, because such code is less readable and maintainable than high-level code. As game engines are performance-critical, there is a large number of use cases for such low-level optimizations. However, some code, such as data compression algorithms or radix sort, requires binary arithmetic even in a straightforward implementation. I personally used it in my engine for the following (this list is by no means complete):

– Data compression algorithms

– Scanning bit-arrays for 1-bits (e.g. in frustum culling)

– Encoding/decoding normal maps

– In SIMD code (aligning data, computing shuffles, …)

– Computing hashes and checksums (CRC, parity, …)

– Radix sort (used to sort renderables by material)

– Mapping materials to radix sort keys

– Render state management

– Any cache optimization in terms of data reduction

– Software implementations of caches

I used the trick described above to implement a least-recently-used (LRU) eviction strategy for the disk block cache of a DB index data structure. It’s a simple 16-way set-associative cache, where I keep one 64-bit integer per set to store the history of how the ways were recently accessed. To search for a way in the LRU integer I use some bit trickery which results in four consecutive 1-bits, whose LSB index I obtain efficiently with the tricks described above.