Filed under: Bit Bashing Level 3
Aligning the stack is sure to be a familiar topic for compiler writers and assembly programmers. In x86-64 assembly, functions that contain AVX objects usually align the stack to 32-bytes in the function prologue…
andq -32, %rsp
If you’re not familiar with AVX, it’s a SIMD extension to x86-64. These instructions operate on vectors of 32-bytes. We like to keep those 32-byte vectors aligned to 32-byte offsets in memory for performance reasons. On x86-64, a penalty is paid when a chunk of memory is loaded from an unaligned memory address.
So, how do we align the stack? First, we have to ask ourselves, “Self, is the stack growing up or down?”. The answer will affect the formula we use to calculate the next offset.
Stack grows down
If the stack grows down, we want to find the next multiple of n
smaller than x
.
x & !(n-1)
We will take a real example and work from there. Let’s say that our stack pointer is an 8
-bit integer and the address is 25
. Let’s also say that we want to align that pointer to the next 8
byte boundary. First, we start with the expression in the parentheses. The binary representation of 8-1
is:
00001000 - 00000001 ========== 00000111
And then we NEGATE
the result…
! 00000111 ========== 11111000
The last thing left to do is AND
our stack pointer with the result…
00011001 & 11111000 ========== 00011000
And that’s it! We can see our result is 24
, which is the first multiple of 8
less than 25
(Remember from above? If the stack grows down, we want to find the next multiple of n
smaller than x
). This works by turning off the least significant bits of our pointer, x
, to the right of the 3
rd bit position. We chose the 3
rd bit position since 8 = 2^3
. Let’s look at our example’s source value and the result to convince ourselves that this is correct.
Src: 00011001 Dst: 00011000
See how that works? We zeroed out the least significant bits less than 2^3
.
Stack grows up
If the stack grows up, we want to find the next multiple of n
larger than x
.
(x+n) & !(n-1)
A careful reader will notice that this equation is not very different from our equation for when the stack grows down. To round up, we merely have to add the value of n
to our stack pointer. This will bump up our stack pointer past the next multiple of x
by n%x
. Then, our problem is as simple as performing the same steps that we covered above when the stack grows down.
Align anything in memory
Although we focused solely on stack frames above, this concept can also be extended to align any piece of memory. In fact, this technique is often used to optimize the memory returned by a call to alloca
. To align a block of memory allocated on the stack to a specific boundary, we simply request the amount of memory required for our data +
the desired alignment amount. Then, we can use our Stack grows up
formula above to find the properly aligned memory address to place our data.
The Real World
Another use of this method is used in digital signal processing. It’s called quantization
. In the signal processing domain, we round
the signal’s value up to the next level or truncate
the value down to the next level. Just like a stack!
Quantization is preformed on a wide variety of digital signals. One instance, that may be familiar to Computer Scientists, is MP3 Quantization. Let’s say we wanted to convert a vinyl record into an MP3. A record, of course, produces an analogue signal and MP3 is a digital format. Since our MP3 conversion tools will only sample the signal at discrete intervals, there will be some amount of information loss. This information loss comes in the form of quantization, where we round/truncate our signal to the nearest level. This planned rounding error allows for smaller data encodings, but at the cost of lesser signal quality.