Is it 2x as fast on AMD and half as fast on Intel?
Btw, why are you using this alignup<16> function when __attribute__ ((aligned (16))) will tell the compiler to align at compile time?
Tried that, but it doesn't work for things on the stack. I ran some tests.
It doesn't even cause an error; it just silently leaves the variable unaligned.
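For anyone following along, here is a rough sketch of the kind of manual alignment being discussed. The real alignup<16> isn't shown in this thread, so the version below is a guess at its shape, and stack_buffer_aligned is a made-up helper just for the demo. The idea is to over-allocate the stack buffer by 15 bytes and round the address up, so the result doesn't depend on whether the compiler honors the attribute for locals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the alignup<16> mentioned above:
// rounds a pointer up to the next multiple of N (N must be a power of two).
template <std::size_t N>
inline void* alignup(void* p) {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
    const std::uintptr_t mask = N - 1;
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void*>((addr + mask) & ~mask);
}

// Hypothetical demo: over-allocate by 15 bytes so that a 16-byte-aligned
// block of kBytes always fits inside the raw stack buffer.
inline bool stack_buffer_aligned() {
    const std::size_t kBytes = 64;
    unsigned char raw[kBytes + 15];
    unsigned char* aligned = static_cast<unsigned char*>(alignup<16>(raw));
    // The aligned block must lie entirely within the raw buffer.
    assert(aligned >= raw && aligned + kBytes <= raw + sizeof raw);
    return reinterpret_cast<std::uintptr_t>(aligned) % 16 == 0;
}
```

The cost is up to 15 wasted bytes per buffer plus an add and a mask at runtime, which is usually negligible next to the work done on the buffer.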