CRITICAL_BLOCK is a macro that contains a for loop. The assertion failure indicates that break has been called inside the body of the loop. The only break statement in this block is in line 2762. In the original source file, there is no break statement in this critical block. I think you must remove lines 2759-2762. The is nothing like that in the original main.cpp.Sorry about that. CRITICAL_BLOCK isn't perfect. You have to be careful not to break or continue out of it. There's an assert that catches and warns about break. I can be criticized for using it, but the syntax would be so much more bloated and error prone without it.
Is there a chance the SSE2 code is slow on Intel because of some quirk that could be worked around? For instance, if something works but is slow if it's not aligned, or thrashing the cache, or one type of instruction that's really slow? I'm not sure how available it is, but I think Intel used to have a profiler for profiling on a per instruction level. I guess if tcatm doesn't have a system with the slow processor to test with, there's not much hope. But it would be really nice if this was working on most CPUs.