"they have to do weird things with extraNonce, which increases the size of the block header".
We need to be vigilant and nip in the bud any misconception that the contents of your block slows down your hash speed. It doesn't.
extraNonce never needs to be very big. We could reset it every second whenever the time changes if we wanted. Worst case, if you didn't want to keep track of incrementing it, extraNonce could be 4 random bytes and the chance of wasting time from collision would be negligible.
Separate machines are automatically collision proof because they have different generated public keys in the first transaction. That also goes for each thread too.