BitcoinTalk

tcatm's 4-way SSE2 for Linux 32/64-bit is in 0.3.10

BitcoinTalk

From: satoshi
Subject: tcatm's 4-way SSE2 for Linux 32/64-bit is in 0.3.10
Date: August 15, 2010 at 15:52:09 UTC

0.3.10 has tcatm's 4-way SSE2 as an option switch.

Use the switch "-4way" to turn it on. Without the switch you get Crypto++ ASM SHA-256.
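
As a rough illustration of an option switch of this kind, here is a minimal sketch that looks for "-4way" on the command line and picks a hashing path. The names HasSwitch and ChooseHasher are hypothetical and are not taken from the Bitcoin source; the real client may parse its arguments quite differently.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper (not from the Bitcoin source): returns true if the
// exact switch `name` appears among the command-line arguments.
bool HasSwitch(int argc, const char* const argv[], const char* name) {
    for (int i = 1; i < argc; ++i) {  // argv[0] is the program name
        if (std::strcmp(argv[i], name) == 0) return true;
    }
    return false;
}

// Hypothetical selector: names the hashing path that would be used.
// Without the switch, the default described in the post applies.
std::string ChooseHasher(int argc, const char* const argv[]) {
    return HasSwitch(argc, argv, "-4way") ? "4-way SSE2"
                                          : "Crypto++ ASM SHA-256";
}
```

Keeping the default path unless the switch is present matches the opt-in behaviour described above.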

I could only get this working with Linux.

Download:
Get 0.3.10 from http://bitcointalk.org/index.php?topic=827.0

Please report back your CPU and results! I think it's pretty clear that Core 2 and lower are slower, i5 faster. I don't think we've heard any i7 results yet. We need to know about the different models of AMD or other less common CPUs.

BitcoinTalk

From: satoshi
Subject: Re: tcatm's 4-way SSE2 for Linux 32/64-bit 0.3.9 rc2
Date: August 15, 2010 at 18:23:26 UTC

I hope someone can test an i5 or AMD to check that I built it right. I don't have either to test with.

I'm also curious if it performs much worse on 32-bit Linux vs 64-bit.

BitcoinTalk

From: satoshi
Subject: Re: tcatm's 4-way SSE2 for Linux 32/64-bit 0.3.9 rc2
Date: August 15, 2010 at 18:43:27 UTC

I just uploaded a quick build so testers can check if I built it right. (I don't have an i5 or AMD.) If it checks out, I'll put together the full package and do all the release stuff.

BitcoinTalk

#19

From: satoshi
Subject: Re: tcatm's 4-way SSE2 for Linux 32/64-bit 0.3.9 rc2
Date: August 16, 2010 at 02:57:57 UTC

Quote from: tcatm on August 16, 2010, 12:43:39 AM

I propose to compile sha256.cpp with -O3 -march=amdfamk10 (will work on 32bit and 64bit) as only CPUs supporting this instruction set (AMD Phenom, Intel i5 and newer) benefit from -4way and it'll improve performance by ~9%.

GCC 4.3.3 doesn't support -march=amdfamk10. I get:
sha256.cpp:1: error: bad value (amdfamk10) for -march= switch

Quote from: NewLibertyStandard on August 16, 2010, 01:49:01 AM

With 4way, I get significantly better performance when I have all my virtual cores enabled. I think I get about the same amount of hashes when hyper threading is turned off with or without 4way.

Hey, you may be onto something!

Hyperthreading didn't help before because all the work was in the arithmetic and logic units, which the hyperthreads share.

tcatm's SSE2 code must be a mix of normal x86 instructions and SSE2 instructions, so while one is doing x86 code, the other can do SSE2.

How much of an improvement do you get with hyperthreading?

Some numbers? What CPU is that?

BitcoinTalk

#23

From: satoshi
Subject: Re: tcatm's 4-way SSE2 for Linux 32/64-bit 0.3.9 rc2
Date: August 16, 2010 at 03:23:04 UTC

Quote from: Vasiliev on August 16, 2010, 03:17:07 AM

try -march=amdfam10

That works.

That's strange... are we sure that's the same thing? tcatm, try amdfam10 and make sure you get the same speed measurement.

BitcoinTalk

#28

From: satoshi
Subject: Re: tcatm's 4-way SSE2 for Linux 32/64-bit is in 0.3.10
Date: August 16, 2010 at 04:36:59 UTC

Quote from: jgarzik on August 16, 2010, 03:35:28 AM

Code:

cpu family : 6
model : 26
model name : Genuine Intel(R) CPU 000 @ 3.20GHz
stepping : 4

cpu family 6 model 26 stepping 4 is an Intel Core i7.
That's a 23% speedup with -4way, 63% total speedup with -4way + hyperthreading.
33% faster with hyperthreading than without it.

BitcoinTalk

#35

From: satoshi
Subject: Re: tcatm's 4-way SSE2 for Linux 32/64-bit is in 0.3.10
Date: August 16, 2010 at 13:38:01 UTC

I wrapped sha256.cpp in
#ifdef FOURWAYSSE2
#endif // FOURWAYSSE2

Try it now.
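
For readers unfamiliar with this kind of guard, here is a minimal sketch of how code wrapped in #ifdef FOURWAYSSE2 drops out of builds that don't define the macro. The function FourWayCompiledIn is hypothetical and only stands in for the guarded code; it is not part of sha256.cpp.

```cpp
// Sketch only: FourWayCompiledIn is a hypothetical stand-in for the
// guarded SSE2 code, not a function from the Bitcoin source.
#ifdef FOURWAYSSE2
// Compiled only when the build defines FOURWAYSSE2 (e.g. -DFOURWAYSSE2).
bool FourWayCompiledIn() { return true; }
#else
// Builds without the define never see the SSE2 code at all.
bool FourWayCompiledIn() { return false; }
#endif // FOURWAYSSE2
```

With the whole file wrapped this way, a platform whose makefile doesn't define FOURWAYSSE2 compiles the file to nothing instead of failing on the SSE2 code.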

BitcoinTalk

#44

From: satoshi
Subject: Re: tcatm's 4-way SSE2 for Linux 32/64-bit is in 0.3.10
Date: August 19, 2010 at 19:07:43 UTC

Quote from: Ground Loop on August 18, 2010, 11:14:26 PM

Any non-Mac i5 love?
Windows i5 64-bit got slower here.

That's the first I've heard anyone say i5 was slower. Everyone else has said 4way was faster on i5. More so with hyperthreading enabled.

Quote from: nelisky on August 18, 2010, 11:02:25 PM

And i5, at least on my macbookpro

Good, so I take it that's a confirmation that it's working on Mac as well?

Laszlo told me he did compile in the -4way stuff on Mac, so the -4way switch is also available to try on Mac. I don't think makefile.osx on SVN has it yet, just the built version.

BitcoinTalk

#47

From: satoshi
Subject: Re: tcatm's 4-way SSE2 for Linux 32/64-bit is in 0.3.10
Date: August 22, 2010 at 23:21:50 UTC

Thanks for clearing that up. I read the link someone posted about AMD making that change around 2007, but I didn't know what the story was for Intel.

There's no hope for Core/Core2 then. They only have half the SSE2 hardware.

Strange that Intel has 3 128-bit units, but AMD with 2 128-bit units is the faster one.

BitcoinTalk

#50

From: satoshi
Subject: Re: tcatm's 4-way SSE2 for Linux 32/64-bit is in 0.3.10
Date: August 24, 2010 at 22:43:56 UTC

Quote from: ArtForz on August 21, 2010, 04:56:31 PM

AMD K10: 2 128bit units
intel nehalem: 3 128bit units

This probably explains why hyperthreading increases performance with -4way. If three SSE2 units is excessive, then hyperthreading would help keep them all busy.

BitcoinTalk

#52

From: satoshi
Subject: Re: tcatm's 4-way SSE2 for Linux 32/64-bit is in 0.3.10
Date: August 28, 2010 at 14:27:15 UTC

The simplification is intentional. There will only be more than one thash[7]=0 in one out of 134,217,728 cases. It only makes it 0.0000007% slower.
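
The two numbers are consistent with each other: 134,217,728 is 2^27, and 1 in 2^27 is about 0.0000007%. The sketch below checks that arithmetic. The batch size of 32 hashes is an assumption for illustration, not something stated in the post; it is one way a 2^-32 chance per 32-bit word could come out at roughly 1 in 2^27.

```cpp
#include <cstdint>

// ASSUMPTION (not stated in the post): hashes are checked in batches, and
// each 32-bit thash[7] word is zero with probability 2^-32. For a batch of
// `batchSize` hashes this gives roughly one case in 2^32 / batchSize.
constexpr std::uint64_t CasesPerRareEvent(std::uint64_t batchSize) {
    return (std::uint64_t{1} << 32) / batchSize;
}

// Converts "one in `cases`" to a percentage.
constexpr double OneInCasesAsPercent(std::uint64_t cases) {
    return 100.0 / static_cast<double>(cases);
}
```

With the assumed batch of 32, CasesPerRareEvent(32) is 134,217,728, and OneInCasesAsPercent(134217728) is about 7.45e-7, i.e. the 0.0000007% quoted above.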