BitcoinTalk

Bitcoin x86 for Windows

Re: Bitcoin x86 for Windows

Crypto++ 5.6.0: http://www.cryptopp.com/
Cached SHA256: http://pastebin.com/rJAYZJ32 (although I'm pretty sure this has been posted publicly elsewhere; I was linked to it on IRC)
I added the cached SHA256 state idea to the SVN, rev 113.  The speedup is about 70%.  I credited it to tcatm based on your post in the x64 thread.
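A minimal sketch of the cached-state idea, assuming the cached state is the SHA-256 "midstate" after compressing the first 64 bytes of the 80-byte block header, and that only the nonce (bytes 76 to 79) changes between attempts. Under that assumption the first chunk's state can be computed once per header, and each nonce then needs one compression for the first SHA-256 pass instead of two. The function names (`sha256_compress`, `sha256_midstate`, `sha256_80_from_midstate`) are hypothetical and this is not Bitcoin's or Crypto++'s code. Bitcoin's second SHA-256 pass over the 32-byte digest is unaffected and is left out.

```c
#include <stdint.h>
#include <string.h>

/* SHA-256 round constants and initial hash values (FIPS 180-4). */
static const uint32_t K[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2};
static const uint32_t H0[8] = {0x6a09e667,0xbb67ae85,0x3c6ef372,0xa54ff53a,
                               0x510e527f,0x9b05688c,0x1f83d9ab,0x5be0cd19};

#define ROR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* One compression: fold a 64-byte block into the 8-word state. */
static void sha256_compress(uint32_t st[8], const uint8_t blk[64]) {
    uint32_t w[64], v[8];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)blk[4*i] << 24 | (uint32_t)blk[4*i+1] << 16 |
               (uint32_t)blk[4*i+2] << 8 | blk[4*i+3];
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = ROR(w[i-15], 7) ^ ROR(w[i-15], 18) ^ (w[i-15] >> 3);
        uint32_t s1 = ROR(w[i-2], 17) ^ ROR(w[i-2], 19) ^ (w[i-2] >> 10);
        w[i] = w[i-16] + s0 + w[i-7] + s1;
    }
    memcpy(v, st, sizeof v);                      /* v = a,b,c,d,e,f,g,h */
    for (int i = 0; i < 64; i++) {
        uint32_t a = v[0], e = v[4];
        uint32_t t1 = v[7] + (ROR(e, 6) ^ ROR(e, 11) ^ ROR(e, 25)) +
                      ((e & v[5]) ^ (~e & v[6])) + K[i] + w[i];
        uint32_t t2 = (ROR(a, 2) ^ ROR(a, 13) ^ ROR(a, 22)) +
                      ((a & v[1]) ^ (a & v[2]) ^ (v[1] & v[2]));
        memmove(v + 1, v, 7 * sizeof v[0]);       /* h=g, g=f, ..., b=a */
        v[4] += t1;                               /* e = d + t1 */
        v[0] = t1 + t2;                           /* a = t1 + t2 */
    }
    for (int i = 0; i < 8; i++) st[i] += v[i];
}

static void store_be(const uint32_t st[8], uint8_t out[32]) {
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 4; j++) out[4*i+j] = (uint8_t)(st[i] >> (24 - 8*j));
}

/* Plain SHA-256 of any message, used as the reference. */
static void sha256(const uint8_t *msg, size_t len, uint8_t out[32]) {
    uint32_t st[8];
    uint8_t blk[128] = {0};
    size_t i = 0;
    memcpy(st, H0, sizeof st);
    for (; i + 64 <= len; i += 64) sha256_compress(st, msg + i);
    size_t rem = len - i, nblk = (rem + 9 <= 64) ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;
    memcpy(blk, msg + i, rem);
    blk[rem] = 0x80;                              /* padding marker */
    for (int j = 0; j < 8; j++) blk[nblk - 1 - j] = (uint8_t)(bits >> (8 * j));
    sha256_compress(st, blk);
    if (nblk == 128) sha256_compress(st, blk + 64);
    store_be(st, out);
}

/* Done once per header: state after the first 64 bytes. */
static void sha256_midstate(const uint8_t hdr[80], uint32_t mid[8]) {
    memcpy(mid, H0, sizeof H0);
    sha256_compress(mid, hdr);
}

/* Done per nonce: only the last 16 header bytes plus padding are compressed. */
static void sha256_80_from_midstate(const uint32_t mid[8], const uint8_t hdr[80],
                                    uint8_t out[32]) {
    uint32_t st[8];
    uint8_t blk[64] = {0};
    memcpy(st, mid, sizeof st);
    memcpy(blk, hdr + 64, 16);
    blk[16] = 0x80;
    blk[62] = 0x02; blk[63] = 0x80;               /* length: 80 bytes = 640 bits */
    sha256_compress(st, blk);
    store_be(st, out);
}
```

Because the nonce lives in the second 64-byte chunk, the midstate stays valid for every nonce tried against the same header; it only has to be recomputed when something in the first 64 bytes changes.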

I can compile the Crypto++ 5.6.0 ASM SHA code with MinGW, but as soon as it runs, it crashes.  It says it's for MASM (Microsoft's assembler), and the sample command line they give looks like Visual C++.  Does it only work with the MSVC and Intel compilers?

Re: Bitcoin x86 for Windows

I was able to integrate the SHA256 functionality from Crypto++ 5.6.0 into Bitcoin.  Using the SSE2 assembly code, this is the fastest SHA256 yet.  Since Bitcoin was sending unaligned data to the block hash function, I had to change the MOVDQA instruction to MOVDQU (MOVDQA requires its memory operand to be 16-byte aligned; MOVDQU does not).
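The MOVDQA/MOVDQU difference can be sketched with SSE2 intrinsics: `_mm_load_si128` is normally emitted as MOVDQA and faults on an address that isn't 16-byte aligned, while `_mm_loadu_si128` (MOVDQU) accepts any address. The other fix is to align the buffer itself so the aligned load stays safe. `copy16` and `AlignedBlock` are hypothetical names, `_Alignas` assumes a C11 compiler, and this is only an illustration, not the Crypto++ assembly.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h> /* SSE2 intrinsics */
#define EX_HAVE_SSE2 1
#endif

/* A 16-byte-aligned data field lets the aligned load be used safely. */
typedef struct {
    _Alignas(16) uint8_t data[64];
} AlignedBlock;

/* Hypothetical helper: copy 16 bytes through an SSE2 register. */
static void copy16(uint8_t *dst, const uint8_t *src) {
#ifdef EX_HAVE_SSE2
    __m128i v;
    if (((uintptr_t)src & 15) == 0)
        v = _mm_load_si128((const __m128i *)src);  /* MOVDQA: aligned only */
    else
        v = _mm_loadu_si128((const __m128i *)src); /* MOVDQU: any alignment */
    _mm_storeu_si128((__m128i *)dst, v);           /* unaligned store is always safe */
#else
    memcpy(dst, src, 16);                          /* built without SSE2: plain copy */
#endif
}
```

Passing `blk.data + 1` for an `AlignedBlock blk` takes the MOVDQU path; calling `_mm_load_si128` on that same pointer is the kind of crash the switch to MOVDQU avoids.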

I think using the SHA256 functionality from Crypto++ 5.6.0 is the way forward right now.
I added a subset of the Crypto++ 5.6.0 library to the SVN.  I stripped it down to just SHA and 11 general dependency files.  There shouldn't be any crypto in there other than SHA.

I aligned the data fields and it worked.  The ASM SHA-256 is about 48% faster.  Combined, it's about 2.5x as fast as version 0.3.3.
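The 2.5x figure is consistent with the two earlier gains compounding, assuming the roughly 70% from the cached state (rev 113) and the roughly 48% from the ASM code (measured against rev 113) are both hash-rate ratios that multiply:

```latex
\[
(1 + 0.70)\times(1 + 0.48) = 1.70 \times 1.48 = 2.516 \approx 2.5
\]
```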

I guess it's using SSE2.  It automatically sets its build configuration at compile time based on the compiler environment.
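Compile-time selection like this usually keys off compiler-predefined macros. The sketch below uses the standard GCC/Clang and MSVC macros; it is a generic pattern, not the exact macros in Crypto++'s config, and `BUILT_WITH_SSE2` and `built_with_sse2` are hypothetical names.

```c
/* GCC/Clang define __SSE2__ when SSE2 code generation is enabled;
   MSVC defines _M_X64 on x64 and _M_IX86_FP = 2 for 32-bit /arch:SSE2. */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define BUILT_WITH_SSE2 1
#else
#define BUILT_WITH_SSE2 0
#endif

/* Reports how this binary was compiled, not what the running CPU supports. */
static int built_with_sse2(void) { return BUILT_WITH_SSE2; }
```

Because this is fixed when the binary is built, it says nothing about whether the machine running the release build actually has SSE2; that needs a runtime check.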

It looks like it has some SSE2 detection at runtime, but it's hard to tell whether it actually uses that to fall back when SSE2 isn't available.  I want the release builds to have SSE2.  SSE2 has been around since the first Pentium 4.  A Pentium 3 or older would be so slow that you'd be wasting your electricity trying to generate on it anyway.
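Runtime detection with a fallback typically means asking CPUID (leaf 1, EDX bit 26 reports SSE2) once and picking an implementation through a function pointer. This sketch assumes GCC or Clang's `<cpuid.h>` (`__get_cpuid`); on other compilers it conservatively reports no SSE2. `sha_block_generic`, `sha_block_sse2`, and `select_sha_block` are hypothetical stubs, not Crypto++ functions.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
static int cpu_has_sse2(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                       /* CPUID leaf 1 unavailable */
    return (edx & (1u << 26)) != 0;     /* EDX bit 26 = SSE2 */
}
#else
static int cpu_has_sse2(void) { return 0; } /* unknown compiler/target: assume no */
#endif

/* Hypothetical back ends; the bodies are stubs standing in for real SHA-256 rounds. */
typedef void (*sha_block_fn)(uint32_t state[8], const uint8_t block[64]);

static void sha_block_generic(uint32_t state[8], const uint8_t block[64]) {
    (void)state; (void)block;           /* portable C rounds would go here */
}
static void sha_block_sse2(uint32_t state[8], const uint8_t block[64]) {
    (void)state; (void)block;           /* SSE2 rounds would go here */
}

/* Choose once at startup so a CPU without SSE2 never enters the SSE2 path. */
static sha_block_fn select_sha_block(void) {
    return cpu_has_sse2() ? sha_block_sse2 : sha_block_generic;
}
```

A check like this only protects the code it guards. If the whole program is compiled with SSE2 enabled, the compiler is free to emit SSE2 instructions elsewhere too, and an older CPU could still hit them.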

This is SVN rev 114.

Re: Bitcoin x86 for Windows

OK, thanks.  I'd also like to know whether it runs fine as long as you don't turn on Generate.  You'd think that as long as it doesn't actually execute any SSE2 instructions, it would still load.  At least Pentium 3s could run it without generating.