BitcoinTalk
Hash() function not secure

View Satoshi only

External link

Hi,

The Hash() function in util.h forms the backbone of most of bitcoin's crypto:

Code:
template<typename T1>
inline uint256 Hash(const T1 pbegin, const T1 pend)
{
    uint256 hash1;
    SHA256((unsigned char*)&pbegin[0], (pend - pbegin) * sizeof(pbegin[0]), (unsigned char*)&hash1);
    uint256 hash2;
    SHA256((unsigned char*)&hash1, sizeof(hash1), (unsigned char*)&hash2);
    return hash2;
}

As you can see, this tries to be more secure by hashing twice. However, this actually reduces security. To break pure SHA256, an attacker needs to find a d' such that SHA256(d') == SHA256(d), for a known d. This is also sufficient to break Hash(). However the attacker can also attack the outer layer of the hash, finding a d' such that SHA256(SHA256(d')) == SHA256(SHA256(d)), even though SHA256(d') != SHA256(d). As you can see, the double hashing here makes it _easier_ to break the hash!

A better solution would be something like:

Code:
template<typename T1>
inline vector<unsigned char> HashV(const T1 pbegin, const T1 pend)
{
    uint256 sharesult;
    uint160 riperesult;
    SHA256((unsigned char*)&pbegin[0], (pend - pbegin) * sizeof(pbegin[0]), (unsigned char *)&sharesult);
    RIPEMD160((unsigned char*)&pbegin[0], (pend - pbegin) * sizeof(pbegin[0]), (unsigned char *)&riperesult);

    vector<unsigned char> ret;
    ret.insert(ret.end(), (unsigned char *)(&sharesult), (unsigned char *)(&sharesult + 1));
    ret.insert(ret.end(), (unsigned char *)(&riperesult), (unsigned char *)(&riperesult + 1));
    return ret;
}

The key is to concatenate the hashes, not merge them. This means the attacker has to break both hashes - even in the worst case, this cannot be less secure than a single hash.

Unfortunately, changing hashes would break the chain...
Incidentally, a similar problem exists with Hash160, only worse - an attacker can break _either_ RIPEMD160 or SHA256, and win. At least with Hash() the attacker _must_ break SHA256.
The original hash isn't what you're trying to break. You're trying to break the hash bitcoin uses, which is the double hash.
As you can see, this tries to be more secure by hashing twice. However, this actually reduces security. To break pure SHA256, an attacker needs to find a d' such that SHA256(d') == SHA256(d), for a known d. This is also sufficient to break Hash(). However the attacker can also attack the outer layer of the hash, finding a d' such that SHA256(SHA256(d')) == SHA256(SHA256(d)), even though SHA256(d') != SHA256(d). As you can see, the double hashing here makes it _easier_ to break the hash!

If I understand correctly, you've got two chances to find a collision instead of one.

So this decreases the security of SHA256 by a factor of 2... which is just Not a Big Deal.  Bitcoin is using, essentially SHA255 instead of SHA256.  It'll still take longer than forever to find a collision...
As you can see, this tries to be more secure by hashing twice. However, this actually reduces security. To break pure SHA256, an attacker needs to find a d' such that SHA256(d') == SHA256(d), for a known d. This is also sufficient to break Hash(). However the attacker can also attack the outer layer of the hash, finding a d' such that SHA256(SHA256(d')) == SHA256(SHA256(d)), even though SHA256(d') != SHA256(d). As you can see, the double hashing here makes it _easier_ to break the hash!

If I understand correctly, you've got two chances to find a collision instead of one.

So this decreases the security of SHA256 by a factor of 2... which is just Not a Big Deal.  Bitcoin is using, essentially SHA255 instead of SHA256.  It'll still take longer than forever to find a collision...

It's true that it's not a big deal, but what concerns me is that someone went out of their way to implement an unproven hashing method, without thinking it through to realize that it actually decreases security. Things like this reduce one's confidence in the system as a whole - after all, if such a mistake exists, what other problems may be lurking?

What we really need is a documented protocol, so it can be audited by third parties. I'm working on reverse-engineering it from the source, but it's slow going - the intent of much of the scripting system is poorly documented.
You're trying to solve:
SHA256(SHA256(d'))==256-bit number

This still has 256 bits of security, and you have to do two hashes per attempt to brute-force it.
You're trying to solve:
SHA256(SHA256(d'))==256-bit number

This still has 256 bits of security, and you have to do two hashes per attempt to brute-force it.
I agree, basically I get what he is saying at the top, but it isn't as bad as everyone thinks.

To brute force your 256bit hash, you either guess the starting string or the second hash string. Which means you have to guess, convert to 256, convert to 256 again and see if that matches. If you attack the second hash, you can't start a guess at 0000000.....1 for example, you know it's already another 256bit number that have to start with. That's like trying to brute force a password and knowing ahead of time the guy's password was 100 digits long. Does it make it any easier? Well no because the first 99 digits could be 0 and the last 1 but either way you still have to try all in between.

Mathematically, I see no advantage. Either you guess the first string to generate the hash or guess the first hash that generates the second hash. Either way you are going to have to brute force the entire key-space to do this. The only advantage is that instead of having "one" correct answer, you have "two" correct answers out of the billion trillion million guesses possible.  Grin
There's a stackoverflow question about double hashing, by the way.  Consensus is that it's not less secure.  It's too late for my brain to process the nuances of hashing...
Could this be fixed in the next release? We were just discussing what would happen if a flaw were found: Here we go, our first real test of the situation.  Smiley
Could this be fixed in the next release? We were just discussing what would happen if a flaw were found: Here we go, our first real test of the situation.  Smiley
Nope! At least, not easily - block and transaction identities are based on this hash, among other things, so this would break the chain. You'd need to do a phased rollout - first, add parsing support to all clients, then obsolete older clients, then roll out a new version which generates a new version of the block serialization format using the new hashing method.
I have no doubt that hashing multiple times is secure, since it is specified in PKCS #5 and other standards:
Quote
An iteration count has traditionally served the purpose of increasing the cost of producing keys from a password, thereby also increasing the difficulty of attack. For the methods in this document, a minimum of 1000 iterations is recommended. This will increase the cost of exhaustive search for passwords significantly, without a noticeable impact in the cost of deriving individual keys.
Could this be fixed in the next release? We were just discussing what would happen if a flaw were found: Here we go, our first real test of the situation.  Smiley
Nope! At least, not easily - block and transaction identities are based on this hash, among other things, so this would break the chain. You'd need to do a phased rollout - first, add parsing support to all clients, then obsolete older clients, then roll out a new version which generates a new version of the block serialization format using the new hashing method.

All the better to practice now, while the network is still young! Well, if it is not a real flaw, it's not necessary, but it would be good practice. This could always be done first in the test network.
I have no doubt that hashing multiple times is secure, since it is specified in PKCS #5 and other standards:
Quote
An iteration count has traditionally served the purpose of increasing the cost of producing keys from a password, thereby also increasing the difficulty of attack. For the methods in this document, a minimum of 1000 iterations is recommended. This will increase the cost of exhaustive search for passwords significantly, without a noticeable impact in the cost of deriving individual keys.

That quote is about using hashes to transform passwords, which is a different application to what Bitcoin is using SHA-256 for. I don't think the security issues in the two cases are analogous.
PKCS #5 is used by TrueCrypt (and GPG, I think). If each iteration reduced the difficulty of finding collisions, as bdonlan suggests, then TrueCrypt would be easy to break. You'd just attack the hashing algorithm to find the master key.
There's a stackoverflow question about double hashing, by the way.  Consensus is that it's not less secure.  It's too late for my brain to process the nuances of hashing...

Note that that discussion is referring to hashing passwords - and the most upvoted comment is explicitly discussing such with that in mind. Trying to do a dictionary attack has no coherent meaning, and someone directly attacking the algorithms behind bitcoin would obviously not start there. It will attack the algorithm.
What is being hashed in this function?  <-- let's call this item x

If the total number of possible x is smaller than 256bit space that sha256 provides than an attack would come in the form of a table of all sha256(x) values.

If all of the possible combination's of x is greater than the 256bit space, then here is some math for you (the attack would be at the hash value itself rather than a table on known sha256(x)):

What follows is from www.atmel.com/dyn/resources/prod_documents/doc8668.pdf

If there are 256 bits in the key, then after 2^255 attempts the attacker has a 50% chance of finding the right key and after 2^256 attempts he has tried all possible keys and is guaranteed to have found the key.

Here are some estimates of big numbers:

2^66 Number of grains of sand on the earth
2^76 Number of stars in the universe
2^79 Avogadro's number. The number of carbon atoms in 12 grams of coal.
2^96 Number of atoms in a cubic meter of water
2^190 Number of atoms in the sun
2^255 Number of attempts to find the key value from above

But what about very well funded entities such as the US National Security Agency (NSA)? Could they build a machine to crack a 256 bit key? Assume they could build a theoretical nanocomputer that executes 10^13 instructions per second (approximate rate of atomic vibrations) in a space of a cube with a side that is 5.43nm across (This is the approximate size of a silicon lattice10 atoms wide, or a crystal containing 1000 silicon atoms). Assume that it could calculate an attempt in 10 cycles. Such a computer the size of the earth would take more than 10^13 years (roughly 58 times the estimated age of the earth) to attack a 256 bit algorithm via brute force.
I want some nanocomputers!
What is being hashed in this function?  <-- let's call this item x

If the total number of possible x is smaller than 256bit space that sha256 provides than an attack would come in the form of a table of all sha256(x) values.

If all of the possible combination's of x is greater than the 256bit space, then here is some math for you (the attack would be at the hash value itself rather than a table on known sha256(x)):

What follows is from www.atmel.com/dyn/resources/prod_documents/doc8668.pdf

If there are 256 bits in the key, then after 2^255 attempts the attacker has a 50% chance of finding the right key and after 2^256 attempts he has tried all possible keys and is guaranteed to have found the key.

Here are some estimates of big numbers:

2^66 Number of grains of sand on the earth
2^76 Number of stars in the universe
2^79 Avogadro's number. The number of carbon atoms in 12 grams of coal.
2^96 Number of atoms in a cubic meter of water
2^190 Number of atoms in the sun
2^255 Number of attempts to find the key value from above

But what about very well funded entities such as the US National Security Agency (NSA)? Could they build a machine to crack a 256 bit key? Assume they could build a theoretical nanocomputer that executes 10^13 instructions per second (approximate rate of atomic vibrations) in a space of a cube with a side that is 5.43nm across (This is the approximate size of a silicon lattice10 atoms wide, or a crystal containing 1000 silicon atoms). Assume that it could calculate an attempt in 10 cycles. Such a computer the size of the earth would take more than 10^13 years (roughly 58 times the estimated age of the earth) to attack a 256 bit algorithm via brute force.

Quantum computing and encryption: http://stackoverflow.com/questions/2768807/quantum-computing-and-encryption-breaking
If there are 256 bits in the key, then after 2^255 attempts the attacker has a 50% chance of finding the right key and after 2^256 attempts he has tried all possible keys and is guaranteed to have found the key.

That's true, but only when you brute-force every singe hash.
There are other know attacking methods for SHA algorithms, thats why it is planned to replace the SHA-2 family with SHA-3 soon.

German source:
Quote
Als Reaktion auf die bekanntgewordenen Angriffe gegen SHA-1 hielt das National Institute of Standards and Technology (NIST) im Oktober 2005 einen Workshop ab, in dem der aktuelle Stand kryptologischer Hashfunktionen diskutiert wurde. NIST empfiehlt den \u00dcbergang von SHA-1 zu Hashfunktionen der SHA-2-Familie (SHA-224, SHA-256, SHA-384, SHA-512). Langfristig sollen diese durch einen neuen Standard SHA-3 ersetzt werden. Dazu ruft das NIST nach Vorbild des Advanced Encryption Standard (AES) zu einem Ausschreiben auf. Die endg\u00fcltige Wahl und Benennung des Hash-Standards ist f\u00fcr 2012 vorgesehen.
http://de.wikipedia.org/wiki/Secure_Hash_Algorithm#Empfehlungen
...

2^255 Number of attempts to find the key value from above

1) Your equations don't take into account reversible computing advancements at all. These techniques could halve the search space by being able to take advantage of algorithms that traditional computers find impossible.

2) Your equations assume the algorithm is perfectly secure. Given that NIST is rather gung ho about developing SHA-3, I'm not terribly confident in that analysis.

3) An attacker does not need to find a specific key to compromise the system, they just need to find the keys to someone's wallet. If you want this to be a viable currency for Earth's population, plus corporate accounts, you drop the search space by 10^10 to 10^12.

That 2^256 search space could get whittled down to the 2^110 range by the end of the century. Assuming no major attacks on its integrity exist.

I'm not particularly sold on the technical soundness of this program, honestly. Why use SHA256 rather than Whirlpool or SHA512?
I'm not particularly sold on the technical soundness of this program, honestly. Why use SHA256 rather than Whirlpool or SHA512?
For the same reason they didn't use SHA1024 or SHA2048 or SHA4048 or SHA1000000000000000000000000000000000

There are lots of theoretical attacks that can be done against it, but if a program or new math proof can half the amount of time it takes to crack it, are we really worried about the encryption taking 100 billion years to crack but now with this new attack (insert math,attack,flaw) it's only going to take only 1 billion years to crack? How about a million years? Even one-hundred thousand years?

When they can crack SHA256 in under 10 minutes, I'll be worried, but until then, time is on your side.
For the same reason they didn't use SHA1024 or SHA2048 or SHA4048 or SHA1000000000000000000000000000000000

No. SHA512 and Whirlpool exist, are well defined, well supported, well analyzed, and they exist for a reason.

Quote
There are lots of theoretical attacks that can be done against it, but if a program or new math proof can half the amount of time it takes to crack it,

Reversible computing techniques 'cheat' around the entropy limit. This means they can reach effective speeds far, far beyond what are possible with current computers, as they are effectively capable of performing nondeterministic operations.

You are basically betting the entire economy (if you believe bitcoins will succeed anyway) on no one developing a means to halve the effective bit length as has been done with e.g. AES.

It's careless.

Quote
are we really worried about the encryption taking 100 billion years to crack but now with this new attack (insert math,attack,flaw) it's only going to take only 1 billion years to crack? How about a million years? Even one-hundred thousand years?

Ten years, assuming only minor flaws in SHA256.

If there is a major flaw (again, see the push for SHA-3) there is a much more serious problem. There does not appear to be a clear mechanism for handling a compromise of the basic algorithm, and there should be.
For the same reason they didn't use SHA1024 or SHA2048 or SHA4048 or SHA1000000000000000000000000000000000

No. SHA512 and Whirlpool exist, are well defined, well supported, well analyzed, and they exist for a reason.
I'm sure the same was said back when 40, 64, 128, and 256 bit encryption was coming out. SHA512 is part of SHA2, I remember when everyone was talking about how insecure SHA1 was with that significant *flaw* and how we all need to move to SHA2 because it was well defined, well supported, and well analyzed; sound familiar?
Quote
Quote
There are lots of theoretical attacks that can be done against it, but if a program or new math proof can half the amount of time it takes to crack it,

Reversible computing techniques 'cheat' around the entropy limit. This means they can reach effective speeds far, far beyond what are possible with current computers, as they are effectively capable of performing nondeterministic operations.

You are basically betting the entire economy (if you believe bitcoins will succeed anyway) on no one developing a means to halve the effective bit length as has been done with e.g. AES.

It's careless.
I'm not betting anything for the simple fact that unless you can see into the future, we work with what we have in front of us. We can debate all day that in the future computers will be a million times faster or that some math genius is going to discover a flaw in the system that would bring everything down. I'm well aware of many peer reviewed papers and tech journals, even blogs about encryption. Not everything in existence, as I don't have the time for all of it, but enough practical experience to be able to visualize what it would really take to do what you propose.
Quote
Quote
are we really worried about the encryption taking 100 billion years to crack but now with this new attack (insert math,attack,flaw) it's only going to take only 1 billion years to crack? How about a million years? Even one-hundred thousand years?

Ten years, assuming only minor flaws in SHA256.

If there is a major flaw (again, see the push for SHA-3) there is a much more serious problem. There does not appear to be a clear mechanism for handling a compromise of the basic algorithm, and there should be.
10 years is a good a guess as my 1 million years. SHA1 still has not been broken, but you can brute force/exploit flaws on a super-computer in under 60 hours if you have $35 million to throw at it.

Overall, I hear what you say and if BitCoin jumped in the SHA3 realm, I would sleep just as well at night as I did with it using the older SHA2 realm of technology.

I'm not trying to nitpick your post, just offering up my opinion and I certainly respect yours. I think we can both agree that if the encryption is bumped up another notch in the future, it would be a good thing for the system and community as a whole.
I'm sure the same was said back when 40, 64, 128, and 256 bit encryption was coming out. SHA512 is part of SHA2, I remember when everyone was talking about how insecure SHA1 was with that significant *flaw* and how we all need to move to SHA2 because it was well defined, well supported, and well analyzed; sound familiar?

I'm not aware of any 40 bit hash length. The reason for SHA512 and Whirlpool's excessive bit lengths are just for such future proofing. A collision has a 50% chance of occurring at half of the bit length (roughly), so weak hashes such as MD5 would have problems at around 2^64 elements, this was well known and few people thought this was impossible to achieve in the future.

So no, it's not exactly familiar. Wanting to double it again because the idea is to form a basis for a new economy is not crazy talk - there is no technical reason not to use Whirlpool or SHA512 (or both). The only hurt would be a lower hash generation rate... which isn't exactly a problem.

Quote
I'm not betting anything for the simple fact that unless you can see into the future, we work with what we have in front of us. We can debate all day that in the future computers will be a million times faster or that some math genius is going to discover a flaw in the system that would bring everything down. I'm well aware of many peer reviewed papers and tech journals, even blogs about encryption. Not everything in existence, as I don't have the time for all of it, but enough practical experience to be able to visualize what it would really take to do what you propose.

Your participation is your bet.

Quote
10 years is a good a guess as my 1 million years. SHA1 still has not been broken, but you can brute force/exploit flaws on a super-computer in under 60 hours if you have $35 million to throw at it.

$35 million to swipe any single person's bank account.

Quote
I'm not trying to nitpick your post, just offering up my opinion and I certainly respect yours. I think we can both agree that if the encryption is bumped up another notch in the future, it would be a good thing for the system and community as a whole.

I'm honestly rather keen on exploring alternative possibilities for generation. As one Slashdotter put it, the current scheme is like claiming a forest of unknown size as currency, burning it down, then proclaiming the most select 21 tons of ash as the representative medium. I have an idea but I have no clue about who would be interested.
That's why I'm glad we have people like you here to participate in the discussion as well. A room full of "yes men" is a dangerous thing.  Grin
Reversible computing techniques 'cheat' around the entropy limit. This means they can reach effective speeds far, far beyond what are possible with current computers, as they are effectively capable of performing nondeterministic operations.

Wait, what? I thought reversible computation just uses less energy. Where does the non-determinism come in?

Anyway, about the hashing being insecure: Wikipedia says that attacks on SHA-256 still take on the order of 2250 operations. And unless I made a big thinko here, doesn't the hash target change every ~10 minutes? Wouldn't that throw of an attacker? And if it was possible to break SHA faster, wouldn't the system adjust by raising the difficulty level?
Wait, what? I thought reversible computation just uses less energy. Where does the non-determinism come in?

It's the theoretical limit of how far you can go with it. When doing a brute force search, you can reverse back to any previous state and try new states.

Quote
Anyway, about the hashing being insecure: Wikipedia says that attacks on SHA-256 still take on the order of 2250 operations. And unless I made a big thinko here, doesn't the hash target change every ~10 minutes? Wouldn't that throw of an attacker? And if it was possible to break SHA faster, wouldn't the system adjust by raising the difficulty level?

It's doubtful that SHA2 would be that broken before all coins have been minted. The issue is it then becomes a question of how much it costs to hijack someone's bank account.
SHA256 is not like the step from 128 bit to 160 bit.

To use an analogy, it's more like the step from 32-bit to 64-bit address space.  We quickly ran out of address space with 16-bit computers, we ran out of address space with 32-bit computers at 4GB, that doesn't mean we're going to run out again with 64-bit anytime soon.

SHA256 is not going to be broken by Moore's law computational improvements in our lifetimes.  If it's going to get broken, it'll be by some breakthrough cracking method.  An attack that could so thoroughly vanquish SHA256 to bring it within computationally tractable range has a good chance of clobbering SHA512 too.

If we see a weakness in SHA256 coming gradually, we can transition to a new hash function after a certain block number.  Everyone would have to upgrade their software by that block number.  The new software would keep a new hash of all the old blocks to make sure they're not replaced with another block with the same old hash.