BitcoinTalk

RFC: ship block chain 1-74000 with release tarballs?

BitcoinTalk
#1
From:
jgarzik
Subject:
RFC: ship block chain 1-74000 with release tarballs?
Date:

It appears that blk0001.dat, where bitcoin stores block chain information, is compatible across Windows, Linux, 32-bit and 64-bit.

Therefore, why not save new users some time by shipping blocks 1-74000 with each release?

Presumably, indexing and verifying a local file would be faster, and use fewer network resources, than downloading all those blocks via P2P.
BitcoinTalk
#2
From:
wumpus
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Huh, isn't P2P supposed to be faster because you can download from many users at once instead of one source?
(also the reason why some gaming companies use bittorrent to distribute updates)
BitcoinTalk
#3
From:
RHorning
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
I have mixed feelings about this.  Part of the problem is that there is a perceived free good here: a network hosting service in the form of SourceForge, which would certainly allow those making software releases to include considerably more data than Bitcoin currently ships.  If we were somehow paying for this service as a community, in terms of $$$ per MiB, I think it would be a no-brainer that this data should stay out of the distribution.  Unfortunately for this consideration, it is a free good from the perspective of most users.

The other issue is that the network bandwidth between nodes is also treated as a free good.  I've suggested in this thread that perhaps the presumption of free network bandwidth shouldn't be made either.  In fact, I believe it shouldn't, but that is a separate issue entirely.

The network bandwidth for downloading the blocks is to me a wash either way, although a new client coming "on-line" trying to get the full block chain does suck up a whole bunch of blocks through the Bitcoin network and that impacts anybody who happens to be connected to those nodes.  BTW, this is one of the reasons I think it would be incredibly useful to start "charging" for bandwidth as a means to discourage this behavior... and of course to earn a few extra Bitcoins on the side.  If you can obtain blocks "free" from another source, some people might get more creative on how to get that accomplished including downloading a second package on some free file hosting service (perhaps included with the main client distributions) or coming up with a scheme on how to bootstrap new clients that impacts the network in a less obtrusive fashion.

I guess what I'm saying is that while this is a simple solution to a complex problem, it doesn't solve all of the problems including perhaps clients which may store the block data in another format.  There also isn't any apparent reason to necessarily encourage other software client distros to include this kind of data or for that matter to put in more than the most minimum number of blocks.  Still, raising the issue is useful here and I hope it raises a discussion about the problem.

Quote
Huh, isn't P2P supposed to be faster because you can download from many users at once instead of one source?
(also the reason why some gaming companies use bittorrent to distribute updates)


I agree it seems very odd that you would take something which is by its nature distributed through P2P channels and instead put it into a conventional client-server distribution model.  Part of why I'm saying that more thought ought to go into this is to encourage a BitTorrent distribution channel of some sort for a large collection of blocks, for somebody who has had their client off for a while, or some other kind of experimentation on how to solve this same problem.  The problem is that new clients are demanding the whole block chain and really can't get into "mining" or confirming new transactions until they have that chain.  Let's solve that problem, which is the larger issue.

The other issue is that it seems like a waste of bandwidth to include these blocks in a client when all you are doing is updating the software.  I would also be worried that the installer's "older" version of the chain might wipe out the block chain you already have, forcing existing clients to catch up to the current block all over again, although that would certainly be an installation bug.  Just because it is a free good doesn't imply there are no other consequences to going this route.
BitcoinTalk
#4
From:
wumpus
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Indeed, shipping the data with the client is just a kludge.

If the main reason for this is that the bitcoin P2P protocol is so inefficient at transferring large numbers of blocks, then that should be fixed. I think that's because of the HDD syncing going on. Maybe that should be held off during the initial download, or the protocol should be made more bittorrent-like for the blocks [0..last-10000], as they are basically set in stone.
BitcoinTalk
#5
From:
satoshi
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
It's not the downloading that takes the time, it's verifying and indexing it.

Bandwidthwise, it's more efficient than if you downloaded an archive.  Bitcoin only downloads the data in blk0001.dat, which is currently 55MB, and builds blkindex.dat itself, which is 47MB.  Building blkindex.dat is what causes all the disk activity.

During the block download, it only flushes the database to disk every 500 blocks.  You may see the block count pause at ??499 and ??999.  That's when it's flushing.

Doing your own verifying and indexing is the only way to be sure your index data is secure.  If you copy blk0001.dat and blkindex.dat from an untrusted source, there's no way to know if you can trust all the contents in them.

Maybe Berkeley DB has some tweaks we can make to enable or increase cache memory.
BitcoinTalk
#6
From:
MrFlibble
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
My first reaction was "+1 for fast setup", but most of the 24hr delay I suffered was due to the local disc.  Disabling fsync (?) on the database while in catch-up mode would help the most.

Quote
Huh, isn't P2P supposed to be faster because you can download from many users at once instead of one source?
(also the reason why some gaming companies use bittorrent to distribute updates)

Good point.  But since the SHA-256 of the block is wired into the code, it is perfectly reasonable to ship the data too.  When the blockchain is over 500 MB, I think transfer efficiency will become important.

We have options,

  • ship blockchain from SF until it's not politely within their AUP, then re-evaluate.  I couldn't find a file size limit, even for the project website service (only a quick surf of their docs).
  • ship 'small' binaries from SF, and 'large' releases with data via BitTorrent
  • ship 'small' release, including the .torrent for the blockchain and a fetcher script. This looks for one of three popular command line BitTorrent clients for the platform and uses that to fetch the chain, or whinge if it can't.

http://sourceforge.net/apps/trac/sourceforge/wiki/Developer%20web says
Quote
Note: All file releases should be a single file. Multiple files for the same release should be archived together (tar, deb, zip, etc.). We recommend using rsync for all uploads over 20 megabytes in size, as rsync allows for resuming canceled or interrupted transfers.

Hmm, shipping the blockchain for each binary arch would be perverse.


Then, who provides the tracker & seed for the data?  Someone with incentive or community spirit?  Well, this forum+wiki seem to live on http://www.slicehost.com/ => min $20/month.  It could probably share without hurting the website, and (I think) the seed could be severely throttled to make other BT seeds pull more weight.
BitcoinTalk
#7
From:
jgarzik
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
It's not the downloading that takes the time, it's verifying and indexing it.

This is not true of many novice users, who say things like "well it took several hours to catch all the 90 000 blocks but finally it arrived" (quoted from one new user, on IRC, today).

Quote
Bandwidthwise, it's more efficient than if you downloaded an archive.

Agreed.  Compressed in an archive, blk0001.dat is around 36MB.

Quote
Bitcoin only downloads the data in blk0001.dat, which is currently 55MB, and builds blkindex.dat itself, which is 47MB.  Building blkindex.dat is what causes all the disk activity.

During the block download, it only flushes the database to disk every 500 blocks.  You may see the block count pause at ??499 and ??999.  That's when it's flushing.

It remains the download, not the verification, that has the highest variability of experience: first-time users see a delay of 30 minutes to several hours before the software is actually usable.  Some P2P nodes may be extremely slow (I see high variability in latency and throughput for old blocks, and for blocks larger than 512 bytes).  End-user bandwidth may be low, spotty or expensive.  Firewalls are often a problem.

I'm betting that the above complaint from a new user was due to a Microsoft firewall; but the point stands:  the large variance in network configuration and capability implies the P2P download impact may be far, far greater than the impact of on-disk verification of 90,000 blocks.

Quote
Doing your own verifying and indexing is the only way to be sure your index data is secure.  If you copy blk0001.dat and blkindex.dat from an untrusted source, there's no way to know if you can trust all the contents in them.

Who said untrusted?  The proposal is that you distribute blk0001.dat (and only blk0001.dat) in the bitcoin.org official client downloads.  And of course the client will spend some time verifying blk0001.dat upon first use.  This is unavoidable, and nobody has proposed changing or eliminating verification.

Just shipping blk0001.dat with official bitcoin would eliminate several headaches that new bitcoin users continue to experience.
BitcoinTalk
#8
From:
jgarzik
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
Maybe Berkeley DB has some tweaks we can make to enable or increase cache memory.

Which of the ACID properties do you need, while downloading?

Adding BDB records is simply appending to a log file, until you issue a checkpoint.  The checkpoint then updates the main database file.

Under a normal BDB transaction, you are guaranteed that each log record will be sync'd to disk platter, before the transaction commit succeeds. This is very strict, but required for full ACID. Enabling DB_TXN_NOSYNC still gives you a lot:

     "database integrity will be maintained, but if the application or system fails, it is possible
      some number of the most recently committed transactions may be undone during recovery"

bitcoin can obviously recover if recent transactions are undone, so it seems useful for this flag to be set for 100% of the initial block download.

That leaves checkpointing, which is a balance between amount of work performed at checkpoint time -- number of records that must be copied from log to database file -- and wall clock time.  Just gotta try some values and see what "feels" right -- maybe checkpoint every 10,000 blocks?
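A minimal sketch of what that combination might look like against Berkeley DB's C++ API (illustrative only, not a patch against the bitcoin tree; the helper names and the 10,000-block counter are inventions here):

Code:
// Sketch only: turn off synchronous log flushes on commit, and checkpoint by
// hand every N blocks during the initial download.  With DB_TXN_NOSYNC the
// database stays consistent after a crash; only the most recent commits may
// be rolled back, which bitcoin can simply re-download.
#include <string>
#include <db_cxx.h>

static DbEnv dbenv(0);

void OpenEnvForBulkLoad(const std::string& strDataDir)
{
    dbenv.set_flags(DB_TXN_NOSYNC, 1);   // commits no longer force an fsync
    dbenv.open(strDataDir.c_str(),
               DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG |
               DB_INIT_MPOOL | DB_INIT_TXN | DB_RECOVER,
               0);
}

void MaybeCheckpoint(int& nBlocksSinceCheckpoint)
{
    if (++nBlocksSinceCheckpoint >= 10000)   // the interval suggested above
    {
        dbenv.txn_checkpoint(0, 0, 0);       // copy log records into the main database file
        nBlocksSinceCheckpoint = 0;
    }
}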
BitcoinTalk
#9
From:
satoshi
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
I tested it on a slow 7 year old drive, where bandwidth and CPU were clearly not the bottleneck.  Initial download took 1 hour 20 minutes.

If it's taking a lot longer than that, certainly 24 hours, then it must be downloading from a very slow node, or your connection is much slower than around 15KB per sec (120kbps), or something else is wrong.  It would be nice to know what appears to be the bottleneck when that happens.

Every 10 minutes or so when the latest block is sent, it should have the chance to change to a faster node.  When the latest block is broadcast, it requests the next 500 blocks from other nodes, and continues the download from the one that sends it fastest.  At least, that's how it should work.

Quote
Maybe Berkeley DB has some tweaks we can make to enable or increase cache memory.
Quote
Which of the ACID properties do you need, while downloading?
It may only need more read caching.  It has to read randomly all over blk0001.dat and blkindex.dat to index.  It can't assume the file is smaller than memory, although it currently still is.  Caching would be effective, since most dependencies are recent.

Someone should experiment with different Berkeley DB settings and see if there's something that makes the download substantially faster.  If something substantial is discovered, then we can work out the particulars.

Quote
Adding BDB records is simply appending to a log file, until you issue a checkpoint.  The checkpoint then updates the main database file.
We checkpoint every 500 blocks.
BitcoinTalk
#10
From:
RHorning
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
Who said untrusted?  The proposal is that you distribute blk0001.dat (and only blk0001.dat) in the bitcoin.org official client downloads.  And of course the client will spend some time verifying blk0001.dat upon first use.  This is unavoidable, and nobody has proposed changing or eliminating verification.

Just shipping blk0001.dat with official bitcoin would eliminate several headaches that new bitcoin users continue to experience.


My personal suggestion is to have the block data as a separate download, but strongly recommended.  If you want to simplify the installation for Windows users and otherwise clueless computer users that can't take a block file of this nature and put it into the correct directory, perhaps setting up a formal installation file to put it where it needs to go would be more "user friendly", but all it really has to contain is just the block data.

The purpose of this is mainly so those who are updating to a new version can do so without having to also keep downloading the same block data, which by definition is going to grow over time.
BitcoinTalk
#11
From:
zipslack
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
The purpose of this is mainly so those who are updating to a new version can do so without having to also keep downloading the same block data, which by definition is going to grow over time.

I'm not sure how it is for you, but when I upgrade Bitcoin I don't have to re-download any blocks. It just picks up right where it left off before the upgrade.
BitcoinTalk
#12
From:
RHorning
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
The purpose of this is mainly so those who are updating to a new version can do so without having to also keep downloading the same block data, which by definition is going to grow over time.

Quote
I'm not sure how it is for you, but when I upgrade Bitcoin I don't have to re-download any blocks. It just picks up right where it left off before the upgrade.

That is the point.  If the blocks are included in the update it would also by definition include blocks you already have obtained via the network.  This is why I'm suggesting that it ought to be a separate but strongly recommended download for new users instead of something combined in the normal distros.
BitcoinTalk
#13
From:
jgarzik
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Another new user on IRC, Linux this time, was downloading at a rate of 1 block every 4 seconds -- estimated total download time around 4 days.

Other commenters in this thread are correct that upgrading users don't need a block database...  but something needs to be done to improve the initial block download experience for new users.  Improve the database all you want.. you'll still have peers giving you blocks slowly for any number of reasons.

We have the hashes for genesis block through block 74000 hardcoded (compiled) into bitcoin, so there's no reason why we shouldn't be able to automatically download a compressed zipfile of the block database from anywhere, unpack it, verify it, and start running.
BitcoinTalk
#14
From:
tyler
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
Other commenters in this thread are correct that upgrading users don't need a block database...  but something needs to be done to improve the initial block download experience for new users.  Improve the database all you want.. you'll still have peers giving you blocks slowly for any number of reasons.



*something* needs to be done, the block chain will be *huge* in the next year or so, correct?
BitcoinTalk
#15
From:
jgarzik
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
Other commenters in this thread are correct that upgrading users don't need a block database...  but something needs to be done to improve the initial block download experience for new users.  Improve the database all you want.. you'll still have peers giving you blocks slowly for any number of reasons.

Quote
*something* needs to be done, the block chain will be *huge* in the next year or so, correct?

Yes, correct.

Presumably at some point there will be a lightweight client that only downloads block headers, but there will still be hundreds of thousands of those...
BitcoinTalk
#16
From:
zipslack
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
This is why I'm suggesting that it ought to be a separate but strongly recommended download for new users instead of something combined in the normal distros.

Sorry, I misunderstood you.

Quote
We have the hashes for genesis block through block 74000 hardcoded (compiled) into bitcoin, so there's no reason why we shouldn't be able to automatically download a compressed zipfile of the block database from anywhere, unpack it, verify it, and start running.

I suppose you are referring to the checkpoints? If so, as I understand it, they are only applied while verifying a block which has been downloaded. The contents of blk0001.dat and blkindex.dat are never checked by the client, because the client is designed to check that data before it gets written to those files. As satoshi indicated in this thread,

Doing your own verifying and indexing is the only way to be sure your index data is secure.  If you copy blk0001.dat and blkindex.dat from an untrusted source, there's no way to know if you can trust all the contents in them.
BitcoinTalk
#17
From:
jgarzik
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
I suppose you are referring to the checkpoints? If so, as I understand it, they are only applied while verifying a block which has been downloaded. The contents of blk0001.dat and blkindex.dat are never checked by the client, because the client is designed to check that data before it gets written to those files.

Not quite true.  "-checkblocks" (CheckBlock()) performs quite a few checks on the contents of blk0001.dat / blkindex.dat.  AcceptBlock() does a bit more, adding context, but not much more.  But let's ignore that for the moment.

I think a more important point you're missing is that nobody is proposing that verification be skipped.  The bitcoin code is quite capable of verifying and indexing untrusted blk0001.dat data.  It would just need a few modifications to behave sensibly if blkindex.dat is missing.

The proposal is simply:  don't download massive amounts of uncompressed data using a protocol (bitcoin P2P) that wasn't designed for bulk data transfer.

Quote
As satoshi indicated in this thread,

Doing your own verifying and indexing is the only way to be sure your index data is secure.  If you copy blk0001.dat and blkindex.dat from an untrusted source, there's no way to know if you can trust all the contents in them.

The client is clearly capable of verifying the cryptographic integrity of blk0001.dat from an untrusted source, because it does that for blocks coming in over the network, and blk0001.dat contains... serialized blocks originally received from untrusted sources over the network.

It does not seem overly difficult to pass in blk0001.dat file position data to ProcessBlock(), and simply skip the WriteToDisk() storage call in downstream callee AcceptBlock().
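To make the shape of that concrete, here is a rough sketch (my illustration, not an actual patch): read a locally supplied blk0001.dat and push each block through ProcessBlock() so the normal verification still runs.  The helper name is invented, error handling is omitted, and it assumes the usual on-disk framing written by CBlock::WriteToDisk() (4-byte network magic, 4-byte length, serialized block) plus the existing CAutoFile/pchMessageStart helpers.

Code:
// Hypothetical helper: replay blocks from a local file through the normal
// verification path instead of fetching them over the P2P protocol.
bool LoadExternalBlockFile(const char* pszFile)
{
    CAutoFile blkdat(fopen(pszFile, "rb"), SER_DISK, VERSION);
    if (!blkdat)
        return false;

    while (!feof(blkdat))
    {
        unsigned char pchBuf[4];
        unsigned int nSize = 0;
        blkdat.read((char*)pchBuf, sizeof(pchBuf));               // network magic
        if (memcmp(pchBuf, pchMessageStart, sizeof(pchBuf)) != 0)
            break;
        blkdat >> nSize;                                          // length of the block that follows

        CBlock block;
        blkdat >> block;                                          // deserialize
        ProcessBlock(NULL, &block);                               // full check, index, accept
    }
    return true;
}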

BitcoinTalk
#18
From:
RHorning
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
The client is clearly capable of verifying the cryptographic integrity of blk0001.dat from an untrusted source, because it does that for blocks coming in over the network, and blk0001.dat contains... serialized blocks originally received from untrusted sources over the network.

It does not seem overly difficult to pass in blk0001.dat file position data to ProcessBlock(), and simply skip the WriteToDisk() storage call in downstream callee AcceptBlock()

Unless I'm mistaken here, what this implies is that at the moment the "official" client presumes that blk0001.dat contains validated data, so if you download that data from another source, which may have been compromised, there is currently no way to verify it.  This is a temporary danger to be aware of until the software is updated to cope with this particular issue.

On the other hand, somebody could also add to the UI, or as a command-line switch on bitcoind, some sort of "reverification" of the block data that would be performed locally.  There are other applications for this too, perhaps as a precaution against some virus on your computer manipulating data in the block chain, so it seems like an option which ought to be added to the software.  Since the verification code is already in the software, it is merely a matter of setting up the algorithm and a triggering mechanism to perform that verification.  Indeed, if a particular block raises concern during verification, an effort to "heal" the chain by requesting blocks from peer nodes could be used to fix potential errors, or even to discard the whole chain.

I hope such a feature eventually is added.
BitcoinTalk
#19
From:
MoonShadow
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
My understanding was that the client already did a blockchain recheck upon startup if the index was missing.  I did this when I first started, and it sure seemed like it was marching through the chain.  Doesn't it require an index to function anyway?  Why would it assume that the blockchain was valid upon startup?  Anyone could have edited it.  The genesis block is encoded into the client, isn't it?  That and the blockchain checkpoints are the only parts that are assumed correct, or am I wrong?  There is no good reason to prevent a blockchain download via other methods.  In a future with the bitcoin network running close to its capacity, downloading the entire blockchain over the P2P network will be harmful.

Even a chain that has already been pruned of its merkle trees should be able to be verified from the start; otherwise what good is using a merkle tree at all?
BitcoinTalk
#20
From:
satoshi
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Despite everything else said, the current next step is:
Quote
Someone should experiment with different Berkeley DB settings and see if there's something that makes the download substantially faster.  If something substantial is discovered, then we can work out the particulars.
In particular, I suspect that more read caching might help a lot.

Quote
Another new user on IRC, Linux this time, was downloading at a rate of 1 block every 4 seconds -- estimated total download time around 4 days.
Then something more specific was wrong.  That's not due to normal initial download time.  Without more details, it can't be diagnosed.  If it was due to slow download, did it speed up after 10-20 minutes when the next block broadcast should have made it switch to a faster source?  debug.log might have clues.  How fast is their Internet connection?  Was it steadily slow, or just slow down at one point?

Quote
We have the hashes for genesis block through block 74000 hardcoded (compiled) into bitcoin, so there's no reason why we shouldn't be able to automatically download a compressed zipfile of the block database from anywhere, unpack it, verify it, and start running.
The 74000 checkpoint is not enough to protect you, and does nothing if the download is already past 74000.  -checkblocks does more, but is still easily defeated.  You still must trust the supplier of the zipfile.

If there was a "verify it" step, that would take as long as the current normal initial download, in which it is the indexing, not the data download, that is the bottleneck.

Quote
Presumably at some point there will be a lightweight client that only downloads block headers, but there will still be hundreds of thousands of those...
80 bytes per header and no indexing work.  Might take 1 minute.

Quote
uncompressed data using a protocol (bitcoin P2P) that wasn't designed for bulk data transfer.
The data is mostly hashes and keys and signatures that are uncompressible.

The speed of initial download is not a reflection of the bulk data transfer rate of the protocol.  The gating factor is the indexing while it downloads.

BitcoinTalk
#21
From:
jgarzik
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
If there was a "verify it" step, that would take as long as the current initial download, in which it is the indexing, not the data download, that is the bottleneck.
[...]
The speed of initial download is not a reflection of the bulk data transfer rate of the protocol.  The gating factor is the indexing while it downloads.

Sorry, these users' disk and CPU were not at 100%.  It is clear the bottleneck is not the database or indexing, for many users.

Quote
The data is mostly hashes and keys and signatures that are uncompressible.

bzip2 gives you about 33% compression, saving many megabytes off a download:

Code:
[jgarzik@bd data]$ tar cvf /tmp/1.tar blk0001.dat
blk0001.dat

[jgarzik@bd data]$ tar cvf /tmp/2.tar blk*.dat
blk0001.dat
blkindex.dat

[jgarzik@bd data]$ bzip2 -9v /tmp/[12].tar
  /tmp/1.tar:  1.523:1,  5.253 bits/byte, 34.34% saved, 55439360 in, 36402074 out.
  /tmp/2.tar:  1.512:1,  5.291 bits/byte, 33.86% saved, 103690240 in, 68577642 out.

I wouldn't call 33% "uncompressible".
BitcoinTalk
#22
From:
theymos
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Quote
Sorry, these users' disk and CPU were not at 100%.  It is clear the bottleneck is not the database or indexing, for many users.

It seemed to me that it was some sort of disk problem or network condition on his end. Some selected quotes from my IRC log:
Quote
<manveru> also, when i woke up, there were thousands of entries in the debug.log that look like: trying connection  lastseen=-135.6hrs lasttry=-358582.4hrs
<theymos> How many connections do you have?
<manveru> 2 right now
<theymos> How many blocks do you have?
<manveru> blockcount and blocknumber are 29124
<theymos> How fast is that increasing?
<manveru> around 1 every 4 seconds
<jgarzik> manveru: 32-bit or 64-bit linux?
<manveru> 64
<manveru> now 'blkindex.dat flush' takes a few minutes :|
<manveru> still hangs on flush
<theymos> manveru: Are you on some network file system?
<manveru> no, just a normal harddisk
<manveru> it's only 5200 rpm though

Also, replacing the blocks might have prevented him from noticing a transaction:
Quote
<manveru> jgarzik: sent me the blocks, but it didn't change my balance
<MT`AwAy> manveru: in your getinfo you're at block 94236 ?
<manveru> yeah
BitcoinTalk
#23
From:
jgarzik
Subject:
Problem: opening and closing database for each block
Date:
Quote
Building blkindex.dat is what causes all the disk activity.
[...]
Maybe Berkeley DB has some tweaks we can make to enable or increase cache memory.

The following code in AddToBlockIndex(main.cpp) is horribly inefficient, and dramatically slows initial block download:

Code:
    CTxDB txdb;
    txdb.WriteBlockIndex(CDiskBlockIndex(pindexNew));

    // New best
    if (pindexNew->bnChainWork > bnBestChainWork)
        if (!SetBestChain(txdb, pindexNew))
            return false;

    txdb.Close();

This makes it impossible to use a standard technique for loading large numbers of records into a database (db4 or SQL or otherwise):  wrap multiple record insertions into a single database transaction.  Ideally, bitcoin would only issue a TxnCommit() once every 1000 blocks or so during the initial block download.  If a crash occurs, the database remains in a consistent state.

Furthermore, database open + close for each new block is incredibly expensive.  For each database-open and database-close operation, db4
  • diagnose the health of the database to determine if recovery is needed; this test may require data copying
  • re-init memory pools
  • read database file metadata
  • acquire file locks
  • read and initialize b-tree or hash-specific metadata.  build hash table / b-tree roots.
  • forces a sync, even if transactions called with DB_TXN_NOSYNC
  • fsync memory pool

And, additionally, bitcoin forces a database checkpoint, pushing all transactions from log into main database.

That's right, that long list of operations is executed per-database (DB), not per-environment (DB_ENV), for a database close+open cycle.  To bitcoin, that means we do this for every new block.  Incredibly inefficient, and not how db4 was designed to be used.

Recommendations:

1) bitcoin should be opening databases, not just the environment, at program startup, and closing the database at program shutdown.  db4 is designed to handle crashes, if proper transactional use is maintained -- and bitcoin already uses db4 transactions properly.

2) For the initial block download, txn commit should occur once every N records, not every record.  I suggest N=1000.
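As a sketch of recommendation 2 (illustrative only; vBlocksToIndex is a made-up stand-in for however the block index entries arrive during the initial download):

Code:
// Sketch: insert block-index records in batches of N inside one db4
// transaction, so the log is flushed once per batch instead of per record.
// If a crash occurs, only the uncommitted batch is lost and the database
// stays consistent.
const unsigned int N = 1000;

CTxDB txdb;
unsigned int nPending = 0;
txdb.TxnBegin();
for (unsigned int i = 0; i < vBlocksToIndex.size(); i++)
{
    txdb.WriteBlockIndex(CDiskBlockIndex(vBlocksToIndex[i]));
    if (++nPending >= N)
    {
        txdb.TxnCommit();        // one commit covers the whole batch
        txdb.TxnBegin();
        nPending = 0;
    }
}
txdb.TxnCommit();                // commit any final partial batch
txdb.Close();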



EDIT:  Updated a couple minor details, and corrected some typos.
BitcoinTalk
#24
From:
satoshi
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
It seems like you're inclined to assume everything is wrong more than is actually so.

Writing the block index is light work.  Building the tx index is much more random access per block.  I suspect reading all the prev txins is what's slow.  Read caching would help that.  It's best if the DB does that.  Maybe it has a setting for how much cache memory to use.

Quote
1) bitcoin should be opening databases, not just environment, at program startup, and closing database at program shutdown.
Already does that.  See CDB.  The lifetime of the (for instance) CTxDB object is only to support database transactions and to know if anything is still using the database at shutdown.

Quote
And, additionally, bitcoin forces a database checkpoint, pushing all transactions from log into main database.
If it was doing that it would be much slower.  It's supposed to be only once a minute or 500 blocks:

    if (strFile == "blkindex.dat" && IsInitialBlockDownload() && nBestHeight % 500 != 0)
        nMinutes = 1;
    dbenv.txn_checkpoint(0, nMinutes, 0);

Probably should add this:
    if (!fReadOnly)
        dbenv.txn_checkpoint(0, nMinutes, 0);

Quote
2) For the initial block download, txn commit should occur once every N records, not every record.  I suggest N=1000.
Does transaction commit imply flush?  That seems surprising to me.  I assume a database op wrapped in a transaction would be logged like any other database op.  Many database applications need to wrap almost every pair of ops in a transaction, such as moving money from one account to another. (debit a, credit b)  I can't imagine they're required to batch all their stuff up themselves.

In the following cases, would case 1 flush once and case 2 flush twice?

case 1:
write
write
write
write
checkpoint

case 2:
begin transaction
write
write
commit transaction
begin transaction
write
write
commit transaction
checkpoint

Contorting our database usage will not be the right approach.  It's going to be BDB settings and caching.
BitcoinTalk
#25
From:
jgarzik
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
Yeah, I missed the database-open caching buried in all the C++ constructors.  Major red herring, sorry about that.

db4 cache control is http://download.oracle.com/docs/cd/E17076_01/html/api_reference/CXX/dbset_cachesize.html
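For what it's worth, a minimal sketch of using that knob (the 64 MB figure is just an example, not a tested recommendation):

Code:
// Sketch only: enlarge the BDB environment's shared memory pool before
// DbEnv::open(), so random reads of blkindex.dat during indexing hit cache
// rather than disk.  Arguments are set_cachesize(gbytes, bytes, ncache).
#include <db_cxx.h>

void SetLargerCache(DbEnv& dbenv)
{
    dbenv.set_cachesize(0, 64 * 1024 * 1024, 1);   // 64 MB in a single cache region
}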
BitcoinTalk
#26
From:
jgarzik
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
I instrumented my import using the -initblock=FILE patch posted last night, putting printf tracepoints in TxnBegin, TxnCommit, TxnAbort, Read and Write:

Code:
ProcessBlock: ACCEPTED
CDB::Write()
DB4: txn_begin
CDB::Write()
CDB::Write()
CDB::Write()
DB4: txn_commit
SetBestChain: new best=000000005b5c1859db19  height=1751  work=7524897523416
ProcessBlock: ACCEPTED
CDB::Write()
DB4: txn_begin
CDB::Write()
CDB::Write()
CDB::Write()
DB4: txn_commit
SetBestChain: new best=00000000f396ab6b62ba  height=1752  work=7529192556249
ProcessBlock: ACCEPTED
CDB::Write()
DB4: txn_begin
CDB::Write()
CDB::Write()
CDB::Write()
DB4: txn_commit
SetBestChain: new best=000000000c6bcf972117  height=1753  work=7533487589082

So, it appears that we have a CDB::Write() that occurs outside of a transaction (vTxn is empty??).

txnid==NULL is perfectly legal for db4, but it does mean that callpath may be operating outside of the DB_TXN_NOSYNC flag that is set in ::TxnBegin().  Thus, a CDB::Write() outside of a transaction may have synchronous behavior (DB_TXN_SYNC) as governed by DB_AUTO_COMMIT database flag.

EDIT:  Wrapping WriteBlockIndex() inside a transaction does seem to speed up local disk import (-initblocks).

Code:
--- a/main.cpp
+++ b/main.cpp
@@ -1427,7 +1427,10 @@ bool CBlock::AddToBlockIndex(unsigned int nFile, unsigned
     pindexNew->bnChainWork = (pindexNew->pprev ? pindexNew->pprev->bnChainWork : 0) + pindexNew->GetBlockWork();
 
     CTxDB txdb;
+    txdb.TxnBegin();
     txdb.WriteBlockIndex(CDiskBlockIndex(pindexNew));
+    if (!txdb.TxnCommit())
+       return false;
 


Of course that implies begin+commit+begin+commit in quick succession (SetBestChain), so maybe a less naive approach might be preferred (nested transactions, or wrap both db4 writes in the same transaction).
BitcoinTalk
#27
From:
jgarzik
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
I timed two runs with clean data directories (no contents), -noirc, -addnode=10.10.10.1, Linux 64-bit.  Hardware: SATA SSD

Mainline, no patches:
     32 minutes to download 94660 blocks.

Mainline + TxnBegin/TxnCommit in AddToBlockIndex():
     25 minutes to download 94660 blocks.

BitcoinTalk
#28
From:
satoshi
Subject:
Re: RFC: ship block chain 1-74000 with release tarballs?
Date:
That's a good optimisation.  I'll add that next time I update SVN.

More generally, we could also consider this:

        dbenv.set_lk_max_objects(10000);
        dbenv.set_errfile(fopen(strErrorFile.c_str(), "a")); /// debug
        dbenv.set_flags(DB_AUTO_COMMIT, 1);
+       dbenv.set_flags(DB_TXN_NOSYNC, 1);
        ret = dbenv.open(strDataDir.c_str(),
                         DB_CREATE     |
                         DB_INIT_LOCK  |
                         DB_INIT_LOG   |

We would then rely on dbenv.txn_checkpoint(0, 0, 0) in CDB::Close() to flush after wallet writes.