BitcoinTalk
Reading/Writing Blocks and FLATDATA

View Satoshi only

External link

I'm trying to understand the way Bitcoin stores block data - among other things I want to run some statistics on the block chain / transaction history and check just how anonymous Bitcoin really is.  So I went to the source to see how Bitcoin reads/writes block data to file.

(ETA: this is in 0.3.2)

In main.h we have:

Code: (CBlock::WriteToDisk)
bool WriteToDisk(bool fWriteTransactions, unsigned int& nFileRet, unsigned int& nBlockPosRet)
{
    // Open history file to append
    CAutoFile fileout = AppendBlockFile(nFileRet);
    if (!fileout)
        return error("CBlock::WriteToDisk() : AppendBlockFile failed");
    if (!fWriteTransactions)
        fileout.nType |= SER_BLOCKHEADERONLY;

    // Write index header
    unsigned int nSize = fileout.GetSerializeSize(*this);
    fileout << FLATDATA(pchMessageStart) << nSize;

    // Write block
    nBlockPosRet = ftell(fileout);
    if (nBlockPosRet == -1)
        return error("CBlock::WriteToDisk() : ftell failed");
    fileout << *this;

    // Flush stdio buffers and commit to disk before returning
    fflush(fileout);
#ifdef __WXMSW__
    _commit(_fileno(fileout));
#else
    fsync(fileno(fileout));
#endif

    return true;
}

and

Code: (CBlock::ReadFromDisk)
bool ReadFromDisk(unsigned int nFile, unsigned int nBlockPos, bool fReadTransactions=true)
{
    SetNull();

    // Open history file to read
    CAutoFile filein = OpenBlockFile(nFile, nBlockPos, "rb");
    if (!filein)
        return error("CBlock::ReadFromDisk() : OpenBlockFile failed");
    if (!fReadTransactions)
        filein.nType |= SER_BLOCKHEADERONLY;

    // Read block
    filein >> *this;

    // Check the header
    if (CBigNum().SetCompact(nBits) > bnProofOfWorkLimit)
        return error("CBlock::ReadFromDisk() : nBits errors in block header");
    if (GetHash() > CBigNum().SetCompact(nBits).getuint256())
        return error("CBlock::ReadFromDisk() : GetHash() errors in block header");

    return true;
}

FLATDATA is defined in serialize.h like so:
Code: (FLATDATA)
//
// Wrapper for serializing arrays and POD
// There's a clever template way to make arrays serialize normally, but MSVC6 doesn't support it
//
#define FLATDATA(obj)   REF(CFlatData((char*)&(obj), (char*)&(obj) + sizeof(obj)))
class CFlatData
{
protected:
    char* pbegin;
    char* pend;
public:
    CFlatData(void* pbeginIn, void* pendIn) : pbegin((char*)pbeginIn), pend((char*)pendIn) { }
    char* begin() { return pbegin; }
    const char* begin() const { return pbegin; }
    char* end() { return pend; }
    const char* end() const { return pend; }

    unsigned int GetSerializeSize(int, int=0) const
    {
        return pend - pbegin;
    }

    template<typename Stream>
    void Serialize(Stream& s, int, int=0) const
    {
        s.write(pbegin, pend - pbegin);
    }

    template<typename Stream>
    void Unserialize(Stream& s, int, int=0)
    {
        s.read(pbegin, pend - pbegin);
    }
};

Now - and I apologize if I'm reading this wrong, this is a little more advanced C/C++ code than I'm used to - as I understand it, the FLATDATA call interprets the raw bytes of a CBlock object as an array (stream??) of characters.  The CBlock::WriteToDisk method writes the constant 4-byte message header (0xf9, 0xbe, 0xb4, 0xd9), the size of the CBlock object in bytes, and then the FLATDATA of the CBlock it's writing to disk - which is just the raw bytes of the CBlock object.  So after the header, the data written to file is byte-for-byte the same as the CBlock object represented in memory.  Also, if I'm reading it correctly, CBlock::ReadFromFile copies those bytes directly into the space allocated for a CBlock object in memory to re-create the block.  Is this correct?

Related question - I am under the impression that the exact way an instance of a C++ class is represented internally not guaranteed under standards; compiling a program with different compilers or different optimization flags can change the order in which member variables are stored in memory, and some debug mode compilers even add a few bytes between member variables to make memory inspection easier.  I'm not positive about this, it's just something I've picked up and never seriously questioned.
They key bits of code are:
Code:
fileout << FLATDATA(pchMessageStart) << nSize;
...
fileout << *this;
pchMessageStart are the four magic bytes, and those are written with FLATDATA.

The CBlock itself is written by << *this, and that's done by the IMPLEMENT_SERIALIZE in main.h:
Code:
    IMPLEMENT_SERIALIZE
    (
        READWRITE(this->nVersion);
        nVersion = this->nVersion;
        READWRITE(hashPrevBlock);
        READWRITE(hashMerkleRoot);
        READWRITE(nTime);
        READWRITE(nBits);
        READWRITE(nNonce);

        // ConnectBlock depends on vtx being last so it can calculate offset                                             
        if (!(nType & (SER_GETHASH|SER_BLOCKHEADERONLY)))
            READWRITE(vtx);
        else if (fRead)
            const_cast<CBlock*>(this)->vtx.clear();
    )

The READWRITE macros Do The Right Thing, reading in or writing out the members in a machine-independent way.

See http://github.com/gavinandresen/bitcointools for simplified Python code that can dump out transactions and blocks.

Thanks for clearing that up for me.

I keep making really elementary code mistakes....
FLATDATA was a workaround to serialize a fixed field length array.  There was a cleaner way to make it understand how to serialize arrays directly, but MSVC6 couldn't do it and I wanted to keep compatibility with MSVC6 at that time.  We don't support MSVC6 anymore because we use something in Boost that doesn't.  We lost support for it after 0.2.0.  Maybe someday I'll swap in the clean way that just knows how to serialize fixed length arrays without wrapping them in FLATDATA.