BitcoinTalk

On IRC bootstrapping

A week or so ago I met a very nice Freenode staffer in the #bitcoin and #bitcoin-dev channels. He told me that the #bitcoin channel had turned up on Freenode's radar because it looks like a botnet command-and-control channel, but after I explained to him how Bitcoin works and why it needs IRC, he said that the channel at its current size is not a problem.

However, this got me thinking, and later this week I also discussed the topic on IRC. I came to the conclusion that IRC is the wrong method for bootstrapping, especially in its current form. At the moment, each client connects to IRC and stays connected. Using /WHO and JOIN messages, it collects the IPs of other clients and connects to them on port 8333 as a bootstrapping method. However, the clients also talk to each other directly and announce new nodes via the Bitcoin protocol itself, yet they still stay online in IRC the whole time. This has various disadvantages:

  • IRC connectivity is necessary for bootstrapping (firewalls often block it, and Freenode blocks Tor)
  • There is a single point of failure (Freenode)
  • We are leeching off Freenode's services instead of using our own infrastructure. Many servers explicitly disallow bot connections in their MOTDs.
  • Minor point: the additional protocol inside Bitcoin adds complexity

There is already a list of always-on Bitcoin IPs posted in this forum, which is a nice idea but not very scalable. I therefore propose the following solution. Gnutella and MUTE face very similar bootstrapping problems. To solve them, they rely on a list of "Gnutella Webcaches". These webcaches are run by volunteers on simple PHP servers, and a master list of them is distributed with each Gnutella/MUTE release. When a client wants to join the network, it asks one or two of the webcaches via HTTP for a list of other nodes and is also added to that list (which is usually a list of the last X clients seen). Every few hours (or days), a running client reconnects to the webcache to tell it that it is still alive and should not be deleted from the list.

I suggest that the same thing be implemented for Bitcoin. Volunteers could run these webcaches on cheap PHP webspace and send their URLs to Satoshi or Sirius, who in turn could include the list in each release. This would let users behind a restrictive firewall or on Tor use Bitcoin without manually finding other nodes, and it is a much more scalable approach. (As a bonus, we could remove the HTTP calls to whatismyip.com and similar sites.)
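A Gnutella-style webcache only has to remember recent check-ins and forget peers that stop checking in. Below is a minimal in-memory sketch of that bookkeeping, assuming a cap of `max_peers` entries and a `ttl` in seconds after which a silent peer is dropped. The `WebCache` class and its parameters are hypothetical; a real cache would sit behind an HTTP endpoint (PHP, in the Gnutella case) with persistent storage.

```python
import time
from collections import OrderedDict


class WebCache:
    """Hypothetical peer list kept by a Gnutella-style Bitcoin webcache.

    Remembers the most recent `max_peers` check-ins and forgets any peer
    that has not checked in again within `ttl` seconds.
    """

    def __init__(self, max_peers=100, ttl=6 * 3600):
        self.max_peers = max_peers
        self.ttl = ttl
        self._seen = OrderedDict()  # "ip:port" -> last check-in time, oldest first

    def announce(self, addr, now=None):
        """Add or refresh a peer; joining and 'still alive' are the same call."""
        now = time.time() if now is None else now
        self._seen.pop(addr, None)
        self._seen[addr] = now  # re-insert at the newest end
        while len(self._seen) > self.max_peers:
            self._seen.popitem(last=False)  # evict the oldest check-in

    def peers(self, now=None, limit=20):
        """Drop expired peers, then return up to `limit` peers, newest first."""
        now = time.time() if now is None else now
        for addr in [a for a, t in self._seen.items() if now - t > self.ttl]:
            del self._seen[addr]
        return list(reversed(self._seen))[:limit]
```

A client would call `announce` when it first joins and again every few hours, and call `peers` to get addresses to try.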

Of course, there might be a better idea for bootstrapping Bitcoin, and I would love to hear it. Suggestions for the webcache idea are welcome too. Please post them here!

Cheers,
soultcer

Re: On IRC bootstrapping

To eliminate segmentation, every peer should have a list of every other peer. Tor also has this requirement, so we should copy them: have several trusted directory servers periodically create a signed list of all peers and publish it via HTTP. All Bitcoin clients would have the option to act as a directory mirror, which would be indicated in the dirServers' list. Generators need to ask to be added to the list (which could also include information like the maximum number of connections that peer will accept), but people who just want to make transactions can get the list from a dirMirror and connect to a few random peers.

If this is too centralized, we can do what I2P did and allow anyone to become a directory server. You need to be able to detect when a dirServer goes rogue in this case, though.
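One hypothetical way to notice a rogue directory server, when anyone can run one, is to fetch the list from several servers and compare them: peers vouched for by a majority form the consensus, and a server whose list lies mostly outside that consensus is suspect. The sketch below assumes each list is a plain collection of addresses and uses a simple-majority quorum. It illustrates cross-checking in general, not how I2P actually does it, and it cannot help if a majority of the queried servers collude.

```python
from collections import Counter


def cross_check(lists, quorum=None):
    """Compare peer lists fetched from several directory servers.

    lists: dict mapping server name -> iterable of peer addresses.
    Returns (consensus, suspects): peers listed by at least `quorum`
    servers, and servers where more than half of the listed peers are
    outside that consensus.
    """
    if quorum is None:
        quorum = len(lists) // 2 + 1  # simple majority of servers
    counts = Counter(peer for peers in lists.values() for peer in set(peers))
    consensus = {peer for peer, n in counts.items() if n >= quorum}
    suspects = []
    for server, peers in lists.items():
        peers = set(peers)
        unsupported = peers - consensus
        if peers and len(unsupported) > len(peers) / 2:
            suspects.append(server)
    return consensus, sorted(suspects)
```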

Re: On IRC bootstrapping

I vote for the I2P method, myself. It works great.

Re: On IRC bootstrapping

Thanks soultcer for talking with the Freenode staffer.  Good to know it's OK at the current size, and now they know who we are.  They're supportive of projects like Tor, so I hope they'll be friendly to us.  We don't want to overstay our welcome.  If we get too big, then by the same token, we're big enough that we don't need IRC anymore and we'll get off.

We only needed IRC because nobody had a static IP.  In the early days there were some steady supporters, but they all had pool-allocated IPs that change every few days.  IRC was only intended as a temporary solution.  Bitcoin's built-in addr system is the main solution.

Bitcoin can get the list of IPs from any bitcoin node.  In that sense, every node serves as a directory server.

When there are enough static IP nodes to have a good chance that at least one will still be running by the time the current version goes out of use, we can preprogram a seed list.

How do you think we should compile the seed list?  Would it be OK to create it from the currently connected IPs that have been static for a while?
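Taking "static for a while" literally, a seed list could be compiled from connection logs by keeping addresses that have stayed the same for some minimum period and are still online. A sketch, assuming the log has already been reduced to a first-seen/last-seen pair per IP; the `pick_seeds` name and the 30-day and 1-day thresholds are hypothetical.

```python
from datetime import datetime, timedelta


def pick_seeds(sightings, now=None, min_age=timedelta(days=30),
               max_idle=timedelta(days=1), limit=50):
    """Pick seed candidates from {ip: (first_seen, last_seen)} records.

    An IP qualifies if a node has been seen at it for at least `min_age`
    and was still connected within `max_idle` of `now`.  The longest-lived
    addresses come first, since they seem most likely to stay up.
    """
    now = datetime.now() if now is None else now
    candidates = [
        (last - first, ip)
        for ip, (first, last) in sightings.items()
        if last - first >= min_age and now - last <= max_idle
    ]
    candidates.sort(key=lambda c: (-c[0], c[1]))  # longest-lived first, ties by IP
    return [ip for _, ip in candidates[:limit]]
```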

BTW, if we want to supplement by deploying separate directory server software, may I suggest IRC?  IRC is a good directory server (I've heard it has other uses too), and there are mature IRC server implementations available that anyone can run. :)  Bitcoin's IRC client implementation is already thoroughly tested.

Re: On IRC bootstrapping

We're all discussing bootstrapping systems; however, my idea might be a bit better.

When a user starts Bitcoin on a host for the first time, it initially downloads a list of nodes to connect to.
(Until, of course, we have enough static nodes that we can hardcode into Bitcoin...)
The client then tries to connect to the IPs on the list it downloaded or, if it already has a list saved from the last time Bitcoin ran, to those.
Once connected, the client asks every node for a list of the nodes it knows and updates its own node list.
Once a complete list is obtained, it is saved to the hard drive and a copy is kept in memory. (This is because we want to have a list of nodes without actually connecting to that indexing server.)
And finally, the node is completely connected to the network.
When a new node connects (when it receives a "new node packet"), the list is updated in memory and saved to the hard drive again.
To make updating the list with new nodes as bandwidth-friendly as possible, I suggest that every node "echoes" the IP of a new node connecting to the network to all the nodes it knows...
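The "echo" step is essentially flooding with duplicate suppression: a node passes the new address on only the first time it hears it, which is what stops the echo from circulating forever. A small simulation, assuming the network is given as an adjacency map; the `echo_new_node` name is hypothetical, and this is not how Bitcoin's own addr relay is implemented.

```python
from collections import deque


def echo_new_node(neighbors, origin):
    """Simulate one node echoing a newcomer's address to everyone it knows.

    neighbors: dict mapping each node to the nodes it is connected to.
    Each node relays the address to all its neighbors only the first time
    it hears it, so every node forwards it at most once and the echo ends.
    Returns (nodes that learned the address, number of messages sent).
    """
    heard = {origin}
    queue = deque([origin])
    messages = 0
    while queue:
        node = queue.popleft()
        for peer in neighbors[node]:
            messages += 1          # one relay message per link used
            if peer not in heard:  # duplicate copies are dropped, not re-relayed
                heard.add(peer)
                queue.append(peer)
    return heard, messages
```

For example, with links A-B, A-C, B-C and C-D, every node learns the address and 8 messages are sent: each of the 4 links carries it once in each direction. The cost of one echo therefore grows with the number of connections, not just the number of nodes.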

Pros:
* Has bootstrapping in mind.
* Is distributed for clients that have a node list

Cons:
* Every new client needs to connect to a server to get a new node list until we're done with bootstrapping.

This, in my eyes, seems like the best solution to our bootstrapping problem...
PS: If we implement this, we might want to check whether the "new node packet" we received contains bogus IPs, or IPs that resolve to .gov domains! :P
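Screening out obviously bogus IPs is straightforward with Python's standard `ipaddress` module. A rough filter, assuming peers arrive as plain address strings; `is_plausible_peer` is a hypothetical name, and the .gov check is left out.

```python
import ipaddress


def is_plausible_peer(addr):
    """Return False for strings that cannot be a public peer address.

    Malformed strings, private (RFC 1918), loopback, link-local, reserved,
    unspecified and multicast addresses are all rejected.
    """
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # not an IPv4/IPv6 address at all
    return ip.is_global and not ip.is_multicast
```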

Re: On IRC bootstrapping

Hi all.

This may be a stupid idea, and if it's not stupid, it may not be viable for the next few years, but I thought I'd toss it out there anyway:

What about Multicasting?

IPv6 is supposed to have better multicast support than IPv4, and if I understand the Bitcoin protocol correctly, most messages need to be broadcast to the entire network. In theory, a node could send such messages to a global multicast address, and everyone would receive them in a bandwidth-efficient way.

That would make bootstrapping in the traditional sense obsolete, since the client would just have to subscribe to the multicast channel. The remaining "Give me block xyz" requests could be handled by optionally including a field in messages to the multicast channel that basically says "My address is 2001:db8::42, and I'm willing to answer direct queries for specific blocks". After listening to the channel for a while, such a packet should come around, because, at the very least, new blocks will be announced there every so often.
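Subscribing to such a channel would look roughly like the sketch below: a UDP socket (multicast is datagram-only) joined to a group with `IPV6_JOIN_GROUP`. The group address `ff0e::b17c` (global scope) and the reuse of port 8333 are invented for illustration; nothing like this was ever assigned to Bitcoin. Multicast is generally not routed across the public Internet, so this would only work where the network supports it, and `IPV6_JOIN_GROUP` is available on Linux and most Unix-like systems.

```python
import socket
import struct

# Hypothetical global-scope (ff0e::/16) group and port, for illustration only.
GROUP = "ff0e::b17c"
PORT = 8333


def membership_request(group, ifindex=0):
    """Build the struct ipv6_mreq used by IPV6_JOIN_GROUP: the 16-byte group
    address followed by an unsigned int interface index (0 = kernel's choice)."""
    return socket.inet_pton(socket.AF_INET6, group) + struct.pack("@I", ifindex)


def open_listener(group=GROUP, port=PORT):
    """Open a UDP socket bound to `port` and subscribed to the multicast group."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP,
                    membership_request(group))
    return sock
```

A node would then loop on `sock.recvfrom(65535)` to read announcements, including any "I'm willing to answer direct queries" field.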

Re: On IRC bootstrapping

I don't know that much about IPv6, but it sounds good, if it's possible.
However, keep in mind that, as of now, 9/10ths of the world still uses IPv4!
So it is a great idea, and the developer(s) (how many does Bitcoin have, anyway?) should implement it in Bitcoin, in my humble opinion.
However, making IPv6 the default isn't a good idea.

It will be totally viable once everyone is using IPv6; it's just that everyone still uses IPv4!

(And I see now that in my previous post I basically explained what other people had already suggested... Oops!)

Re: On IRC bootstrapping

TCP doesn't work with multicasting. And I doubt it will ever be easy for a home user to join a multicast group.

Re: On IRC bootstrapping

If you guys are interested in IPv6, there are a lot of transition mechanisms available.  Windows comes with Teredo, which is kind of complicated for what it does.  I like 6to4: basically, every IPv4 node has an IPv6 prefix, 2002:<ipv4 addr>::/48.  It is very easy to set up 6to4 on a Linux box.  Here is my setup as an example:

Code:
echo 0 > /proc/sys/net/ipv6/conf/all/autoconf
echo 0 > /proc/sys/net/ipv6/conf/all/accept_ra
echo 0 > /proc/sys/net/ipv6/conf/all/accept_redirects
echo 0 > /proc/sys/net/ipv6/conf/all/router_solicitations
echo 1 > /proc/sys/net/ipv6/conf/all/forwarding
ip tunnel add 6to4tunnel mode sit ttl 200 remote any local 76.122.46.187
ip link set dev 6to4tunnel up
# listen for 6to4 traffic to me
ip -6 addr add 2002:4c7a:2ebb::1/16 dev 6to4tunnel
# route to non 6to4 ipv6 hosts
ip -6 route add 2000::/3 via ::192.88.99.1 dev 6to4tunnel metric 1
# add address to lan0
ip addr add 2002:4c7a:2ebb:0001::1/64 dev lan0
# add address to lan1
ip addr add 2002:4c7a:2ebb:0002::1/64 dev lan1
echo "Starting radvd..."
/opt/radvd/sbin/radvd


That is literally all it takes.  This is for my linux box which is acting as a router.  You just convert your ipv4 address to hex.  For instance mine, 76.122.46.187 == 4c 7a 2e bb
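The hex conversion can also be done with Python's standard `ipaddress` module, which knows the 6to4 layout (RFC 3056): `2002`, then the 32 IPv4 bits, giving a /48 from which each LAN gets a /64. A sketch; `sixtofour_prefix` and `lan_subnet` are hypothetical helper names.

```python
import ipaddress


def sixtofour_prefix(v4):
    """Return the 6to4 /48 for a public IPv4 address: 2002:WWXX:YYZZ::/48."""
    v4_bits = int(ipaddress.IPv4Address(v4))
    # 0x2002 fills the top 16 bits; the IPv4 address fills the next 32.
    return ipaddress.IPv6Network(((0x2002 << 112) | (v4_bits << 80), 48))


def lan_subnet(prefix, n):
    """Return the n-th /64 inside a /48 (n is the 16-bit subnet ID)."""
    if not 0 <= n < 1 << 16:
        raise ValueError("subnet ID must fit in 16 bits")
    return ipaddress.IPv6Network((int(prefix.network_address) + (n << 64), 64))


if __name__ == "__main__":
    p = sixtofour_prefix("76.122.46.187")
    print(p)                 # 2002:4c7a:2ebb::/48
    print(lan_subnet(p, 1))  # 2002:4c7a:2ebb:1::/64, the lan0 network in the setup above
```

The module also works in the other direction: `ipaddress.IPv6Address("2002:4c7a:2ebb::1").sixtofour` recovers the embedded IPv4 address.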

The v4 address 192.88.99.1 is an anycast address.  There are routers that pick up the 6to4 traffic and bridge it into native v6 land.  Because of how routing works, you will just reach the closest (network-wise) relay router when you transmit to v6 addresses.

Setting up radvd allows the client machines on your LANs (I have lan0 and lan1) to get an IPv6 address automatically.  With this setup you can hit ipv6.google.com and see the letters bouncing to know it works.

If you're planning to try this, google some info on 6to4 first so you understand how it works.  Also be aware that with this setup, your machines that are NAT'd on IPv4 are globally reachable over IPv6 and are not filtered unless you set that up separately (Windows Firewall, ip6tables, etc.)

Re: On IRC bootstrapping

I thoroughly agree with this. In the long run, IRC should be completely phased out and replaced with something like Gnutella's host caches or Tor's directory servers (as others have suggested).

At the very least, Bitcoin should disconnect from IRC as soon as it has a list of peers to connect to. It should also cache that list so it won't need to reconnect on the next startup.

Just my 0.02฿.

Re: On IRC bootstrapping

Is anyone interested in a set of DNS servers, owned by multiple entities, that cooperate in a fast-flux [1] network to ensure bootstrap availability?

[1] http://en.wikipedia.org/wiki/Fast_flux

Re: On IRC bootstrapping

What's wrong with IRC?  It's just another method used to exchange the peer list.  You can just prevent it from connecting and use -addnode=1.2.3.4 to connect to a known node for bootstrapping if you want.

If nodes disconnect from IRC after they get their list then it makes it less useful for bootstrapping since it will be empty except for the node that's trying to bootstrap at the time.

IRC has been around forever and it's well documented (and easy to understand for newcomers) - why create something more complicated?

Re: On IRC bootstrapping

I like the idea of distributed host caches like Gnutella uses. At the moment, for the majority of people, IRC is a single point of failure. Let's assume that for some reason our Freenode channel was gone. Maybe Freenode got fed up and shut us down. Maybe MenInBlack saw our system, laughed maniacally, and then pressured Freenode to shut us down.

When you start your client, it will do nothing. You could drop to a command line and type "-addnode" (or is it -peer? whatever) to connect to a known node, but at that point you'd somehow need to know a node. It probably wouldn't be that hard for one of us, but what about a new user? We could keep a list of peers on the website for them to use, but at that point, they've gone from "just double-click the shiny gold coin and get trading" to "check our website for updated peer lists, open a command prompt, navigate to the bitcoin directory, and type the proper peer....." And if MIB were after us, the website would probably be long gone.

Of course, we could implement addpeer in a more user-friendly manner. Perhaps a popup that says "I can't connect to the network. Enter a peer: " with instructions on some ways to find one, but at that point we're creating a social solution to a technical problem.

Also, if we get bigger we will need to move away from IRC anyway (as implied by the OP's conversation with a Freenode staffer). And what about Tor users? Why should people who want to use Tor to be anonymous have to manually add a peer?

Finally, anyone on Freenode can easily get a list of all running Bitcoin clients, when they came online, when they went offline, etc. That goes against the project's stated goal of anonymity. Of course, with a host cache system, anybody who connected to that cache could be logged by the server operator, but no one operator would have a full picture of the network.

I think the IRC solution is a wonderful beginning, and I applaud how stable it has proven to be. It was a great decision to get the network up easily and concentrate on the more interesting and important considerations in the program. I just think that Bitcoin will outgrow it someday, if it hasn't already.

Re: On IRC bootstrapping

Bitcoin has its own distributed address directory using the "addr" message.  It's about time we coded in a list of the current long running static nodes to seed from.  I can add code so new nodes do not preferentially stay connected to the seed nodes, just connect and get the list, so it won't be a burden on them.

What do you think, should I go ahead with adding the seeds?

It'll still try IRC first.  The IRC has the advantage that it lists nodes that are currently online, since they have to stay connected to stay on the list, but the disadvantage that it's a single point of failure.  The "addr" system has no single point of failure, but can only tell you what nodes have recently been seen, so it takes a little longer to get connected since some of the nodes you try have gone offline.  The combination of the two gets us the best of both worlds and more total robustness.

Is there anyone who wants to volunteer to run an IRC server in case freenode gets tired of us?

Re: On IRC bootstrapping

I run an IRC server you can use.  It's fairly stable, but it's not on redundant connections or anything.  It is only two servers right now, but we don't mess with it or anything; it just runs.

My box is a dedicated irc server:
 2:28PM  up 838 days, 20:54, 1 user, load averages: 0.06, 0.08, 0.08

You can use irc.lfnet.org to connect.

I hang out on #linuxos if anyone wants to drop in.

Re: On IRC bootstrapping

This is a common problem in P2P, known as Original Introduction, although bootstrapping is also a good word for it. The problem with bootstrapping is that you can't decentralize it. Whether it's IRC or HTTP or DNS, the client needs to be hardcoded with an address or list of addresses which is sufficiently fresh that at least one of the listed addresses is still active. After the first node is reached, you are no longer in Original Introduction mode and can use the full range of techniques for decentralization, such as gossip. Unless, of course, you get disconnected from the network and all of your known peers go away, in which case you're back to bootstrapping.
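The "sufficiently fresh" requirement can be made concrete. If each hardcoded address is still reachable with probability p and (a big assumption) addresses fail independently, a list of n addresses bootstraps successfully with probability 1 - (1 - p)^n. A sketch for sizing such a list; the function names are hypothetical.

```python
def bootstrap_success(p_alive, n):
    """Chance that at least one of n hardcoded addresses is reachable,
    assuming each is up independently with probability p_alive."""
    return 1 - (1 - p_alive) ** n


def addresses_needed(p_alive, target):
    """Smallest n with bootstrap_success(p_alive, n) >= target."""
    if not 0 < p_alive <= 1 or not 0 <= target < 1:
        raise ValueError("need 0 < p_alive <= 1 and 0 <= target < 1")
    n = 1
    while bootstrap_success(p_alive, n) < target:
        n += 1
    return n
```

For example, if only half of the listed nodes are expected to still be up by the time a release is in use, 7 addresses give a better than 99% chance (1 - 0.5^7 ≈ 0.992). Correlated failures, such as a shared hosting provider or a blocked port, make the real figure worse.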

There are two properties that are at odds when you choose a bootstrapping method: robustness (scalability/reliability) and freshness. Robustness is increased at the expense of freshness by caching on multiple servers, as is usually done with HTTP peer lists. Freshness is maximized (at least up to the TCP timeout) at the expense of robustness by having everyone connected, as with IRC. Of course, the key is finding the right mix of robustness and freshness because you need both for the bootstrap to be successful.

Here are some of my current favorite methods for bootstrapping:

Append list of fresh peers to executable or installer dynamically on download. People usually get the application from its official website, so the website is already a point of failure for new users. You're already hardcoding an address in the application, the address that the application will use to bootstrap. So instead just add fresh peers at the moment of download. You need some fancy code in the executable to read the list off the end, but I've implemented this in an NSIS installer and it's not that hard. Most software developers are upset by the idea of this method.
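A sketch of the append-on-download trick, assuming the download server can append bytes to the file it serves and that the executable format tolerates trailing data (the poster reports this works with an NSIS installer). The `PEERLIST` marker, the little-endian length field, and the JSON payload are all hypothetical choices.

```python
import json
import struct

MAGIC = b"PEERLIST"  # hypothetical end-of-file marker


def append_peers(binary, peers):
    """Server side: append a JSON peer list plus a fixed-size trailer
    (8-byte payload length, then the magic marker) to the served bytes."""
    payload = json.dumps(peers).encode()
    return binary + payload + struct.pack("<Q", len(payload)) + MAGIC


def read_peers(blob):
    """Client side: read the peer list back off the end, or [] if absent."""
    trailer = 8 + len(MAGIC)
    if len(blob) < trailer or not blob.endswith(MAGIC):
        return []
    (size,) = struct.unpack("<Q", blob[-trailer:-len(MAGIC)])
    start = len(blob) - trailer - size
    if start < 0:
        return []  # corrupt length field
    return json.loads(blob[start:len(blob) - trailer])
```

At runtime the client would read its own executable's bytes and call `read_peers` on them.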

Connect via XMPP to Google App Engine application. This gives the freshness of IRC, but with more robust scaling. App Engine is mostly for writing web apps, but it provides email and XMPP handling as well. It would be simple to write one application that could handle peer lists via either XMPP or HTTP with the same handler code. I'm currently using this in an application and it works well and is very reliable. I only wish there was a second App Engine to use as a fallback because it does have occasional downtime.

An alternative to requiring all nodes to include the complexity of a protocol like IRC or XMPP is to have a few special sentinel nodes which sit on the network and collect addresses of connected nodes via the usual decentralized methods available to an active node. These sentinel nodes periodically upload fresh addresses, say via HTTP POST, to a number of websites. A new node can then download a fresh address list from any of the websites which is currently functioning and reachable. If you have 5 sentinels each uploading every 5 minutes (staggered), then you'll have updates roughly once a minute. This is on par with IRC in terms of freshness and is as robust as you care to make it, by varying the number of HTTP mirrors and the number of sentinels.
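The freshness arithmetic is easy to check: n sentinels uploading every `period` seconds, with start times spread evenly, post an update every period/n seconds. A sketch, assuming perfectly regular uploads; the function names are hypothetical.

```python
def upload_times(n_sentinels, period, horizon):
    """Times (seconds) at which n sentinels, each uploading every `period`
    seconds with evenly staggered start times, post fresh address lists."""
    offset = period / n_sentinels
    times = []
    for i in range(n_sentinels):
        t = i * offset
        while t < horizon:
            times.append(t)
            t += period
    return sorted(times)


def max_staleness(times):
    """Largest gap between consecutive uploads, i.e. the worst-case list age."""
    return max(b - a for a, b in zip(times, times[1:]))
```

With 5 sentinels and a 5-minute period this gives an upload every 60 seconds, matching the "roughly once a minute" estimate; if one sentinel dies, the worst-case gap grows to 2 minutes.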

Re: On IRC bootstrapping

I think the way eMule handles bootstrapping for its KAD-network is pretty close to optimal:

The list of known peers is stored in a file (nodes.dat), and every client maintains a list of known nodes in that file (sorted by longest uptime, I think -- that's an intrinsic property of Kademlia, but still a good idea). The released client should be accompanied by such a file that contains the addresses of a few reliable peers on static IP addresses, from which a new client can then get more addresses to connect to (and hence store in its own file).

If the "seed list" gets out of date, or the server is shut down or something, you can just ask *anyone* in the network to publish his nodes-file (on rapidshare, say), and voila, you've got a fresh list of IPs you can connect to.
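A nodes.dat-style cache, sketched with JSON instead of eMule's actual binary format: a map from address to observed uptime, merged with lists from elsewhere (including a file someone else published), trimmed to the longest-running nodes, and written atomically. Sorting by uptime follows the poster's recollection; the file layout, the function names and the 200-node limit are hypothetical.

```python
import json
import os


def merge_nodes(known, incoming, limit=200):
    """Merge {addr: uptime_seconds} maps, keeping the larger uptime per
    address and only the `limit` longest-running nodes, longest first."""
    merged = dict(known)
    for addr, uptime in incoming.items():
        merged[addr] = max(uptime, merged.get(addr, 0))
    best = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
    return dict(best)


def load_nodes(path):
    """Read a saved node list, or return an empty one if none exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_nodes(path, nodes):
    """Write via a temporary file and rename, so a crash never leaves a
    half-written node list behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(nodes, f)
    os.replace(tmp, path)
```

Importing a published list is then `merge_nodes(load_nodes(mine), load_nodes(theirs))`.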

Re: On IRC bootstrapping

The SVN version now uses IRC first and if that fails it falls back to a hardcoded list of seed nodes.  There are enough seed nodes now that many of them should still be up by the time of the next release.  It only briefly connects to a seed node to get the address list and then disconnects, so your connections drop back to zero for a while.  At that point, be patient.  It's only slow to get connected the first time.

This means TOR users won't need to -addnode anymore, it'll get connected automatically.  
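The order described here (IRC first, then a brief one-shot connection to hardcoded seeds) can be sketched as below. The callables and the `min_peers` threshold are hypothetical stand-ins so the logic can be shown without networking; this is not the client's actual code.

```python
def bootstrap(irc_lookup, seed_nodes, fetch_addrs, min_peers=8):
    """Collect candidate peers: IRC first, hardcoded seeds as the fallback.

    irc_lookup():      addresses currently listed in the IRC channel;
                       raises OSError if IRC is unreachable
    seed_nodes:        hardcoded seed list shipped with the release
    fetch_addrs(addr): briefly connects to addr, returns its address list,
                       then disconnects; raises OSError if the node is down
    """
    try:
        peers = list(irc_lookup())
    except OSError:
        peers = []  # IRC blocked or down: fall through to the seeds
    if len(peers) >= min_peers:
        return peers
    for seed in seed_nodes:
        try:
            learned = fetch_addrs(seed)  # one-shot: don't stay connected to seeds
        except OSError:
            continue  # this seed has gone offline; try the next one
        for addr in learned:
            if addr not in peers:
                peers.append(addr)
        if len(peers) >= min_peers:
            break
    return peers
```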

Re: On IRC bootstrapping

Quote:
You can use irc.lfnet.org to connect.

This seems like a good idea.

What does everyone think, should we make the switch for 0.3?

Re: On IRC bootstrapping

You may want to leave Freenode in as a fallback server -- if his server doesn't work, use Freenode's.

Re: On IRC bootstrapping

Maybe we should have an option dialog that allows you to choose the IRC server and channel you connect to?

Re: On IRC bootstrapping

Everybody needs to connect to the same IRC server and channel so they can find each other.

Quote:
You may want to leave Freenode in as a fallback server -- if his server doesn't work, use Freenode's.

It might not be good if we suddenly rushed Freenode with a ton of users all at once.

The fallback is our own seed system.

irc.lfnet.org is pretty old and has impressive uptime.  I think it's going to be fine.

We could take IRC out at some point if we want, but I'd rather ease into it and just test our own seed system as a backup for now, and I really like the complementary redundant attributes of the two different systems.