The service provider has connections with many nodes. When it gets a transaction, it blasts it out, and at the same time monitors the network for double-spends. If it receives a double-spend on any of its many listening nodes, then it alerts that the transaction is bad. A double-spent transaction wouldn't get very far without one of the listeners hearing it. The double-spender would have to wait until the listening phase is over, but by then, the service provider's broadcast has reached most nodes, or is so far ahead in propagating that the double-spender has no hope of grabbing a significant percentage of the remaining nodes.
This is a good start, but still not impermeable. This kind of security relies on the fact that the majority of nodes (which really just boils down to the majority of IP's) are honest. However, if the attacker could amass many IP addresses, then he could control the propagation (for example, by refusing to propogate word of the transaction at the vending machine), and still lead a successful double-spending attack. Which sort of brings us back to the main novel idea of bitcoin: truth is voted on by computing power, not IP.
For just a beverage, it may seem like this attack is uneconomical. But imagine if instead the machine dispensed small-denomination giftcards, and the attack was performed thousands of times (maybe even at the same time in a concerted effort). Suddenly, the cost of that IP block might be a bargain for the attacker...