Monday, October 2, 2006

SIP and Skype, P2P and Supernodes - what a melee





Gaaah. I happened to bump into Slashdot today and read this:
“…is that when you install the Skype client, it will drain system resources by running as a supernode from time to time”
with the poster finally concluding that they would more likely use SIP than Skype.

The implication being that with SIP, you are free from such issues!

Let’s get the facts straight:

  • P2P is an architecture, SIP is a protocol. Skype is a product, and Skype uses its own proprietary protocol (you can call it ‘Skype Protocol’ if you want).
  • Supernodes are a fundamental design choice of many existing P2P networks, including Skype, Kazaa, Grokster and several other massively scaled networks.
  • Today, most SIP deployments use a centralized architecture. In other words, all your SIP phones register with a central server and a central proxy, and your calls are routed through them. If they fail, you cannot reach other users, or you have to attempt to call them directly. That is not as simple as it sounds, because the person who sits in your buddy list as sippal@myisp.com may actually be user457@001dxp.bbcppcspool.myisp.com, and this complex ID is mapped to its simpler one by the proxy/location server that just went down.
  • There is frantic work currently going on in the p2psip mailing list, which is attempting to answer the following questions:
      • How does one map a SIP flow over a P2P network?
      • Does it make sense to deploy a SIP overlay over a P2P network built on the common architectural principles of existing P2P networks (DHTs, for instance)? Or,
      • Does it make sense to deploy a P2P network over the SIP protocol?
In other words, if you haven’t figured it out yet, supernodes have nothing to do with SIP. If you still haven't got it: a P2P SIP network could, and most likely would, also use supernodes. One good way to avoid using supernodes is, um, say, uh, to use what we know as a centralized network.

Some Basics
Supernodes are often described as a necessary evil for a large-scale P2P network. Let us first spend a bit of time understanding how P2P networks differ from centralized networks.

The biggest difference is that in a pure P2P network there is no well-known, centralized node that is almost always available. P2P networks are plagued by problems of churn (a node may be in the network at one moment and disappear the next because the user logs out) and of location and routing (how do you locate a user, Joe, if you only know his name but not how to get to him?).

To address such elementary issues, which do not exist in centralized networks, several implementations use very effective algorithms such as DHTs (Distributed Hash Tables). A DHT establishes a mapping between a uniquely encoded key and the content that needs to be retrieved, in such a way that the key can be used as the primary identifier to locate the data. A client trying to locate that data generates the key, and the lookup for that key traverses the P2P network, using a defined protocol, until the node(s) storing the data for that key respond. Of course, this is an oversimplification. There have been several improvements that optimize DHT routing, as well as alternative architectural proposals for routing and location in P2P networks.
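
To make the key-to-node idea a little more concrete, here is a minimal sketch in Python. It is a toy, not how Skype or any particular network does it: it assumes every node can see the whole ring (a real DHT only knows a few neighbours and forwards the lookup hop by hop), it ignores churn and replication, and the node names, the user "joe" and the contact address are all made up for illustration.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # deliberately tiny ID space so the numbers stay readable


def ring_id(name: str) -> int:
    """Hash any name (a node or a lookup key) to a position on the ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class ToyDHT:
    """Toy DHT: each key lives on the first node at or after its ring position."""

    def __init__(self, node_names):
        # Sorted (ring position, node name) pairs form the ring.
        self.ring = sorted((ring_id(n), n) for n in node_names)
        # Each node keeps its own local key/value store.
        self.store = {n: {} for n in node_names}

    def responsible_node(self, key: str) -> str:
        ids = [node_id for node_id, _ in self.ring]
        # First node whose ID is >= the key's ID; wrap around past the end.
        idx = bisect_left(ids, ring_id(key)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key: str, value: str) -> None:
        self.store[self.responsible_node(key)][key] = value

    def get(self, key: str):
        return self.store[self.responsible_node(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])  # hypothetical node names
dht.put("joe", "192.0.2.7:5060")              # hypothetical contact address
print(dht.responsible_node("joe"), dht.get("joe"))
```

The point to notice is that anyone who knows the name "joe" can compute the same key and therefore arrive at the same node, without a central directory telling them where to look.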

Now enter supernodes. Why do we need them?
Well, let’s put it this way. Networks are not made the same around the world. At any point in time, there will be users on high-speed cable, medium-speed DSL, or low-speed dial-up. And they all want to communicate, and locate each other, effectively. Supernodes are intermediate nodes, between the source and the destination, which can provide additional services to other clients. Here are some things a supernode could do:
  • If User A wants to reach User B, and a supernode (SN) knows a shorter way to reach B, it may act as a router and route User A directly to B, so that the message avoids hopping across multiple networks.
  • If User A is behind a firewall and wants to talk to User B, but needs a media relay server outside its firewall to route media through, the SN, if it has enough spare CPU cycles, may agree to serve as A’s external relay server for the session. During the session, if the SN's CPU gets busy (say the owner of the SN decides to make a call), it can drop the relay role and A will look for another relay.
  • If User A tries to reach User B and finds B offline, the SN may agree to act as a ‘voice mail’ service for A: it receives the voice mail, sends it off to a central voice mail server and deletes its own copy. Of course, the voice mail is typically encrypted with a key that the SN does not know, so it is, for the most part, storing a bunch of encrypted bits for A.
The catch is that the “SN” could be you. It could be any client in the P2P network that has spare cycles to take on other work, unrelated to you, that makes the network more efficient.
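
The relay case above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Supernode` class, the `cpu_load` figure and the 0.7 cut-off are all hypothetical, and a real network would discover candidates through its own protocol rather than from a list handed to it.

```python
from dataclasses import dataclass


@dataclass
class Supernode:
    name: str
    cpu_load: float        # hypothetical load measure, 0.0 (idle) to 1.0 (saturated)
    max_load: float = 0.7  # hypothetical cut-off above which the node refuses to relay

    def can_relay(self) -> bool:
        return self.cpu_load < self.max_load


def pick_relay(candidates):
    """Return the least-loaded supernode willing to relay, or None if none is."""
    willing = [sn for sn in candidates if sn.can_relay()]
    return min(willing, key=lambda sn: sn.cpu_load, default=None)


sns = [Supernode("sn1", 0.2), Supernode("sn2", 0.5), Supernode("sn3", 0.9)]
relay = pick_relay(sns)    # User A starts relaying media through this node

sns[0].cpu_load = 0.95     # the owner of sn1 starts a call of their own
if not relay.can_relay():
    relay = pick_relay(sns)  # sn1 drops the role, so A looks for another relay
```

If every candidate is busy, `pick_relay` returns `None`, which is exactly the situation where a network without enough volunteers starts to feel slow or unreachable.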

But that’s the challenge of a P2P architecture. People choose a P2P network because it is scalable and fault tolerant. In principle, there is no centralization: each node can take over any function required to keep the network running, and there is a discovery protocol defined to find such ad-hoc nodes. However, in gaining distribution and fault tolerance, P2P networks need to deal with efficiency (how fast is the network?) and with the effects of churn (if there is no accountability for nodes, and you cannot guarantee their availability in the network, how do you provide services like, say, voice mail? Who accepts the voicemail?)

If we don’t have supernodes, the network will need to re-discover itself each time, and will not be as efficient as users want it to be. If we don’t have supernodes, the difficulty of navigating networks and solving their challenges increases (firewalls were one example above). If we don’t have supernodes, that is, if a client refuses to behave as anything more than a client, how do we provide services which, for example, need to kick in when some client is not online?

The problem with supernodes is not in the architecture, it’s in….
…implementation! The problem with supernodes is that some networks do not let you specify when you are willing to be a supernode. And not without reason: if every participant decides to switch off supernode functionality, network performance degrades substantially. On the other hand, if the network decides for you what the supernode rate threshold is, then you have no control over your computer’s resources. You need to trust the network.

An implementation can take this to extremes: you may find your CPU choked at times, which is typically the result of a badly implemented threshold.
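
One middle ground, sketched below in Python purely as an illustration, is to let the user set the ceiling while the network still gets to ask. Everything here is an assumption made for the example: the `SupernodePolicy` fields, their default values and the `accept_request` helper are hypothetical and do not describe any real client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupernodePolicy:
    enabled: bool = True   # the user may opt out of supernode duty entirely
    max_cpu: float = 0.5   # refuse new work once CPU use reaches this share
    max_sessions: int = 4  # cap on concurrent sessions served for other users


def accept_request(policy: SupernodePolicy, cpu_now: float, sessions_now: int) -> bool:
    """Decide whether this client takes on one more supernode task."""
    if not policy.enabled:
        return False
    return cpu_now < policy.max_cpu and sessions_now < policy.max_sessions


policy = SupernodePolicy(max_cpu=0.4, max_sessions=2)
print(accept_request(policy, cpu_now=0.1, sessions_now=0))  # idle machine
print(accept_request(policy, cpu_now=0.6, sessions_now=0))  # owner is busy
```

The trade-off from the paragraph above does not go away: the more users tighten these limits (or flip `enabled` off), the fewer supernodes the network has to work with.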

Unfortunately, the industry is now calling supernodes malware! I’ve seen lawsuit applications, and I’ve seen marketing statements from new VoIP companies that say ‘We don’t do Supernodes’ as if they were evil.

You can’t have the cake
and eat it too. Supernodes are an important piece of the puzzle of well-performing P2P networks. If you take them away, you may as well go back to the centralized model of operation, and keep “P2P” in your marketing collateral as a name for what happens when your clients call each other’s IP addresses directly in a peer-to-peer fashion. Hooray!

5 comments:

  1. Great post. There's a surprising amount of FUD around Skype and Supernodes that needs dispelling. Not least is that a PC behind a firewall or NAT *WILL NEVER* be a supernode which gives the lie to "Skype will use your bandwidth"

    There's a bit more analysis to do here about approaches to NAT/firewall busting in P2P networks and VoIP. It's not clear to me that media needs to be relayed, although an intermediary supernode is required during connection setup between two devices behind NAT.

    There's a simple solution to voicemail and texts for nodes that are offline: store the message locally and deliver it when the other party comes online. You don't need central storage, you just need delayed delivery.

  2. Julian, thanks for the commentary. I am guessing that your comment about a node behind a NAT never being an SN is specific to Skype? That is interesting. Could you confirm this?

    Finally, unfortunately, the problem of voicemail may not be as simple as storing it locally. Consider the case where A calls B and B is offline. Now A goes on vacation, and B comes online the next day. In this case, B will have to wait for the voicemail until A gets back online.

  3. Quote from "Skype Guide for Network Administrators - edition 1.0.1"
    http://www.skype.com/security/guide-for-network-admins.pdf#search=%22supernode%20voip%22
    "When a Skype client becomes a supernode, it accepts network connections from a
    small number of other Skype users for the purpose of maintaining the accuracy of
    the Global Index. Although the supernode activity is entirely transparent to the user,
    a Skype client that is unable to receive inbound network connections (such as a user
    behind a NAT or firewall) will never become eligible to become a supernode nor will it
    ever be asked to relay a third party’s traffic."

  4. Thanks for the confirmation, Buda. The text does not really say that no node behind a NAT can be a supernode. It says that no node which is unable to receive inbound connections (with NAT as one possible example) will be an SN. This is what I thought, because being behind a NAT does not necessarily mean a node cannot accept inbound connections. It depends on the type of NAT device and how it is configured.

  5. You got that right, the keywords are "unable to receive inbound network connections". Since there are lots of NAT traversal techniques and lots of NAT-enabled network devices, I think we can presume that NAT won't be a problem now or in the near future. I also find the use of supernodes for VoIP a feasible idea, because CPU and hardware power keeps growing while the requirements of VoIP are already set and are not likely to grow much.
