Home Arjun Roychowdhury Facetime on Iphone 4: Vanilla unencrypted STUN and SIP

Facetime on Iphone 4: Vanilla unencrypted STUN and SIP

(July 13: sorry for the downtime, looks like my bandwidth limits were exceeded. Upgraded my hosting package – fixed)

(note: Only the call part is Vanilla SIP. The procedure for registering a Facetime user into their servers etc. is all non-SIP, encrypted/ciphered.)

(for my user review of the iphone4 and bumper read here)

Well heck, good job Apple! I just tested FaceTime and did a quick check on its protocol. No hacking needed, just an on-the-wire black-box inspection: it’s just plain SIP, with STUN for firewall discovery. Apple plans to make this protocol public, and they seem to have done an excellent job. And thanks for showing the world that you don’t need complicated encryption and proprietary tunneling tricks for an excellent experience. For the most part, you need a good codec set, a good media stack that can adaptively switch codecs and manage buffers, and a good ‘point-of-presence’ network.

I am just going to restrict this post to an overview of the flow.

Enjoy:


This is a FaceTime call flow: good, plain SIP (they use MESSAGE for some proprietary data exchange during the call).

The rest is perfect SIP.

The protocols in use (besides SIP) are here to see:

Ah, here is their 200 OK for the INVITE:

A quick look at their RTP stream:
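For readers without the capture image, here is a minimal sketch of how the fixed RTP header (the first 12 bytes, per RFC 3550) can be decoded by hand. The sample packet is hypothetical and is not taken from the FaceTime capture; the payload type, sequence number, timestamp and SSRC values are made up for illustration.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header defined in RFC 3550."""
    if len(packet) < 12:
        raise ValueError("an RTP packet carries at least a 12-byte header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,             # top two bits, 2 for current RTP
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,     # maps to a codec via the SDP
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: version 2, payload type 96, sequence 1,
# timestamp 1000, SSRC 0x1234 (none of these come from the capture).
sample = struct.pack("!BBHII", 0x80, 96, 1, 1000, 0x1234)
header = parse_rtp_header(sample)
```

Because the stream is plain RTP rather than SRTP, these header fields are readable by anyone on the path.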

Good job, Apple. Thanks for embedding an excellent-quality, standards-compliant SIP client into your dialer experience.

 

43 Responses

  1. donnib

    Hi,
    Does this mean that we can call an iPhone 4 user from a regular SIP client? Must every iPhone 4 user have their own SIP address that we can contact?

    donnib

    • Arjun

      @donnib – that would really depend on how Apple will authenticate/admit users. As I mentioned, while most of this is vanilla SIP, there is proprietary stuff going on, along with new headers in the message exchange (primarily MESSAGE). Anyhow, I’d prefer for Apple to first publish their protocol formally before I post a blog on these details.

  2. jason

    Good point. What’s the complete URI? I see it’s blocked out in your packet captures. I also see support for iLBC, which is good; it could work with Skype.

  3. jason

    Also, I want to point out that this really isn’t all that surprising. 3GPP is all about SIP; the proprietary data schema is in the IMS AS & HSS and the SCIM function of the CSCF.

  4. Arjun

    @jason: The complete req-URI in the INVITE is user@myip:port. Basically, FaceTime, in this version, is sending it to the IP address + port of my iPhone. On SIP/IMS: well, this is really plain SIP. It does not use any of the mandatory IMS extensions like PANI or others.

  5. David

    It seems to me that the most impressive step occurs right before the SIP INVITE.

    They are doing a smooth transition from a 3G call into the VoIP session. Somehow, they are mapping a phone number to a visible IP address. Impressive enough in the simple cases, but downright amazing in the face of multiple operators, hidden caller-id, etc.

    How are they working this magic?

  6. Ray Bellis

    Did you happen to spot how it maps the called party’s telephone number to their current IP address?

  7. Arjun

    We…ll, only the call part is SIP. There are a lot of cipher/TLS/SSL exchanges going on to authenticate a FaceTime user, so don’t expect to make a call to a FaceTime SIP client using X-Lite anytime soon ;-)

  8. [...] video conferencing? The question remains open… but the subject clearly arouses a certain interest. And indeed, both Apple (indirectly) and the examination of the frames exchanged [...]

  9. Tim Wu

    Thanks! I’ve been looking for this for a while.

  10. Arjun

    @David, well, it seems pretty straightforward. Remember, FaceTime does not do both hand-out (from CS to WiFi) and hand-in (from WiFi to CS). It only does hand-out. Once in WiFi, you can’t get back to CS; that call is dropped. Doing a handover from CS to WiFi is pretty straightforward. Basically, a WiFi call can be set up in the background while the CS call is active. If the WiFi call fails for any reason, the CS call continues. I can’t speak for the iPhone, but in many other phones (like Android), CS call media is handled at the baseband level, while for a VoIP call it would be at the media framework level. Trying to establish a VoIP call does not interfere with the CS call at all. In fact, the part FaceTime does *not* do is more complicated: handover back from WiFi to CS. That’s the more challenging part with respect to smooth media transition.

    It’s not hard for Apple at all to map the PSTN # to a VoIP #. My strong guess is that Apple has already authenticated the binding with their FaceTime servers via their TLS/SSL exchanges (try it out: disable/enable FaceTime in Settings, and each time you do it you will see these new security associations being set up).

    With the identity authenticated, all Apple really needs to do is send you an INVITE to the IP:port that is discovered by STUN (maybe they use other ICE procedures if STUN fails). As for the # that is displayed on your screen during FaceTime, that is just the From header text in the SIP INVITE (which is fine, because Apple has already authenticated the identity outside of SIP). Similarly, Apple can now use the same PSTN # (which is unique to every phone) to differentiate VoIP users too. This is typical VoIP stuff; see the Contact header, for example, in the INVITE that is received.

  11. Christopher Herot

    Has anyone tried sending an INVITE to the phone to see if it answers?

  12. [...] its promise to publish the FaceTime video calling protocol, some details are starting to emerge. Arjun Roychowdhury did a little packet sniffing and reports that the calls seem to go over vanilla SIP and STUN. The [...]

  13. Marcus

    I don’t think David’s question is answered yet: how does Apple get the two phones’ IP addresses from the phone call, unless every iPhone 4 pings Apple every time it makes a call to report its current IP?

  14. Arjun

    @Marcus, well, if you look at it, there are many ways Apple may know your phone’s IP. The entire framework of push notifications on the iPhone is based on the Apple push servers maintaining a persistent TCP connection with your phone as much as possible. There is also HTTP traffic that flows between your iPhone and Apple; when connected through WiFi, that would come from your WiFi IP address. I don’t know which channel FaceTime uses to get your IP, but my point is that there are several channels, as described above, for getting the initial INVITE to your phone (Apple uses a different port for SIP). Then STUN comes in before the media starts flowing. (All of this is a guess, but I think it is reasonable.)

  15. Justin

    The SIP session is pretty standard, and then SDP will be sent in the INVITE to negotiate media endpoints and codecs to use to set up the call. This is where STUN comes in, as it allows media traversal through NAT.

    The FaceTime servers must have a media relay capability as well, because there will be many situations where two iPhones can’t connect directly to one another and must use something in the cloud to pass the media between the two.
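To make the STUN step concrete, here is a minimal sketch of a bare STUN Binding Request as laid out in RFC 5389: a 20-byte header with no attributes. This is only an illustration of the standard message format. The capture does not tell us which STUN revision or attributes FaceTime actually sends, and the function name is made up for this example.

```python
import os
import struct

MAGIC_COOKIE = 0x2112A442   # fixed value from RFC 5389
BINDING_REQUEST = 0x0001    # message type: Binding, class: request


def build_binding_request(transaction_id=None) -> bytes:
    """Build a 20-byte STUN Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)  # 96-bit random transaction ID
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Header: type (16 bits), attribute length (16 bits), magic cookie (32 bits)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id


request = build_binding_request(b"\x00" * 12)
```

Sending such a packet out through a NAT is what creates the port binding that the peer's media can later flow back through.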

  16. Matthias

    Thanks for the interesting information you have provided. I have some remarks:

    The initial INVITE is sent when you have answered the call at the called side.
    Have a look at the time stamps of the SIP messages, especially 180 Ringing and 200 OK! The SIP message flow does not correspond with the real call states.

    There is only one port (16402) for SIP signalling, RTP streams and RTCP!

    Apple does not use a SIP registrar / proxy. The session is established directly between the user agents.

    STUN does not address a STUN server in the cloud, but is end-to-end too. It seems that it is used only to create the bindings in the NAT tables of the routers, simultaneously from both sides.

    Arjun, do you find any phone number that is involved in the call in the From, To, Via or Contact header?
    Or are only IP addresses used?
    Is the FQDN of the XMPP server part of the SIP URIs?
    Is it possible to post the XMPP server name / IP address?
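If SIP, STUN, RTP and RTCP really do share one UDP port, the receiver has to tell them apart from the payload itself. Here is a rough sketch of one way that can be done, using first-byte heuristics in the spirit of RFC 5761 and RFC 7983. This is an assumption about how such demultiplexing could work in general, not a description of what FaceTime's stack does, and `classify_payload` is a made-up name.

```python
STUN_MAGIC_COOKIE = b"\x21\x12\xa4\x42"


def classify_payload(payload: bytes) -> str:
    """Guess which protocol a UDP payload on a shared port belongs to."""
    if not payload:
        return "empty"
    first = payload[0]
    # STUN: top bits are zero and the RFC 5389 magic cookie sits at bytes 4-7.
    if first < 4 and len(payload) >= 20 and payload[4:8] == STUN_MAGIC_COOKIE:
        return "stun"
    # RTP/RTCP version 2: the first byte falls in 128..191.
    if 128 <= first <= 191 and len(payload) >= 2:
        # RTCP packet types occupy 192..223 in the second byte.
        return "rtcp" if 192 <= payload[1] <= 223 else "rtp"
    # SIP is text: responses start with "SIP/2.0", requests end their
    # first line with it.
    first_line = payload.split(b"\r\n", 1)[0]
    if first_line.startswith(b"SIP/2.0") or first_line.endswith(b"SIP/2.0"):
        return "sip"
    return "unknown"


# Hypothetical payloads using a documentation IP address.
sip_kind = classify_payload(b"INVITE sip:user@192.0.2.1:16402 SIP/2.0\r\n")
rtp_kind = classify_payload(bytes([0x80, 96]) + b"\x00" * 10)
```

The heuristic works because the three families never overlap in their first byte: SIP starts with an ASCII letter, STUN with a near-zero byte, and RTP/RTCP with the version bits set.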

  17. Arjun

    @Matthias:
    1) Well, this session was when I received a FaceTime call (the INVITE came to my iPhone 4). The INVITE is the invitation I got to answer the caller’s FaceTime call. I looked at the call flow again and it’s absolutely in line with a standard SIP call: first I got an INVITE, then I sent 100, then I sent 180, then I sent 200, then I received ACK.

    2) Yes, the call is P2P as far as SIP goes, with no proxy cuteness as far as I could see (looking at Via). I don’t remember this fully and I’ll check again, but I think that’s correct. (As far as I could tell, Apple is using encrypted HTTP and potentially SMS to assert the identity and routing path to the user.)

    3) I’ll take a look at the RTP packets again tomorrow, but no, I don’t believe I saw SIP and RTP/RTCP on the same ports.

    4) Yes, I found phone numbers, in From, To and Contact.

    5) No, I am not comfortable posting the server IPs. I really don’t want to give out Apple server IPs at this stage. (I fully understand anyone with FaceTime can easily see a Wireshark dump for themselves, just that I don’t think it is kosher for me to post it.)
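The callee-side sequence described in point 1 (INVITE in, 100/180/200 out, ACK in) can be written down as a small check. This is just a sketch of the basic RFC 3261 call setup seen from the answering side; the event tuples and the function name are made up for illustration, and it ignores retransmissions, MESSAGE and everything after the ACK.

```python
def is_basic_callee_flow(events) -> bool:
    """Check INVITE -> zero or more 1xx -> 200 -> ACK from the callee's side.

    Each event is a (direction, message) tuple, e.g. ("recv", "INVITE").
    """
    events = list(events)
    if len(events) < 3:
        return False
    if events[0] != ("recv", "INVITE") or events[-1] != ("recv", "ACK"):
        return False
    *provisional, final = events[1:-1]
    if final != ("send", "200"):
        return False
    # Everything between the INVITE and the 200 must be a sent 1xx response.
    return all(
        direction == "send" and len(code) == 3 and code.startswith("1")
        for direction, code in provisional
    )


observed = [
    ("recv", "INVITE"),
    ("send", "100"),
    ("send", "180"),
    ("send", "200"),
    ("recv", "ACK"),
]
flow_ok = is_basic_callee_flow(observed)
```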

  18. Henry

    Where are the ICE attributes in the SDP answer in the 200 OK message? Did Apple skip some steps in ICE to optimize NAT discovery?

  19. Matthias

    @Arjun:

    1) The time between the initial INVITE and the ACK is 146 msec, and between 180 and 200 OK only 19 msec. It is not possible to answer the call so quickly. That’s because the real call establishment happens in XMPP (which is encrypted, and you have no chance to decode it). The SIP session is used only to establish the media streams.

    2) Sure, P2P is a very basic SIP scenario, but the ‘normal’ way is to use Registrar and Proxy.

    3) You can also look at the port information in the From header and the m-lines for audio and video in SDP of the initial INVITE, that you published.
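Checking the m-lines is easy to do by hand, but here is a small sketch of how the media ports could be pulled out of an SDP body. The SDP below is hypothetical: only the port number 16402 comes from this thread, while the addresses (documentation range) and payload type numbers are invented for the example.

```python
def media_ports(sdp: str) -> dict:
    """Return {media type: port} for each m= line in an SDP body."""
    ports = {}
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port>[/<count>] <proto> <fmt> ...
            media, port, *_ = line[2:].split()
            ports[media] = int(port.split("/")[0])
    return ports


# Hypothetical SDP offer, not copied from the capture.
SAMPLE_SDP = (
    "v=0\r\n"
    "o=- 0 0 IN IP4 192.0.2.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 16402 RTP/AVP 96\r\n"
    "m=video 16402 RTP/AVP 97\r\n"
)
ports = media_ports(SAMPLE_SDP)
```

If the audio and video m-lines show the same port as the SIP signalling, that supports the single-port observation.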

  20. HHF

    Let's note that the IP addresses of both parties as well as the phone numbers are transmitted in clear text in the SIP packets.

    Also, the conversation is over RTP, not SRTP, which means it is not encrypted.

  21. [...] session management) and STUN (to get through proxies or firewalls) that are used in the implementation of FaceTime. The RTP protocol is used for transporting the packets and the video encoding is [...]

  22. Arjun

    @Matthias:

    1) Yes, you are correct, I missed that. I can confirm FaceTime is using the same port for SIP and RTP (UDP 16402 for this session).

    2) On the timing part, the only way I can validate this is by running it again and testing, but basically, according to the call flow above, I answered the call in a little less than 0.1 seconds (10.622 – 10.577) – which is possible if I had my finger on the button, which I think I did, because I was ‘ready to test’. You should not be looking at 180 as the time. Many clients I’ve tested with start the local ringtone the moment you get INVITE, not after you send 180 (it mostly happens in parallel, but sending 180 involves a message encoding and sending out, so it may be sent a few milliseconds after local ringtone starts)

    3) I don’t disagree with your ‘normal is proxy’ comment. That’s a network architecture, not a mandated protocol semantic.

    4) Here is some more info:
    4.1) Just before the INVITE comes in, there are TCP exchanges with an Apple push server (nslookup says this).
    4.2) What follows is a whole bunch of UDP exchanges with an IP that does not resolve. This part looks like it is trying to open a pinhole in the firewall, because just after the UDP messages to that Apple server, it starts communicating with my router’s WAN IP, so I guess that seems correct. The final message there is from source port 16402 on my iPhone, which is the port that RTP finally flows on. I guess all of this is firewall port punching. (If it helps, each UDP payload in this exchange is exactly 16 bytes long.)
    4.3) Then my client does some DNS queries to find the IP address of a special Apple server (by the name, it seems like a server that acts as some form of “registrar” [not SIP] of FaceTime clients).
    4.4) An HTTPS transaction occurs with that server.
    4.5) Some HTTP transactions follow with another server. This seems to be an HTTP XML exchange compliant with Apple’s PropertyList DTD schema.
    4.6) Then what follows is a certificate setup via TLS with one of their servers.

    Once all of this is done…..

    4.7) Then STUN starts
    4.8) Then INVITE arrives from the remote party…

    Now, what I don’t know for sure is the correlation of the INVITE to the “Facetime invite” showing up on my screen and the correlation of the 200 OK to my pressing “Accept” – I can check it later.
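As a quick check of the arithmetic in point 2, here is the timestamp difference worked out with `Decimal` to avoid floating-point noise. The two timestamps are the ones quoted above; which SIP message each one belongs to is not stated in this thread, so the variable name only reflects how the comment describes the gap.

```python
from decimal import Decimal


def elapsed_ms(start: str, end: str) -> Decimal:
    """Difference between two capture timestamps (in seconds), in milliseconds."""
    return (Decimal(end) - Decimal(start)) * 1000


# Timestamps quoted in point 2 above.
answer_delay_ms = elapsed_ms("10.577", "10.622")
```

That works out to 45 ms, comfortably inside the "little less than 0.1 seconds" estimate, and well under the 146 ms INVITE-to-ACK figure Matthias quoted.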

  23. encryption

    No SRTP? :(

  24. HHF

    To answer the question of how Apple maps IP to phone number: when the phone is set up, there is communication with registration.ess.apple.com, and afterwards, in all calls, only with invitation.ess.apple.com. Further, the phone sends an SMS to Apple (in Europe via a UK number; you can check your bill) to link the phone to the number. If you change the SIM card, this happens again. Then, before call setup, the calling phone asks Apple for the IP of the target phone.

  25. [...] More on the topic at iConverged. [...]

  26. [...] it should be possible to connect via FaceTime using other methods. iConverged discovered the video connection is done through SIP and STUN and routed through the IP address and [...]

  27. [...] FaceTime: SIP/STUN, + encrypted contact with Apple servers [...]

  28. [...] bit technical but very interesting you may read to understand how FaceTime work. Really work ;-) iConverged Facetime on Iphone 4: Vanilla unencrypted STUN and SIP Leaked: Apple Stealing All FaceTime Information, AT&T Locks Users via OTA Updates Regards, [...]

  29. [...] Facetime on Iphone 4: Vanilla unencrypted STUN and SIP – roychowdhury.org No hacking needed – just an on the wire black box inspection – its just plain SIP and STUN for firewall discovery. [...]

  30. [...] read this post about a guy who has sniffed the iPhone4’s FaceTime [...]

  31. tw

    How is the handover done exactly? Are CS and WiFi operating at the same time for a few seconds, i.e. CS audio + WiFi audio on mute, with WiFi video running?

  32. Land

    3GPP has defined the Voice Call Continuity feature to allow voice calls to switch between WiFi and CS cellular, and it requires the handset to support 3GPP IMS SIP. I guess FaceTime cannot support it now.

  33. [...] Here are some quick pictures showing the process from behind the scenes (picture credits): [...]

  34. [...] after FaceTime for iOS was released a few network dumps of FT calls were published online. What is clear from those dumps is that Apple built its own, proprietary peer [...]

  35. Felipe Desiyatnikov

    Hello, would you mind letting me know which webhost you’re using? I’ve loaded your blog in 3 completely different internet browsers and I must say this blog loads a lot quicker than most. Can you suggest a good hosting provider at a fair price? Thanks a lot, I appreciate it!

  36. spencer

    After such a long time, has anyone been able to grab the video stream from the packets and verify that it is a valid H.264 stream? It seems to me the video has been encrypted or manipulated, so it does not look like a valid video stream.

  37. Nige

    OK, but apart from all the tech details, if I make a Facetime call to my wife in the UK, while I am away in the USA, it won't use mobile roaming data services, but will be billed as a normal phone call?
