Sunday, July 29, 2012

Holding open TCP sockets on cellular

A reader asked me to offer suggestions for keeping sockets open long-term - for example, you might open a connection to a status/event ASCII printer port and then collect the events on the host.

First, let me say up front that it will be impossible to ever see such a cellular-based TCP socket remain open 'forever', which in this context is say a year or more.  Cellular just does not work that way.

Although this is not a scientific response, after my 6+ years of work with cellular, I will guesstimate that the longest connection I have ever seen lasted about 20 days, and that the most common pattern is to stay connected for a few days (2 to 5), followed by a day with several disconnects and reconnects.  If your application cannot function in this scenario, then do NOT use cellular.

Things which hinder long-term socket opens:
  • Environmental dynamics: for example a rain storm will interfere and drop signal quality.
  • Load Sharing: the bottom line for cellular carriers is always "Make the most users as happy as possible", with a bias towards humans with smart-phones.  This causes:
    • All users share bandwidth, so the more users are active, the less throughput and 'service' any of them receive.
    • Since most human interaction comes in fast, one-shot bursts, carriers give priority to new sockets.  The longer your 'session' exists (aka: your TCP socket is open), the more likely it is to be dropped during high tower loads.
  • Time-Of-Day issues: many cell towers literally shift (or point) in different directions during the day.  For example, when highway traffic is light they may point out over a residential or industrial area, but during heavy traffic they shift towards the roads.  Seems weird, but the goal is to maximize the bandwidth usage, so the tower always tries to point to the most active users.
  • The 5M-KOD (5-Minute Kiss-Of-Death):  Assume that any TCP socket will vanish after being idle for 5 minutes.  You should always use TCP keepalive set to about 4 minutes, 50 seconds.
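
As a minimal sketch of the keepalive advice above, the following Python turns on TCP keepalive with an idle time of 4 minutes, 50 seconds.  The function name enable_keepalive and the probe interval/count values are my own assumptions for illustration, and the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are Linux names that some platforms do not offer, so the code only sets them where they exist.

```python
import socket

# 4 minutes 50 seconds: just under the 5-minute idle limit described above
KEEPALIVE_IDLE_SECONDS = 4 * 60 + 50


def enable_keepalive(sock, idle=KEEPALIVE_IDLE_SECONDS, interval=30, count=3):
    """Turn on TCP keepalive and tune its timing where the platform allows.

    interval and count are illustrative assumptions, not values from the text.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idle time before the first keepalive probe (Linux name)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the stack declares the connection dead
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock
```

Call it on the socket before or right after connecting, for example enable_keepalive(socket.create_connection((host, port))), where host and port are whatever your application uses.  On platforms without these options the socket falls back to the system-wide keepalive timing, which is often far longer than 5 minutes.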
Cost considerations:
  • Most carriers now round up per socket, per hour.  Why do they do this?  Because they can!  And because they bill in something called 'units', which are 1K blocks.  So if you look at your bill you'll see that a certain session started at, for example, 8:12AM, ended at 1PM and includes 47 units of data.
  • This means if you hold 1 socket open for 1 hour, you will suffer at most a 1K data round-up per hour (or 24K per day).
  • This means if you instead open a new socket every 5 minutes, then you open 12 'sessions' per hour and you will suffer at most a 12K data round-up per hour (or 288K per day).
  • In both cases above I use the 'at most' phrase and do not attempt to average anything.  If your data is random, then fine - the typical case is likely 1/2 of what I state, but unfortunately most industrial systems move a predictable amount of data each time.  So if you just happen to move 1 byte 'too little' each hour, then you'd get a wonderful 1 byte round-up - but if your standard size makes you 1 byte too large, then your round-up is always 999 bytes per hour.
  • Opening/Closing a socket includes from 500 to 1000 bytes of overhead.
  • A TCP keepalive includes about 100 bytes of overhead; however, MOST TCP stacks only send the keepalive if the line is idle.
  • Never poll at a once-per-5 minute rate!  If you really want once per 5 minutes, you really need to poll at, for example, once per 4 minutes and 45 seconds.  You must stay away from the 5M-KOD like the plague, or you risk all kinds of weird cell network issues.  Again, I will be unscientific, but state that since many vendors are involved in your link, you have a huge race condition at 5 minutes where part of the link is already down and part is still up for the moment.  And since the cellular system is NOT IP based, this increases your probability that the cellular link appears up to your cell module, yet is really broken & unusable.
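
The round-up arithmetic above can be checked with a few lines of Python.  This is only a sketch: it assumes a billing 'unit' is 1000 bytes (which is what the 999-byte figure implies; your carrier may use 1024), and the function names are my own.

```python
UNIT_BYTES = 1000  # assumption: one billing 'unit' (1K) is 1000 bytes


def billed_bytes(session_bytes, unit=UNIT_BYTES):
    """Round one session's traffic up to a whole number of billing units."""
    units = -(-session_bytes // unit)  # ceiling division on integers
    return units * unit


def roundup_waste(session_bytes, unit=UNIT_BYTES):
    """Bytes you pay for but never sent in a single session."""
    return billed_bytes(session_bytes, unit) - session_bytes


def worst_case_daily_roundup(sessions_per_hour, unit=UNIT_BYTES):
    """Upper bound on round-up per day: one full unit per billed session."""
    return sessions_per_hour * 24 * unit


print(worst_case_daily_roundup(1))   # one socket held open: 24000
print(worst_case_daily_roundup(12))  # new socket every 5 minutes: 288000
print(roundup_waste(999))            # 1 byte 'too little': 1
print(roundup_waste(1001))           # 1 byte too large: 999
```

The last two lines show why predictable industrial data is the problem: the waste depends entirely on where your standard message size lands relative to the unit boundary.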
Suggestions to create a better TCP socket experience:
  • Use UDP/IP whenever you can!  In my own tests it has cut data costs by up to 85-90%!
  • Accept that you cannot keep the TCP socket open forever.  If you wish to (for example) use a Dynamic IP and manually program the host with an IP address which changes each time the cell link drops - this is just not feasible!
  • Ask, do you really need a socket open all the time?  My own tests have shown that if you expect data less than once per 30 minutes you are probably better off opening/closing the socket between uses.  The trade-off is the data cost to close/reopen the socket (plus the 'session round-up') versus the TCP keepalive required faster than once per 5 minutes to keep the socket healthy.
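
To weigh that trade-off for your own traffic, a rough calculator such as the sketch below may help.  It uses the estimates quoted earlier (about 100 bytes per keepalive, 500 to 1000 bytes to open/close a socket) and a keepalive period of 4 minutes, 50 seconds.  The function names are my own, and the session round-up is deliberately left out, so treat the results as a starting point for your own measurements rather than a verdict.

```python
KEEPALIVE_BYTES = 100      # rough per-keepalive overhead quoted above
KEEPALIVE_PERIOD_S = 290   # 4 minutes 50 seconds


def keepalive_overhead_per_hour(data_interval_s):
    """Keepalive bytes per hour if the socket is held open between sends.

    Keepalives are only sent while the line is idle, so count how many
    fit into each gap between data transfers.
    """
    per_gap = int(data_interval_s // KEEPALIVE_PERIOD_S)
    gaps_per_hour = 3600 / data_interval_s
    return gaps_per_hour * per_gap * KEEPALIVE_BYTES


def reopen_overhead_per_hour(data_interval_s, reopen_bytes):
    """Open/close bytes per hour if a new socket is used for each send."""
    return (3600 / data_interval_s) * reopen_bytes


interval = 30 * 60  # data once per 30 minutes
print(keepalive_overhead_per_hour(interval))    # 1200.0
print(reopen_overhead_per_hour(interval, 500))  # 1000.0
print(reopen_overhead_per_hour(interval, 1000)) # 2000.0
```

With these estimates the two approaches are close at 30 minutes, and which one wins depends on your real open/close overhead and on how the per-session round-up falls for your message sizes.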