Thursday, December 28, 2006

Cellular IP-Friendly Apps - Never Block on TCP Sockets

Traditionally programmers have assumed that a TCP packet will either make it to the remote peer error-free or the TCP socket will be detected as failed. However, this has proven a disastrous assumption in the world of cellular networks.

Cellular networks seem to suffer a kind of burst-error mode where whole groups of TCP packets get lost or delayed, while another group makes it through. This seems to confuse the TCP state machines within the OS, which are optimized for the rarer single-packet loss of Ethernet. We have Ethereal traces where one can see the application send a TCP packet, the OS retry once, a collection of old, stale TCP acknowledgements return from the remote - then nothing. Eleven hours later there have been no more TCP retries, no TCP keepalive, no response from the remote, and no TCP stack error from the OS to abort the application block. The host application is still hung, blocking while it waits for either a response or a socket failure which never comes.

So is this a bug in the OS? Does it matter? It is your application and "our" customer who pays the price. For example, Digi had to go through our RealPort driver and literally add an OS timer to abort every TCP socket call if it did not return in 60 seconds. Yes, this sounds like a royal pain but it was the only way to avoid this failure every few weeks when running across cellular IP.

Recommendation: applications must NEVER block on a socket waiting for a response or socket failure. Applications must always use an OS or external timer to abort socket functions that take longer than 1 minute. Sadly, running in non-blocking mode is NOT enough since at times it will be the API call which fails to return regardless of the block/non-block setting. So even using API calls with explicit timeouts is not safe.
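As a rough illustration of the "external timer" idea, here is a minimal Python sketch. The helper names (call_with_deadline, guarded_recv) are made up for this example and this is not how the RealPort driver is implemented; the only value taken from the text above is the 60-second limit. The socket call runs in a daemon worker thread and the caller waits on it with a hard deadline, so a call that never returns cannot hang the application.

```python
import queue
import socket
import threading


def call_with_deadline(func, *args, deadline_s=60.0):
    """Run func(*args) in a worker thread; give up after deadline_s seconds.

    Hypothetical helper: the caller is released even if func never returns.
    The abandoned worker is a daemon thread, so it cannot keep the process alive.
    """
    result = queue.Queue(maxsize=1)

    def worker():
        try:
            result.put(("ok", func(*args)))
        except Exception as exc:  # hand the failure back to the caller
            result.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    try:
        status, value = result.get(timeout=deadline_s)
    except queue.Empty:
        raise TimeoutError(f"socket call exceeded {deadline_s} s") from None
    if status == "error":
        raise value
    return value


def guarded_recv(sock, nbytes, deadline_s=60.0):
    """Receive with a watchdog; close the socket if the deadline passes."""
    try:
        return call_with_deadline(sock.recv, nbytes, deadline_s=deadline_s)
    except TimeoutError:
        sock.close()  # treat the connection as dead so the caller can reconnect
        raise
```

The same wrapper can guard connect, send, or any other socket call. The trade-off is one short-lived thread per guarded call, which is usually acceptable at polling rates of once per minute or slower.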

Wednesday, December 20, 2006

IP-Encapsulation of Modbus/RTU

Summary: Modbus/RTU can easily be encapsulated within TCP/IP ... as long as there exists some mechanism to keep full Modbus/RTU messages packed within a single TCP segment or UDP packet.

Most industrial users have learned to be wary of expecting Modbus/RTU to work over error-correcting modems (especially radio) unless you use special modems which are Modbus/RTU aware. So it is wise to be wary of moving Modbus/RTU over IP without some special settings or features in the IP devices involved. Fortunately all Digi devices (and most competitors' devices) have such features or settings.

Bullet-Proof Solution: Modbus Bridges
The safest and most flexible way to move Modbus over IP is to use devices which fully understand the Modbus protocol and dialects of Modbus/TCP, Modbus/RTU, and Modbus/ASCII. This allows multiple Masters to share the slave(s), plus Modbus/TCP masters can query Modbus/RTU slaves and the bridge handles the protocol conversions.
More detailed information on this topic is in this application note:
Setting up Digi One IAP for Modbus Bridging. The basic information in this application note applies to the following Digi products with Modbus Bridge ability:

Effective Solution: TCP (or UDP) Sockets Profile
If you don't really require the multi-master or protocol bridging features, then any Digi device server can be used. By default the Digi serial "TCP Socket Profile" will break all messages into TCP segments of 4 to 64 bytes - not what you want for Modbus/RTU. This default behavior creates the lowest latency for normal data that does not share the timing fussiness of Modbus/RTU. However, all you need to do is enable the option checkbox feature "Send data only under any of the following conditions" and then the sub-option "Send after the following number of idle milliseconds". Entering a time such as 25 milliseconds causes the Digi device server to continue collecting data and delay creation of the TCP segment until no more serial data is seen for 25 milliseconds. This is a very nice fit to the Modbus/RTU "3.5 character idle time" end-of-message condition. Why not use 5 msec? Well, experience has shown me that 25 (or even 100 msec for cellular) is a more robust value.
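To put the idle time in perspective, the nominal 3.5 character gap can be worked out from the baud rate. This is a small Python sketch that assumes 11 bits per character on the wire (start bit, 8 data bits, parity, stop bit); the function name is made up for illustration.

```python
def rtu_idle_gap_ms(baud, bits_per_char=11, chars=3.5):
    """Milliseconds of silence equal to `chars` character times at `baud`."""
    return chars * bits_per_char * 1000.0 / baud


for baud in (1200, 9600, 19200):
    print(f"{baud:>6} baud: {rtu_idle_gap_ms(baud):6.2f} ms")
```

At 9600 baud the nominal gap is only about 4 ms, so a 25 msec setting leaves several times that as margin for jitter in the serial device and the device server.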

So an example solution using TCP Socket Profile would be to use an OPC server such as Kepware which can put most of its serial protocols into a TCP/IP socket. Such a server should naturally put a full Modbus/RTU request into a single TCP segment - the host application is "defective" if it causes more than one TCP segment to be used; it means the host application vendor doesn't know what they are doing. Since the Digi device server receives the entire request as a single TCP segment, the full Modbus/RTU request will move out of the serial port as a single continuous stream of bytes. With the correct settings, when the Modbus/RTU response returns the Digi device server

Friday, December 15, 2006

Mixing Modbus and Rockwell on Ethernet

Both Modbus and PCCC-based protocols like DF1 or CSPv4 (AB/Ethernet) have been around for years. Yet if one looks at the similarities between the two, one quickly sees that the act of reading 10 words from an N7 data file is exactly the same as reading 10 words from Modbus 4x00001. The Digi One IAP leveraged this to become the world's first off-the-shelf transparent protocol bridge. It freely accepts Modbus or Rockwell requests and bridges them to the appropriate form for the slave to understand.

Here is an example system:
[Figure: Example of AB and Modbus talking on Ethernet]

  • The ControlLogix can poll the Modbus/TCP and DF1 PLC
  • The Modbus/TCP PLC can poll the ControlLogix and DF1 PLC
  • The DF1 PLC can poll the ControlLogix and Modbus/TCP PLC.

So how does this work? Take a look at the messages to read the first 48 bits of bit memory:
  • Modbus/TCP is 001E00000006010100000030
  • Modbus/RTU is 0101000000303C1E
  • Modbus/ASCII is :010100000030CE(CR)(NL)
  • DF1 Full-Duplex is 100201000F000019A206038500001003DE06
  • CSPv4 is 0107000E00 … 010500000F000019A20603850000
  • PCCC-Ethernet/IP is 6F002800 … 0000010000000F000019A20603850000
Notice the bold, underlined text patterns (010100000030 in the three Modbus forms, 0F000019A20603850000 in the three PCCC forms)? This is the heart of how a normal Modbus Bridge or 1761NetENI functions. Modbus/TCP, Modbus/RTU, and Modbus/ASCII may have different bytes, but they all move the exact same Modbus command; a Modbus bridge doesn't need to understand the Modbus command, just be able to unpack and repack each form. Similarly DF1, CSPv4, and PCCC-in-Ethernet/IP have different bytes, but they all move the same PCCC command; a PCCC bridge doesn't need to understand the PCCC command, just be able to unpack and repack each form.
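The unpack-and-repack step for the three Modbus forms can be sketched in a few lines of Python. The function names are illustrative only; the shared message 010100000030 and the transaction number 001E come from the examples above, and the sketch treats that message as opaque bytes, just as a bridge would.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16 used by Modbus/RTU (initial value 0xFFFF, reflected poly 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def to_rtu(message: bytes) -> bytes:
    # Modbus/RTU appends the CRC, low byte first
    return message + struct.pack("<H", crc16_modbus(message))


def to_ascii(message: bytes) -> bytes:
    # Modbus/ASCII appends an LRC (two's complement of the byte sum), then hex-encodes
    lrc = (-sum(message)) & 0xFF
    return b":" + (message + bytes([lrc])).hex().upper().encode("ascii") + b"\r\n"


def to_tcp(message: bytes, transaction_id: int) -> bytes:
    # MBAP header: transaction id, protocol id (0), count of bytes that follow
    return struct.pack(">HHH", transaction_id, 0, len(message)) + message


def from_tcp(frame: bytes) -> bytes:
    # Drop the 6 header bytes; what remains is the shared Modbus message
    return frame[6:]


core = bytes.fromhex("010100000030")
print(to_tcp(core, 0x001E).hex().upper())
print(to_rtu(core).hex().upper())
print(to_ascii(core))
```

None of these functions looks inside the command itself, which is the point: only the envelope changes between the three forms.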

The Digi One IAP takes this one step further - since each of these bold, underlined commands is accomplishing the same thing - namely reading the first 48 bits of bit memory - the Digi One IAP can take either command and mechanically create the other. So given the Modbus command 010100000030, it can create the PCCC command 0F000019A20603850000. Given the core PCCC command 0F000019A20603850000 it can create the Modbus command 010100000030. So this is how a Modbus/TCP master can query a ControlLogix with PCCC enabled. The Modbus/TCP master thinks it is polling another Modbus device. The ControlLogix thinks it is being polled by another Ethernet/IP device.
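To see why the two commands are mechanically equivalent, it helps to split each into its fields. The Python sketch below is my reading of the two byte strings, not a description of the Digi One IAP's internal mapping rules: the PCCC field names assume the command is a "protected typed logical read with three address fields" (CMD 0F, FNC A2), and the class and function names are made up for illustration.

```python
import struct
from typing import NamedTuple


class ModbusRead(NamedTuple):
    unit: int
    function: int
    start: int
    count: int


class PcccTypedRead(NamedTuple):
    cmd: int
    sts: int
    tns: bytes        # transaction number, kept as raw bytes
    fnc: int
    byte_size: int
    file_number: int
    file_type: int    # 0x85 is assumed to mean a bit (B) file
    element: int
    sub_element: int


def parse_modbus_read(raw: bytes) -> ModbusRead:
    # unit id, function code, start address, item count (big-endian)
    return ModbusRead(*struct.unpack(">BBHH", raw))


def parse_pccc_typed_read(raw: bytes) -> PcccTypedRead:
    if len(raw) != 10:
        raise ValueError("expected a 10-byte PCCC read command")
    return PcccTypedRead(raw[0], raw[1], raw[2:4], raw[4], raw[5],
                         raw[6], raw[7], raw[8], raw[9])


modbus = parse_modbus_read(bytes.fromhex("010100000030"))
pccc = parse_pccc_typed_read(bytes.fromhex("0F000019A20603850000"))
print(modbus)
print(pccc)
print("Modbus bits:", modbus.count, "PCCC bits:", pccc.byte_size * 8)
```

Both commands start at offset zero and ask for the same amount of data: 0x30 bits on the Modbus side and 6 bytes on the PCCC side.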

Here are links to other related information:

Digi One IAP product page
Application Note for Modbus master polling Rockwell devices.
Excel spreadsheet for Modbus master polling Rockwell devices.
Application Note for Rockwell master polling Modbus devices.
Excel spreadsheet for Rockwell master polling Modbus devices.
PDF presentation of various ways to mix Modbus and Rockwell devices

Rockwell AB PLC via Cellular

So far we have succeeded in getting several Rockwell/Allen-Bradley PLCs up on cellular with the Digi Connect WAN, which is a cellular router for GSM or CDMA with a local Ethernet and serial port.

In Summary:
  • Serial DF1: You can access a serial MicroLogix PLC such as the MicroLogix 1200 on the remote Digi Connect WAN's serial port. You either need to have an OPC server which can directly encapsulate DF1 protocols into TCP/IP or to use Digi RealPort to create redirected COM ports for RSLinx. Ideally, using the newer DF1 Radio Modem protocol can cut your data costs in half, but DF1 Full-Duplex or Half-Duplex can also be used. DH485 won't work via cellular due to the high latency. You must slow the PLC (ACK) timeout setting down to 30 seconds, so you cannot use a MicroLogix 1000 since it doesn't allow this parameter to be adjusted. DF1 Radio Modem has no DF1 (ACK) or (NAK), which is why it costs less to use.
  • CSPv4 or AB/Ethernet: You can access legacy PLC such as SLC5/05 and PLC5E by enabling TCP port forwarding of port 2222 on the Digi Connect WAN. Under RSLinx you enter the IP or DNS name for your Digi Connect WAN in the "Ethernet Driver", then right click the driver to slow down the timeouts from default of 3 seconds to a cellular-friendly 30-seconds. For a bit of fun, open this link in your browser and you will access the web pages of my SLC5/05 through Cingular/GSM cellular - http://digiwan.gotdns.org:8080/. But please don't leave this page open since you'll impact other people trying to look at my cellular PLC.
  • Ethernet/IP: You can access ControlLogix and other newer PLC supporting Ethernet/IP by enabling TCP port forwarding of port 44818 on the Digi Connect WAN. Under RSLinx you enter the IP or DNS name for your Digi Connect WAN in the "Remote Devices via Linx Gateway" Driver, then right click the driver to slow down the timeouts from default of 3 seconds to a cellular-friendly 30-seconds. You cannot use the RSLinx Ethernet/IP driver since it relies on UDP broadcast which cannot move across wide-area-networks.


If you want more detailed instructions, I have an application note online here:

90000772_A_Cell_AB.pdf

Monday, December 11, 2006

Simulating Multi-drop Across routed IP

Summary: the UDP Sockets profile in Digi device servers can be used to simulate multi-drop behavior in routed IP or wide-area networks. An application note is linked to this entry.

Ever wished your Ethernet could mimic an RS-485 network? Or are you trying to replace an old, expensive multi-point analog modem system with newer IP-based technologies such as cellular, satellite, or aDSL links?

On a local Ethernet subnet a UDP broadcast can be used to simulate multi-drop ... however IT departments and anyone thinking of the future know IP broadcast is something not to be used lightly. IP broadcast loads every device on the network, and examples of high broadcast load killing or crippling important embedded devices are common.

The preferred method on a local Ethernet is the use of Class D IP multicast (aka IP addresses in the 224.x.x.x to 239.x.x.x range). However, details of IP assignment, IP collision, and the risk of turning switches into hubs make this a risky and confusing technology. Most heavy users of ODVA Ethernet/IP can cite a few cases where enabling high multicast traffic killed other third-party products (notably security or video devices) which had treated all multicast traffic as broadcast to be examined by software.

Plus we are talking about wide-area-networks and use of cellular or satellite technology. Routed IP networks won't move broadcast or multicast traffic unless active proxies exist at each end to encapsulate the broadcast/multicast traffic into TCP/IP.

Fortunately, the Digi One IAP (and most Digi device servers) include the ability to use a form of repeated UDP/IP unicast to simulate multicast to up to 64 remote peers. I have customers using this to move Modbus/RTU and AB DF1 Half-Duplex through routed private wide-area networks. Here is an application note which explains how to set this up.
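The idea of repeated unicast is simple enough to sketch in Python. This only illustrates the technique, it is not the Digi firmware's implementation; the function name is made up, and the 64-peer limit is reused from the figure quoted above.

```python
import socket

MAX_PEERS = 64  # limit quoted above for the Digi feature, reused here for illustration


def send_to_all(sock, payload, peers):
    """Send the same UDP datagram to every (host, port) pair in peers.

    Each peer gets its own unicast copy, so the traffic can cross routers
    that would drop broadcast or multicast packets.
    """
    peers = list(peers)
    if len(peers) > MAX_PEERS:
        raise ValueError(f"at most {MAX_PEERS} peers supported, got {len(peers)}")
    for peer in peers:
        sock.sendto(payload, peer)
    return len(peers)
```

Every remote sees the master's request, just as every slave on an RS-485 wire would, and only the addressed slave answers. Note the cost on a metered link: the request is paid for once per peer.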

(For now it is a Word 2003 document - but it can be opened by Open Office Writer v2.0 if you don't have Word. I'll shortly turn it into an Acrobat PDF)

http://iatips.com/blogimage/90000xxx_A_UDP_Multidrop.doc

Thursday, December 07, 2006

Rockwell Protocol Documents

A friend just pointed out this public web page to me: How to Communicate with Rockwell Automation Products. While I have seen many of these documents before, a few of them were new to me. It includes information on:
  • How to talk to ControlLogix tag data via Ethernet/IP
  • How to understand ControlLogix data structure packing when read raw
  • The DF1 serial protocol specification
  • How to encapsulate CIP over DF1 (ie: talk to serial port of Compact/ControlLogix)
  • How to use Ethernet/IP explicit messaging to ControlLogix
  • How to use Ethernet/IP I/O messaging with ControlLogix

In addition, I see www.ab.com has added a new DF1 supplement to its Knowledge Base. Since you have to log in, giving you a direct link is pointless, but it is called "DF1 Protocol supplement 17706516". It compares PLC5 and SLC5 communications, covers some useful commands such as 0xAB "Protected Typed Logical Write with Mask" to write individual bits, and describes new data file types not covered in the latest 1996 version of the DF1 specification.

While we are discussing new Rockwell protocol information, you should also review and be aware of the new "DF1 Radio Modem" protocol. I don't think there is a formal specification, but you can find a file in the http://www.ab.com/ Knowledge Base that describes the simple differences between it and DF1 Full-Duplex. In summary, DF1 Radio Modem *IS* DF1 Full-Duplex without the protocol ACK/NAK. It is designed for use in radio systems where powering up slave modems just to ACK something they will respond to anyway only slows down overall polling. I'm also finding it ideal for cellular IP networks since it literally cuts your data usage by 50 to 60% to NOT be moving 2-byte DF1 ACKs within TCP/IP, which also includes its own TCP acknowledgements. Since DF1 includes a TNS or transaction number, there is no problem with mishandling lost or delayed messages.

The main catch today is that I think neither RSLinx nor ControlLogix support DF1 Radio Modem - it is mainly a MicroLogix and SLC5 family resource. However the next release of the Digi One IAP will include the ability to bridge to DF1 Radio Modem from Ethernet/IP, CSPv4 (AB/Ethernet) and DF1 Full-Duplex.

Wednesday, December 06, 2006

Cellular IP-Friendly Apps - It Costs to Talk

Summary: All communication must be "under control". All data sent into the cellular system costs money; even if the remote cellular device is powered off, the customer still pays for data sent to it.

As a follow-on to the discussion of Retrying TCP Socket Opens, applications must allow the user to both understand and limit all aspects of protocol usage and retry. Users must be allowed to limit and predict a reasonable worst-case traffic cost. For example, some protocols include large blocks of initial connection negotiation, which means talking once per minute over a continuously open socket could result in much less cost than talking once per 10 minutes over a socket opened just for one transaction. I have seen applications that allow users to set a maximum desired retry setting - then not always follow that setting and do retries anyway in certain fault conditions.

Recommendation: application-writers must step back and examine every place within the application they create traffic and confirm users have the ability to limit the traffic created.

Example and Numbers: now most of you will be saying "Yah, dahh - so obvious why is this even mentioned?". Well, I'll give you an all too typical example of how this affects real customers. A customer (call him Joe) running a pilot on cellular data access calls to complain his costs are higher than expected. He says he's just polling 3 Modbus registers every 5 minutes. Being no dummy, Joe has already calculated that each request should be 12 bytes of data (one Modbus/TCP function 3 read) and each response should be 17 bytes of data (one Modbus/TCP response with 4 registers since he is reading 4x00003, 4x00004 and 4x00006, so one assumes 4x00005 comes along for the ride). One poll each 5 minutes works out to be 8640 polls per 30 days, so he had hoped to see only about a quarter-megabyte of traffic a month. Yet Joe was seeing data bills for 6 to 10 MB of traffic a month. This means his $20 per month 5MB plan was costing him closer to $60 per month with data overages.

First, Joe overlooked the fact that he has to pay for not only his Modbus data, but also the TCP and IP overhead used to move it. Standard Windows-generated TCP headers are 20 bytes and so are the IP headers. Linux tends to default to using TCP time-stamps and thus creates 28-byte TCP headers. So each request is NOT 12 bytes, but 52-60 bytes ... plus the TCP Acknowledge frame will add an additional 40-48 bytes. Yes, YOU pay for the TCP Acknowledgements as well! With headers and TCP Acknowledgement, his responses will be 97-113 bytes, not 17. So right off the bat, I can see that he has been under-estimating his monthly traffic. Since he is using Windows, he should be seeing at least 1.6MB of traffic and never 0.25MB.
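The arithmetic behind those two figures is easy to reproduce. The Python sketch below assumes the best case: exactly one bare TCP acknowledgement per data segment, no retries, and no socket open or close traffic. The function and constant names are made up for illustration.

```python
POLLS_PER_MONTH = 12 * 24 * 30  # one poll every 5 minutes for 30 days = 8640


def monthly_bytes(request, response, header=0, ack=0, polls=POLLS_PER_MONTH):
    """Bytes per month for one repeating poll.

    Each direction carries its payload plus `header` bytes of TCP/IP
    overhead, and each data segment is answered by one acknowledgement
    frame of `ack` bytes.
    """
    per_poll = (request + header + ack) + (response + header + ack)
    return per_poll * polls


raw = monthly_bytes(12, 17)                         # Modbus/TCP bytes only
windows = monthly_bytes(12, 17, header=40, ack=40)  # 20-byte TCP + 20-byte IP headers

print(f"Joe's estimate: {raw / 1e6:.2f} MB per month")
print(f"With TCP/IP overhead: {windows / 1e6:.2f} MB per month")
```

With Windows-sized headers each poll costs 189 bytes instead of 29, which is where the jump from about 0.25MB to about 1.6MB comes from.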

So I visit Joe and do a network trace of his OPC server traffic. We see that OPC is issuing 3 Modbus polls every 5 minutes - not 1. Hmmm, of course Joe's first reaction is "Heck no - I'm not polling 3 - just 1" but the proof is there as colored pixels on my notebook display. We decode the polls and see the OPC server is polling 3 blocks of 32 registers each. After decoding the Modbus/TCP bytes we learn the exact registers being polled and Joe eventually discovers why these are being polled:

  • One block of 32 registers is fetching his 3 desired values of 4x00003, 4x00004 and 4x00006. Reading the fine print in the OPC manual we see that the OPC server decided this was a "scattered poll" of 2 separate memory areas so it bumped the size up to 32 registers. So just for this one poll, his monthly budget is up to 2.8MB instead of 0.25MB.
  • A second block was caused by Joe programming an HMI display to pop up if a certain alarm condition were true in the field. This was a demo he'd done to impress a customer, but Joe hadn't thought to disable it nor had he realized the exact "cost" of such a feature. So the OPC server needs a single register off somewhere else in the PLC memory to satisfy the HMI's alarm/event function. We don't know why this is polled as 32 registers instead of 1 - it is not a "scattered poll" as defined by the OPC vendor's documentation. Perhaps his HMI or OPC server software has a bug in it. Since this is Modbus/TCP (not serial) it is unlikely anyone else has noticed or cared that the application is moving 62 bytes of extra data in every poll. After all, Ethernet is fast and costs nothing to use. It is possible the programmers at the OPC vendor just decided there was no reason to ever poll less than 32 registers when using fast, free "Ethernet".
  • The final block was being caused by Joe's boss leaving open an HMI display in another room that wasn't supposed to be left open - human error (or is it?). Joe learned instantly how important it was for him to properly configure the HMI display settings which time out displays - either closing the window or just stopping the supporting data polls. He had done that for the normal "user display", but had been lazy and not put such settings into the various diagnostic displays users weren't expected to use!
So now his 6 to 10MB of traffic a month begins to make sense. Each distinct poll is creating nearly 3MB of traffic per month, and his traffic is influenced by which HMI displays users open. Multiply 3MB by 3 polls and you get roughly 9MB per month.

What has been learned here?

  • With the overhead of TCP/IP, Joe learned that he had to pay for over 4 times more traffic than his raw Modbus byte calculations had led him to believe.
  • Joe learned that he should be looking at using UDP/IP instead of TCP/IP for his Modbus/TCP since this would cut 40-60% off his bill instantly. Modbus doesn't really require the TCP Acknowledgement and my own tests of UDP/IP over cellular show it to be about 99.99% reliable - or put another way, I only see about 1 packet lost per 10,000 sent.
  • Joe learned how to review his OPC server's data statistics page. His OPC server had been (indirectly) giving him the answer as to why his data usage was so high. While his OPC server never totaled up the data bytes to include TCP/IP overhead, it was able to show him the 36 polls per hour he was moving instead of his expected 12 (one per 5 minutes).
  • Joe learned that perhaps he needs to look for a new OPC supplier, since his present vendor just doesn't seem to see the big picture of IP-enabled protocols; that Ethernet is not the only media using TCP/IP. Increasingly people expect TCP/IP to move through diverse media which is not always "fast and free" like Ethernet. Joe's present OPC supplier didn't give him the ability to reduce the poll block size below 32 registers when the OPC system thought "Ethernet" was being used.
  • Joe learned he had to be more aggressive in his HMI display design. He couldn't assume users would only look at certain displays and not leave open displays unexpectedly. Joe needed to actively set every possible display to automatically close or stop generating new polls. In fact, after review he discovered that most of his displays had no need for "real-time" update and he could just set them to display the data once as read without any refresh. Users always had the option to manually redisplay the page.
  • Joe learned that maybe just reading data from the RTU program directly was not such a wise idea. His RTU had the ability to copy and repack data into special polling areas to eliminate "scattered polls". In fact, in the above example we traced at Joe's site, all of the data in those 3 polls could have easily fit within a single 13 register block. So Joe is reviewing his RTU program design to repack ALL data of interest - even data supporting rare HMI displays - into a dedicated memory area. While Joe had previously hoped to avoid this work, he now sees the potential dollar saving or cost penalty his company could face if he avoided this work.
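The UDP saving mentioned in the list above can be checked with a quick per-poll comparison. This Python sketch assumes Windows-sized TCP/IP headers (40 bytes), one bare 40-byte TCP acknowledgement per data segment, and a 28-byte UDP/IP header (8-byte UDP plus 20-byte IP); the function names are made up for illustration.

```python
def tcp_poll_bytes(request, response, tcp_ip_header=40, ack=40):
    """One poll over TCP: two data segments, each answered by a bare ACK."""
    return (request + tcp_ip_header + ack) + (response + tcp_ip_header + ack)


def udp_poll_bytes(request, response, udp_ip_header=28):
    """One poll over UDP: two datagrams and no acknowledgements."""
    return (request + udp_ip_header) + (response + udp_ip_header)


def saving_percent(request, response):
    tcp = tcp_poll_bytes(request, response)
    udp = udp_poll_bytes(request, response)
    return 100.0 * (tcp - udp) / tcp


# 12-byte request; responses are 9 bytes plus 2 bytes per register
for registers in (4, 32):
    response = 9 + 2 * registers
    print(f"{registers:>2} registers: TCP {tcp_poll_bytes(12, response)} bytes, "
          f"UDP {udp_poll_bytes(12, response)} bytes, "
          f"saving {saving_percent(12, response):.0f}%")
```

Under these assumptions both the 4-register and the 32-register poll land inside the 40-60% range, with the smaller poll saving more because headers dominate its cost.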

So really, in summary, I have to say: your data polling needs to be UNDER CONTROL, as in being controlled. You need both the tools and the investment in effort to define as exactly as possible each and every data poll.