Friday, February 23, 2007

Modbus Report-By-Exception over Cellular IP

Summary: Serial Modbus has traditionally been considered unable to use Report-by-Exception, but when combined with IP networks, Modbus Report-by-Exception becomes very natural and effective.

Modbus/TCP is inherently peer-to-peer
People using Modbus/TCP over Ethernet or IP are familiar with its ability to function peer-to-peer. Most PLCs with Ethernet ports can function concurrently as a Modbus/TCP slave and a Modbus/TCP master. So two PLCs can very easily connect, using two separate Master/Slave TCP connections, and share information. On one TCP connection the first PLC is Master and the second PLC is Slave. On the other TCP connection the roles are reversed: the first PLC is Slave and the second PLC is Master.

Technically, this is not Report-By-Exception in the true sense of a protocol. However, since the PLC-as-Master communication events can be triggered by field inputs, it has the same effect as writing information only upon exception or when a change is relevant.
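To make the "PLC-as-Master triggered by field inputs" idea concrete, here is a minimal sketch. It assumes a simple deadband rule decides when a reading counts as an exception (the `ExceptionReporter` class, the deadband value, and register 0 are all hypothetical), and it builds the standard Modbus/TCP Write Single Register (function 0x06) request the PLC would send on its master-side connection.

```python
import struct
from typing import Optional


def build_write_single_register(transaction_id: int, unit_id: int,
                                address: int, value: int) -> bytes:
    """Build a Modbus/TCP Write Single Register (function 0x06) request."""
    # PDU: function code, register address, register value (big-endian).
    pdu = struct.pack(">BHH", 0x06, address, value)
    # MBAP header: transaction id, protocol id (always 0 for Modbus),
    # length of the bytes that follow (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


class ExceptionReporter:
    """Hypothetical deadband rule: report only when the value moves enough."""

    def __init__(self, deadband: int) -> None:
        self.deadband = deadband
        self.last_reported: Optional[int] = None

    def should_report(self, value: int) -> bool:
        if (self.last_reported is None
                or abs(value - self.last_reported) >= self.deadband):
            self.last_reported = value
            return True
        return False
```

With a deadband of 5, the readings 100, 101, 103, 110, 111 produce only two writes (100 and 110), which is the whole point: the remote slave sees traffic only when something changes.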

Modbus/RTU as peer-to-peer
Serial Modbus/RTU is a bit harder to use with this peer-to-peer trick. A device with two serial ports can of course have one port configured as Master to issue remote reads and writes, while the second port is configured as Slave to answer requests. When connected to a 2-serial-port Modbus IP-to-Serial Bridge (such as the Digi One IAP), the 2-port serial RTU becomes a full Modbus/TCP peer, capable of operating fully peer-to-peer with other PLCs and SCADA/OPC applications.

However, vendors aren't blind to the marketing aspect of "more hardware". While adding a second serial port to a one-port RTU may cost only a few dollars, the 2-port RTU is most likely a much more powerful unit, so the actual end-user cost may go up by hundreds of dollars. The same is true of Ethernet: while adding an Ethernet port may cost only a few dollars, users' expectations of web pages and fancy functions mean the Ethernet-enhanced device may be priced $500 or more above a simple one-serial-port RTU.

Fortunately, the Digi One IAP (as well as the PortServer family) allows Modbus/RTU slaves to use Report-By-Exception on the serial port. As long as only one serial slave is on each port, the Digi uses configured knowledge to "split" the single serial-port conversation into two traditional Modbus/TCP connections. So traditional remote Modbus/TCP masters can query the serial slave RTU, completely unaware that on occasion the serial slave RTU wakes up and acts as a Master to write data during exceptions. Somewhere, a traditional remote Modbus/TCP slave will receive Modbus/TCP messages from the serial RTU slave, completely unaware that, when not busy reporting exceptions, the remote "Master" is really a passive Modbus/RTU slave.
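The "split" can be pictured with a small routing rule. This is a hypothetical sketch of the general idea only, not how Digi actually implements it: with one slave per port, a serial frame that arrives while a remote master's poll is outstanding is treated as the reply to that poll, and any other frame is treated as an unsolicited exception report from the RTU, forwarded to a configured remote slave. The class name and the `exception_target` setting are illustrative assumptions, and the sketch ignores the collision case where the RTU starts an exception report while a poll is outstanding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SerialPortSplitter:
    """Hypothetical sketch: one serial slave shared by two TCP conversations."""

    # Remote Modbus/TCP slave that receives the RTU's unsolicited writes
    # (an assumed, user-configured address, not a real Digi setting name).
    exception_target: str
    # Set while a remote TCP master's poll is waiting for the serial reply.
    pending_master: Optional[str] = None

    def forward_tcp_request(self, master: str) -> None:
        """A remote Modbus/TCP master polls the serial slave."""
        self.pending_master = master

    def route_serial_frame(self) -> Tuple[str, str]:
        """Decide where a frame arriving on the serial port should go."""
        if self.pending_master is not None:
            master, self.pending_master = self.pending_master, None
            return ("response", master)
        # Nothing outstanding, so the RTU has woken up as a Master.
        return ("request", self.exception_target)
```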

This feature is ideal in wide-area-network situations where bandwidth is limited or data traffic is billed by the volume of bytes moved. For example, many SCADA systems only need to check on remote status every few hours ... lift pumps in a storm sewer system do absolutely nothing interesting for weeks or even months in the absence of rain. Even during a normal rain, checking on them every few hours is likely enough ... that is, *IF* the remote lift pumps can send Report-By-Exception messages during system problems.

For example, we have one customer piloting Modbus Report-By-Exception over a cellular data network. Their eventual target is to poll the remote sites once per day. They use a simple, single-port Modbus/RTU slave that combines I/O with an LCD and push buttons to make a simple, self-contained "RTU" or Remote Terminal Unit in the truest sense of the word. Using Modbus over UDP/IP with Report-By-Exception allows this customer to plan for bills of $12 per site per month. If forced to poll continuously with Modbus over TCP/IP, they would need to pay $50 or more per site per month. With hundreds or thousands of sites, that is a huge cost savings and an opportunity for better ROI (return on investment).
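The savings scale linearly with the number of sites. Using the $12 and $50 monthly figures quoted above (the site counts are just illustrative), a quick calculation looks like this; since continuous polling costs "$50 or more", the result is a lower bound.

```python
def annual_savings(sites: int, rbe_monthly: float = 12.0,
                   polled_monthly: float = 50.0) -> float:
    """Yearly savings of Report-By-Exception versus continuous polling."""
    return sites * (polled_monthly - rbe_monthly) * 12


print(annual_savings(100))   # 45600.0
print(annual_savings(1000))  # 456000.0
```

So at 1,000 sites, the difference is at least $456,000 per year.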

More Information
Here is a general discussion of how to design Modbus/RTU serial slaves and masters to gracefully handle Report-By-Exception:
Here is a more focused discussion of how the Digi One IAP and PortServer TS1, TS2, TS4, TS8, and TS16 handle serial Modbus/RTU Report-By-Exception:

Friday, February 09, 2007

Cellular IP-Friendly Apps - Response Delays

Back to my series of entries on creating graceful IP apps.

Many newly written Ethernet-enabled applications incorrectly equate "Ethernet = Fast". They overlook that Ethernet is often just a path into other, slower IP-based networks. Worse, some well-meaning programmers set the response timeout default to 250 milliseconds and limit the user configuration to a maximum of 5 seconds. So far, I'd say about 20% of the applications I've had to help customers with limit Ethernet timeouts to 5 seconds or less.

But cellular networks have a high end-to-end latency, especially if the line has been idle for a few minutes. Normal slave response times will be near 2 seconds, with round-trip delays of 10 to 12 seconds occurring every day (see my entry on real-world Modbus numbers). Interestingly enough, every cellular "expert" I talk to keeps correcting me that cellular latencies are in the 50 to 100 msec range and getting better with every new "gen". Well, I guess my Saturn ION can do 400 miles per hour too ... if you drop it out of an airplane! Regardless of what these "experts" are smoking, my simple tests show otherwise where it really counts ... in real-world tests run over the Internet to cellular-based IP devices.

Recommendation: IP applications should default to a 3 second response timeout. Applications must allow users to configure this timeout lower (perhaps down to 250 msec) and also higher, to at least 60 seconds.
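As a sketch of the recommendation, here is how an application might encode the default and the configurable range. The class and constant names are hypothetical; the 3 second default, 250 msec floor, and 60 second ceiling come from the recommendation (which asks for "at least" 60 seconds, so a higher ceiling is equally valid).

```python
from dataclasses import dataclass

MIN_TIMEOUT_S = 0.25      # lowest value users may choose
MAX_TIMEOUT_S = 60.0      # "at least 60 seconds"; a larger cap is fine too
DEFAULT_TIMEOUT_S = 3.0   # recommended out-of-the-box default


@dataclass
class ResponseTimeout:
    """Hypothetical user-configurable response timeout, in seconds."""

    seconds: float = DEFAULT_TIMEOUT_S

    def __post_init__(self) -> None:
        if not MIN_TIMEOUT_S <= self.seconds <= MAX_TIMEOUT_S:
            raise ValueError(
                f"timeout must be between {MIN_TIMEOUT_S} and {MAX_TIMEOUT_S} s")
```

An Ethernet user can pick `ResponseTimeout(0.25)` and a cellular user `ResponseTimeout(15.0)`, while anyone who never touches the setting gets 3 seconds.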

Impact: On Ethernet this should have no direct consequences, since the timeout only has an effect if the remote is no longer available, in which case the remote is going 'offline' anyway. The minority of users who really want a 250 millisecond timeout can set it manually, while cellular users who want a more reasonable timeout of 15 seconds can set that too.

For cellular networks, the real problem with a premature timeout is that the customer has already paid for the request and will very likely also pay for the response, even if the response arrives after the application gave up on it and retried the request. Assuming the user is polling the remote at a moderate pace to control costs, there is no harm in waiting longer for the response to maximize the value of the traffic paid for.

Another simple example is an application that sends a request, then times out twice and retries twice. How will the application react when it receives three responses at the same time? Remember, the first two requests probably were not lost; they likely still reached the remote device and generated responses, which were simply delayed longer than expected. Since serial Modbus doesn't include enough information in a response to match it to a request, this can cause serious misoperation of the system. Protocols that include a sequence number should handle this more gracefully, but the duplicate traffic is still a waste of money.
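Modbus/TCP is one protocol that does carry such a number: its 7-byte MBAP header begins with a 2-byte transaction identifier, which the slave copies into its response. A minimal sketch of using it to pair late responses with their requests follows; the `outstanding` dictionary and request labels are illustrative assumptions, and responses with unknown identifiers are simply dropped.

```python
import struct
from typing import Dict, List, Tuple


def transaction_id(adu: bytes) -> int:
    """Return the transaction identifier from a Modbus/TCP frame's MBAP header."""
    if len(adu) < 7:
        raise ValueError("frame shorter than the 7-byte MBAP header")
    tid, protocol_id, _length, _unit = struct.unpack(">HHHB", adu[:7])
    if protocol_id != 0:
        raise ValueError("not a Modbus/TCP frame (protocol id must be 0)")
    return tid


def match_responses(outstanding: Dict[int, str],
                    frames: List[bytes]) -> List[Tuple[str, bytes]]:
    """Pair each response with the request that caused it; drop unknown ones."""
    matched = []
    for frame in frames:
        request = outstanding.pop(transaction_id(frame), None)
        if request is not None:
            matched.append((request, frame))
    return matched
```

Even when three responses land at once and out of order, each one is matched to the request with the same identifier rather than to whichever request happens to be newest.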

We have also seen protocols that treat unexpected responses as a reason to abort and reset the communication channel, which adds further cost. For example, we had one super headache with a big-name seller of "energy curtailment" systems. The end user insisted a 5 second timeout was the maximum they could tolerate (i.e., wishful thinking: set a 5 second timeout regardless of reality). So let's see what happens when we hit one of the rare but expected latencies over 10 seconds.
  1. SCADA software sends out request sequence 74
  2. 5 seconds later, SCADA times out 74 and sends out 75
  3. 5 seconds later, SCADA times out 75 and sends out 76
  4. 1 second later - since TCP/IP is reliable - all three responses return
  5. SCADA is expecting response 76, but sees 74 ... Oh, big problem ... need to reset comm subsystem
  6. SCADA sends reset to remote RTU, expects response 1 but ... da da ... sees response 75 since they never flushed the old info and TCP/IP is reliable.
  7. SCADA sends a 2nd reset to remote RTU, expects response 1 but sees response 76 since they never flushed the old info and TCP/IP is reliable
  8. At this point, I hope you see that there are still 2 responses to the comms reset in the receive queue!

Anyway, whenever this reset "temper tantrum" occurred, it would take 10 to 15 minutes to get the connection back up. Of course, one problem was the stupid customer unwilling to set the correct timeout, but the SCADA software was defective because it wasn't smart enough to just discard old responses with timed-out sequence numbers. In the above example, life would have been fine and dandy had the SCADA system simply discarded responses 74 and 75, since it expected 76.
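Here is a minimal sketch of the "just discard old responses" behavior, replaying the 74/75/76 sequence above. The class is hypothetical and not the vendor's code; it assumes each retry supersedes earlier requests and, for brevity, ignores sequence-number wraparound.

```python
from typing import Optional


class SequencedReceiver:
    """Accept only the response to the newest request; drop older ones."""

    def __init__(self) -> None:
        # Sequence number of the request currently outstanding, if any.
        self.expected: Optional[int] = None

    def send(self, seq: int) -> None:
        # A retry after a timeout supersedes every earlier request.
        self.expected = seq

    def receive(self, seq: int) -> str:
        if seq == self.expected:
            self.expected = None
            return "accept"
        # Late reply to a timed-out request: discard it, do NOT reset comms.
        return "discard"


rx = SequencedReceiver()
for seq in (74, 75, 76):   # original request plus two timeout retries
    rx.send(seq)
print([rx.receive(s) for s in (74, 75, 76)])
# ['discard', 'discard', 'accept']
```

No reset is ever sent, so there are no stale reset responses left in the queue and no 10 to 15 minute outage.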

Wednesday, February 07, 2007

Cellular and DNP3

I was just at the Distributech show earlier this week, the show for power utilities. There was lots of interest in cellular access. I know both OSI and Itron have successfully tested their software against our cellular product.

I have a DNP3 RTU up on my public cellular device, but need to confirm the details of how the public can access it. I also had a discussion with the primary provider of DNP3 source code in the world, and we will be looking at putting a DNP3 slave simulator up via cellular. It would be really useful if this could expose some of the statistical and diagnostic info managed by the simulator. This would help software vendors fine-tune their software to handle the variable latency of cellular.

Thursday, February 01, 2007

Do Users Really Want Industrial Ethernet?

(For those too impatient to read this to the end: I'm not saying don't use Ethernet ... I am just saying be careful that you understand what your customers expect and what functionality they will assume you include *for free* when you add Ethernet.)

My last post generated some interesting feedback, and I want to expand more fully on one topic from it. For the last 15 years I've been involved in the "multi-vendor interface" business, linking multiple vendors' equipment by data comms. First I worked with RS-232 and RS-485, then fiber optics, then Ethernet, and now virtually every technology that moves TCP/IP.

From time to time I am contacted by some pretty desperate customers. For example, I had one customer who had piloted some Ethernet-based temperature sensors. Things worked fine in the lab with their lab computer, so they bought 50 ... only to find out they couldn't use them. It seems these sensors really were "just Ethernet": they talked by Ethernet broadcast and direct MAC-layer packets. They didn't support TCP/IP and therefore could NOT be routed by any standard network infrastructure. The user could not talk to any of the sensors they had intended to install in panels around the plant, because the "Computer Room" wasn't on the same physical Ethernet segment as the "floor". There was no way to broadcast or unicast at the MAC level between the systems. This customer hoped I knew of some magic box to act as a gateway between TCP/IP nodes and pure Ethernet nodes; I didn't.

So this brings me back to the concept of the true cost to implement "Ethernet". Customers who ask for Ethernet are not really asking for Ethernet hardware or an Ethernet media bus. They have the expectation that they can interface your "Ethernet Devices" with the wide variety of other equipment they have - including WiFi, routed Ethernet, fiber optics, wide-area networks, and so on. They also expect (at least in a future firmware rev) web pages for configuration, SNMP for remote management, strong encryption, and so on.

So the term "Ethernet" has taken on a life of its own. Remember when 802.11 was called "Wireless Ethernet"? Well, there is absolutely NOTHING Ethernet about 802.11, yet it was a useful PR move to link the two. No doubt it helped spread the acceptance of WiFi, as we now call it. Interestingly enough, the current PCI versus PCI-Express adapters you buy for a PC use the same PR trick: linking a new, unknown technology to an old, established technology that merely accomplishes the same function by very different means. Maybe Sony should have called Beta-Max VHS-Max instead ... but then I'm showing my age by even knowing that a consumer-oriented video standard other than VHS existed.

But back to Ethernet. If you are a small device maker and have yet to start making Ethernet-based products, just be aware that customers who ask for "Ethernet products" aren't really asking for ... err, products with Ethernet. They are asking for products that integrate into (at a minimum) the wide family of TCP/IP-based technologies out there. I am not even talking yet about whether you should support Modbus/TCP or ODVA Ethernet/IP or ProfiNet. I am just saying customers will expect your "Ethernet products" to be able to hold a raw TCP/IP or UDP/IP conversation with all of the other equipment they are investing in daily.
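The difference from the MAC-only sensors above is that an IP conversation is addressed by IP address and port, so any router can carry it between segments. Here is a minimal sketch of a raw UDP/IP request and reply using Python's standard `socket` module; to stay self-contained, the "device" is a hypothetical stand-in bound to the loopback address, and the `ACK:` reply format is invented for illustration. On a real network, only the address would change.

```python
import socket


def udp_request_reply(payload: bytes) -> bytes:
    """Send one datagram to a local stand-in 'device' and return its reply."""
    device = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        device.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        device.settimeout(3.0)
        client.settimeout(3.0)
        # Client: send a request to the device's IP address and port.
        client.sendto(payload, device.getsockname())
        # Device: read the request and answer whoever sent it.
        data, peer = device.recvfrom(1024)
        device.sendto(b"ACK:" + data, peer)
        # Client: collect the reply.
        reply, _ = client.recvfrom(1024)
        return reply
    finally:
        device.close()
        client.close()


print(udp_request_reply(b"hello"))  # b'ACK:hello'
```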

So the cost to add an Ethernet port is just a small part of your cost to "add Ethernet". That is why companies like Digi can sell Ethernet-to-Serial converters or sell "async Ethernet driver chips" like the Digi Connect ME, which links to your CPU's serial UART. These devices of course cost more than $9.95 or even the cost of a few new hardware chips, but that higher cost pays for TCP/IP, web servers, SNMP servers, strong AES encryption, and all of the other things your customers expect when they buy "Ethernet products".

So to digress a bit, I suffer this "Oh, don't worry ... it's Ethernet" attitude on a daily basis. So far I have to say at least 95% of the off-the-shelf software applications I test that support TCP/IP don't work well with technologies other than direct Ethernet. This includes problems not only with very different media like satellite or cellular, but even with WiFi. So that is part of my mission in this blog: what you want is NOT to Ethernet-enable your products. Instead, you need to "IP-enable" your products by way of an Ethernet interface.