Friday, October 05, 2007
I'm happy to report that after logging such attempts for many months, my cellular devices receive fewer than 4 probes in any one day and likely under 20 in total per month. Many days pass with no probes at all, and the probe traffic appears to stay under 2-3 KB per month. This compares to perhaps 200 probes per day on my DSL router/firewall.
I am not sure why there is such a difference, although I'd guess it has to do with the high initial latency in contacting a cellular IP. Any "script-kiddie" tool scanning IP address ranges is probably not willing to wait up to 5 seconds for a cellular device on a busy tower, which must be "unparked" before it can respond.
What kinds of probes are they? Mainly those looking for MS-SQL servers, with an occasional FTP access, and the remainder aimed at seemingly random, unnamed ports - likely associated with trojans or zombie networks.
Wednesday, August 22, 2007
copies of headers etc.
So I have been working on "reduction" solutions - how to obtain the effect of moving "X" IP packets but only moving "X-minus-a-bunch" of actual IP packets.
Tunneling TCP thru UDP
The most promising and generic form of reduction is to tunnel TCP/IP via UDP/IP over cellular. The host application talks TCP/IP to a local proxy, which acts as the TCP end-point, so all of the TCP SYN, ACK and Keepalive traffic stays on the local Ethernet. The local proxy then initiates a UDP "session" with a remote proxy over cellular, and we instantly see a 60-90% reduction in data costs. The remote proxy initiates a TCP/IP connection to the remote Ethernet device, which again isolates the extra TCP overhead to the remote Ethernet.
The reaction of non-IA network engineers to this idea is predictable, and a bit humorous after a while. They immediately say "You cannot do that!!! UDP/IP is unreliable!!! You'll break something!!! You are committing a mortal Sin!!!" But in reality none of the IA protocols rely on the reliability of TCP anyway. For example, Rockwell RSLogix doesn't send a program block to a ControlLogix and blindly assume it was successful once the TCP Acknowledge from the peer is processed. Instead, RSLogix sits (literally blocks) and waits for a successful CIP response on a single CIP Connection. So if the local proxy returns a TCP ACK to the RSLogix host and the CIP request is then lost within the UDP/IP tunnel ... eventually RSLogix times out the CIP connection, and the application (and/or user) restarts the operation.
Fortunately, cellular is very reliable - in all of my tests sending 10,000 UDP packets I rarely lost even 1 packet, and I'm not sure whether such a rare loss is due to cellular or just my test script hiccupping and dropping a packet. Plus, cellular errors tend to be very bursty. In other words, you won't lose 1 packet per 10,000; instead you'll lose all packets for 5 minutes, or 5 random packets out of a group of 10 sent. This shotgun damage tends to confuse TCP/IP state machines to the point that they abort the connection anyway. In truth, in all of my Wireshark/Ethereal trace reviewing I have never seen a single situation where a TCP retry did anything but add data cost; every TCP retransmission just results in a "Duplicate ACK" showing up a few packets later in the trace and a doubling of the cost of that block of data.
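A minimal sketch of the local-proxy half of this idea, written in Python for illustration. The function names, the 10-second timeout, and the simplification that each TCP recv() returns exactly one complete protocol request are all assumptions; a real proxy must parse the protocol's own framing and would normally keep one UDP socket per session rather than one per request.

```python
import socket
import threading
from typing import Optional, Tuple

Addr = Tuple[str, int]


def udp_round_trip(request: bytes, remote: Addr, timeout: float = 10.0) -> Optional[bytes]:
    """Send one request as a single UDP datagram and wait for one reply.

    Nothing is retransmitted here. If the reply is lost, None is returned and
    the host application's own protocol timeout is left to detect and recover.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(request, remote)
        try:
            reply, _ = udp.recvfrom(65535)
        except socket.timeout:
            return None
        return reply


def serve_local_proxy(listener: socket.socket, remote: Addr) -> None:
    """Accept one local TCP client and relay each request over UDP.

    TCP SYN, ACK and keepalive traffic stay between the host and this proxy
    on the local Ethernet; only request and reply payloads cross the cellular link.
    Assumption: each recv() returns exactly one complete protocol request.
    """
    conn, _ = listener.accept()
    with conn:
        while True:
            request = conn.recv(4096)
            if not request:  # host closed its local TCP socket
                break
            reply = udp_round_trip(request, remote)
            if reply is not None:
                conn.sendall(reply)


def start_proxy(listen_addr: Addr, remote: Addr) -> Addr:
    """Start the proxy on a background thread; return the address it listens on."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(listen_addr)
    listener.listen(1)
    threading.Thread(target=serve_local_proxy, args=(listener, remote), daemon=True).start()
    return listener.getsockname()
```

A lost datagram is simply never answered, which matches the reasoning in the next paragraph: the application protocol's own timeout, not TCP, decides when to try again.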
So overall, anyone planning to use cellular should first investigate if they can use UDP/IP instead of TCP/IP.
TCP Problem #1 - added cost for pointless ACK
As mentioned above, real-world analysis of telemetry use of TCP shows the TCP ACK isn't useful; worse, embedded TCP devices tend to sub-optimize the ACK timing to "speed up" data transmission and recovery. Almost universally, moving an IA protocol via TCP/IP results in 4 TCP packets per transaction instead of the idealized 3:
- Your app sends a TCP request (request data size + 40-52 bytes of overhead)
- 800-1100 msec later your app receives a TCP ACK without data (another 40-52 bytes of overhead)
- 10-40 msec later your app receives the protocol response (response data size + 40-52 bytes of overhead)
- Within a few msec, your app sends the TCP ACK without data.
So what would have been a 2-packet transaction with only 56 bytes of overhead under UDP/IP, or what should have been a 3-packet transaction with 120-156 bytes of overhead under ideal TCP/IP, usually becomes a 4-packet transaction with 160-208 bytes of overhead. Yes, there exists a TCP socket option and the concept of a "Delayed ACK Timer" to prevent the first empty TCP ACK from being returned over cellular, but few embedded products use this since it adds code complexity and slows down overall data communications. At least in the IA world, it seems everyone wants their Ethernet product, which costs 2-4 times more than their serial product, to appear lightning fast. So they ignore the TCP community's decades of hard-earned experience and "hack" their TCP stack to sub-optimize for fast local Ethernet performance.
So this is where the instant 60-90% data cost savings of using UDP instead of TCP comes from: UDP has smaller headers and results in fewer packets being sent. Since the cellular IP system "encapsulates" your TCP/IP packets in a manner similar to PPP, the entire IP header, TCP or UDP header, and your data are all counted as billable payload.
There is also a myth, propagated to this day, that the TCP ACK causes retries to occur more rapidly out in the wide-area-network infrastructure. The rhetoric goes, "If the link between the 3rd and 4th router is congested and the TCP data packet is lost, then the 3rd router will retransmit ... which is faster ..." Perhaps this was true back in the 1980s, but today the 3rd and 4th router (and all of the other 20 to 30 routers in a cellular end-to-end path) are just tossing IP packets upstream with no awareness of what the packets are for. In reality, only the TCP state machines within your host machine and within your remote device have any ability to retransmit anything.
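The header arithmetic in the paragraph above, as a small Python check. It counts header overhead only (no payload), using 28 bytes for UDP/IP and 40 or 52 bytes for TCP/IP, the larger figure including the 12-byte timestamp option.

```python
UDP_IP_HEADER = 28          # 20-byte IPv4 header + 8-byte UDP header
TCP_IP_HEADERS = (40, 52)   # 20 + 20 bytes, or with the 12-byte TCP timestamp option


def overhead(packets: int, header_bytes: int) -> int:
    """Header bytes billed for one request/response transaction."""
    return packets * header_bytes


udp = overhead(2, UDP_IP_HEADER)                        # request + response
ideal_tcp = [overhead(3, h) for h in TCP_IP_HEADERS]    # request, response carrying the ACK, final ACK
typical_tcp = [overhead(4, h) for h in TCP_IP_HEADERS]  # adds the early, empty ACK

print(udp, ideal_tcp, typical_tcp)   # 56 [120, 156] [160, 208]
```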
TCP Problem #2 - added cost for premature retry
The TCP RFCs include many dynamic timers that automatically adjust themselves based on real-world performance. This is actually pretty neat: if the TCP ACK and response times tend to be longer than normal, then the TCP state machine slowly increases the delay before retransmission. But I've seen 3 problems with this.
- The most effective way to leverage auto-adjust is to include the 12-byte TCP timestamp option, which time-stamps all packets. Linux systems add this by default, and installing one of many PLC engineering tools on a Windows computer causes Windows to start always using it as well. The setting is generally global - you either have 40-byte TCP headers or 52-byte TCP headers forever. So for small telemetry packets, this adds a disproportionately large increase in data costs.
- Many embedded devices (PLC, RTU and I/O devices) have "hacked" the TCP ACK sub-system to force connection failures to be detected faster than the standard 3-4 minutes. For example, I worked with one large PLC company which expected TCP socket failures to be detected in less than 1 second, so they forced TCP retransmission within hundreds of msec and without any normal exponential backoff between retries. This is totally unusable over cellular; you will end up with 30% to 90% of your data traffic being premature retries and responses to premature retries. I have literally seen Wireshark/Ethereal traces which are mainly black lines with red text - the default colors used to show TCP "problems" such as lost-frag, retrans, dup-ack, etc.
- The latency in cellular is abnormal by an order of magnitude. Even browsing the internet or doing a telemetry polling test over DSL/cable broadband averages latencies in the 100-150 msec range. This is what Windows or Linux define as "slow/bad" - not the 800 to 3500 msec of cellular. So even watching a Windows or Linux TCP state machine auto-adjust the retransmission delay over time, you will not see it reach a fully effective setting which eliminates wasted TCP retransmissions. The delay seems to top out at about 1.5 to 1.8 seconds, which is just too close to the actual "normal" latency range.
So again, use of UDP/IP frees the user from data costs associated with TCP legacy assumptions - both the mainstream MIS/IT variety of assumptions and the misapplied "speed-ups" of IA vendors.
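To see why the fixed, sub-second retry interval described in the second point above is so costly, here is a small Python count of how many retransmissions fire before a reply that takes 2.5 seconds (the typical "unparking" delay described in the next section). The 200 msec fixed interval and the 1-second doubling schedule are illustrative values, not measurements of any specific PLC or operating system.

```python
def retransmissions_before_reply(reply_ms: int, first_rto_ms: int, backoff: int) -> int:
    """Count retransmissions sent before a reply arriving reply_ms after the original send.

    backoff=1 models a fixed retry interval; backoff=2 models standard
    exponential backoff, where each wait is double the previous one.
    Integer milliseconds avoid floating-point drift in the schedule.
    """
    count = 0
    interval = first_rto_ms
    fire_at = first_rto_ms
    while fire_at < reply_ms:
        count += 1
        interval *= backoff
        fire_at += interval
    return count


print(retransmissions_before_reply(2500, 200, backoff=1))   # fixed 200 ms retries: 12
print(retransmissions_before_reply(2500, 1000, backoff=2))  # 1 s, then 2 s, 4 s, ...: 1
```

Each of those retransmissions is extra traffic on the cellular link, for a reply that was never actually lost.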
TCP Problem #3 - uncontrollable SYN/Socket Opens
Given the way all cellular systems "park" inactive cellular data devices, it is exceedingly rare to see a host app open a new TCP socket without prematurely retrying/retransmitting the SYN packet. This is because it is virtually guaranteed to take about 2.5 seconds for the data device to be given active airwave resources and return the SYN+ACK response. This has NOTHING to do with the "always connected" feature Digi and others claim. The data device (even when parked) is fully connected by IP and fully authenticated by the system - it is "always connected". However, the local cell tower only has finite airwave resources, so any device (cell phone or data device) which has been idle for somewhere between 3 and 45 seconds is "parked" without any preallocated airwave resources. Literally, when the TCP "SYN" shows up, the cell tower has to use the control channel to tell the data device to request airwave resources, and only after these are requested and allocated can the data device receive and respond to the TCP socket open request.
But that's not the real problem related to TCP Socket Opens ... the real problem is yet another case of IA vendors sub-optimizing TCP behavior for fast local Ethernet performance. For example, I once had a customer who normally paid about $40 per month receive a $2000 bill one month. It turns out they had powered down the remote site for 3+ days, and the off-the-shelf 3rd-party host application they used would try to reopen the TCP socket every 5 seconds!!! Windows would send the initial TCP SYN to start the open; since the remote was off-line, Windows would retransmit this TCP SYN a few seconds later. After a total of 5 seconds, the application would ABORT this TCP socket attempt and start a new one. So this host app was pushing 24 billable TCP packets per minute out to a remote site that was powered down. This was nothing the host app vendor documented, nor was it anything a user could configure or override. The user could configure the host app to ONLY poll once per 5 minutes, but had no control over this runaway TCP SYN/Open behavior.
Tunneling TCP through UDP effectively decouples the TCP SYN/Open from cellular data charges. The first TCP SYN/Open request to the local proxy would succeed even if the remote IP site is offline, so no retries would be required. Even if the host app attempts to retry the data poll every 5 seconds, this is something the UDP proxy can be configured to "resist". If the user truly wants data packets to move only every few minutes, that is something the UDP proxy can easily enforce.
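The arithmetic behind the "24 billable TCP packets per minute", in Python. It assumes exactly 2 SYNs per 5-second attempt (the original plus one retransmission by Windows), as described above. SYN size, and therefore cost, depends on the TCP options the host stack adds, so only packets are counted.

```python
def runaway_syn_packets(outage_s: int, reopen_every_s: int = 5, syns_per_attempt: int = 2) -> int:
    """SYN packets sent toward an offline site by a host that aborts and reopens
    its socket every reopen_every_s seconds, with each SYN retransmitted once."""
    attempts = outage_s // reopen_every_s
    return attempts * syns_per_attempt


print(runaway_syn_packets(60))              # per minute: 24
print(runaway_syn_packets(3 * 24 * 3600))   # over a 3-day outage: 103680
```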
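One way a proxy could "resist" an over-eager host is a simple minimum-interval gate, sketched here in Python. The class name and its behavior are hypothetical design choices for illustration, not a feature of any particular product.

```python
from typing import Optional


class PollGate:
    """Hypothetical proxy-side limiter: forward at most one poll per min_interval_s seconds."""

    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        self._last_forward: Optional[float] = None

    def allow(self, now_s: float) -> bool:
        """True if a poll arriving at time now_s may go out over cellular."""
        if self._last_forward is None or now_s - self._last_forward >= self.min_interval_s:
            self._last_forward = now_s
            return True
        return False


gate = PollGate(min_interval_s=300)   # user wants data at most every 5 minutes
decisions = [gate.allow(t) for t in (0, 5, 10, 299, 300, 305)]
print(decisions)   # [True, False, False, False, True, False]
```

When allow() returns False the proxy still has to answer the host somehow; replaying the last reply it received, or simply not answering and letting the host time out, are two possible policies.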
TCP Problem #4 - sub-optimized TCP keepalive
The final problem I'll discuss (but by no means the "last" problem with TCP) is that many embedded IA devices have relatively fast TCP Keepalives hard-coded to speed up lost-socket detection. While this is an admirable goal, a Rockwell PLC sending out a TCP Keepalive at a fixed 45-second interval can create up to 6MB of monthly traffic by doing this. Siemens S7 PLCs seem to issue a TCP keepalive every 60 seconds - a bit better, but not by much. Maybe such a heartbeat is useful to know the remote is accessible, but given the reliability of cell phones (when was the last time you had a dropped call or no signal?), you'll get a lot of false alarms if you treat every missed packet as something requiring maintenance's attention.
Again, tunneling TCP through UDP effectively eliminates the automatic, possibly uncontrollable use of TCP Keepalive. If your process can tolerate you talking to it only once an hour, then the cost of TCP socket open and close, as well as any TCP Keepalive, is all wasted investment.
Not only this, but the cellular providers do NOT want users who send a simple, nearly empty packet every 30 to 60 seconds - this is literally the worst kind of customer, as it forces the cell tower to "waste" one of its very limited airwave resources with almost no income returned to the carrier. From what I hear, carriers either want customers who talk constantly and pay huge monthly fees (say $90 to $350/month), or they want customers who rarely talk and pay a small fee (even just $5/mo) but cost the carrier virtually no direct expenses.
Putting this in "restaurant terms":
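A quick Python check of these keepalive figures, assuming a 30-day month and that each keepalive costs two header-only packets (the probe and its ACK), ignoring the single garbage byte some stacks add to the probe.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # assume a 30-day month


def keepalive_bytes_per_month(interval_s: int, header_bytes: int) -> int:
    """Monthly bytes for an otherwise idle socket: keepalive probe + ACK, no payload."""
    keepalives = SECONDS_PER_MONTH // interval_s
    return keepalives * 2 * header_bytes


for interval, header in [(45, 40), (45, 52), (60, 40), (60, 52)]:
    mb = keepalive_bytes_per_month(interval, header) / 1_000_000
    print(f"every {interval} s, {header}-byte headers: {mb:.2f} MB/month")
```

With 52-byte headers a 45-second keepalive comes to about 5.99 MB per month, matching the "up to 6MB" above; with 40-byte headers a 60-second keepalive comes to about 3.46 MB.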
- A cellular data device that talks constantly but pays for a large plan is like a restaurant patron who sits at a table, constantly ordering more food and paying a larger bill.
- A cellular data device that rarely talks is like the restaurant patron who comes in once a month, sits at a table, orders a meal, pays and then vacates the table.
- A cellular data device that keeps an idle channel open full time but rarely talks is like the restaurant patron who sits at a table in the restaurant, reading the paper but rarely ordering food or paying a bill.
In fact, in private chats with carrier account people, I have heard several times that they have been directed to prefer either customers who talk constantly on large plans or those who talk at most once an hour (better once a day) on small plans. Customers planning to talk every few minutes have been defined as bad investments. It may be fair to say that after years of building up the data-plan customer base, the cellular carriers have come to understand that the REAL cost of data plans is not the bulk data bytes moved; it is instead the percentage of time the device consumes (or squats on) 1-of-N scarce airwave resources, in proportion to the monthly fee it pays.
Friday, July 06, 2007
However, the big problem with their "report-by-exception" is that the PLC holds the TCP/IP socket open 24/7 and sends a TCP Keepalive every 1 minute. Sending a TCP Keepalive (2 x TCP packet headers with no data) every 60 seconds costs about 3.4MB per month! I have a hard time seeing that as "low cost". They would be better off just polling real data every 4 minutes and eliminating the TCP keepalives!
Hopefully there is a setting in the PLC to change this behavior, which is why I need to look into this more. I just want to point out that a system claiming to move data only on exceptions does NOT necessarily have very low cellular data costs. Don't forget that data costs include things you don't normally see - TCP socket open/close, TCP keepalive, TCP ack, and TCP retransmissions are all things you pay for.
Tuesday, June 12, 2007
The Convoluted Path of Wide-Area-Networks:
In general, the magic of IP hides reality from us all. We tend to think "now I am browsing Google.com or iatips.com", but we don't really appreciate how COMPLEX and MIRACULOUS this is. Your computer is NOT connected to either of these web servers; instead your computer uses the services of a dozen or more other computers/routers to get from "here" to "there". Every single data byte must be forwarded hop-by-hop through all of these cooperative peers.
As an example, here is a Trace Route (tracert) from a computer within my test lab to a ControlLogix PLC sitting six (6) feet away. I am using public Internet access via a cellular Digi Connect WAN to the Ethernet (ENB) of the ControlLogix. In some of the public IP addresses I have replaced the digits with "X"; you don't really need to know the exact values.
My computer has private IP = 10.9.92.1
01 01 ms 10.9.1.1 (Digi's private Intranet)
02 01 ms 10.10.11.10 (Digi's private Intranet)
03 01 ms 10.254.254.2 (Digi's private Intranet)
04 16 ms 66.77.x.x (Digi Co-Host/Internet Link)
05 04 ms 69.8.x.x (Digi Co-Host/Internet Link)
06 64 ms 66.77.x.x (Digi Co-Host/Internet Link)
07 09 ms min-core-02.inet.qwest.net [220.127.116.11]
08 11 ms cer-core-02.inet.qwest.net [18.104.22.168]
09 12 ms cer-brdr-01.inet.qwest.net [22.214.171.124]
10 39 ms qwest-gw.cgcil.ip.att.net [126.96.36.199]
11 35 ms tbr2.cgcil.ip.att.net [188.8.131.52]
12 35 ms tbr2.sl9mo.ip.att.net [184.108.40.206]
13 75 ms tbr2.attga.ip.att.net [220.127.116.11]
14 31 ms 18.104.22.168
15 34 ms 22.214.171.124
16 * Request timed out. (Part of Cellular Infra-Structure)
17 * Request timed out.
18 * Request timed out.
19 * Request timed out.
20 1276 ms mobile-166-XXX-XXX-XXX.mycingular.net [166.XXX.XXX.XXX]
Digi Connect WAN has private local IP = 192.168.196.80 (is 'gateway')
ControlLogix PLC has private local IP = 192.168.196.21
These traces always amaze me - how something so seemingly trivial takes so much effort to really function. Notice how my lab PC has to route through 6 devices to even get out of Digi's company network, then through Qwest (our ISP), through AT&T (my cellular SIM provider), through some unnamed hops of the cell system, and finally be port forwarded to the ControlLogix PLC. The packets may be passing through Minneapolis, Chicago, Detroit, Atlanta, and then finally returning to the PLC sitting right beside me.
Effect of NAT (Network Address Translation)
Now let's look at what happens when RSLinx on my PC opens an ODVA Ethernet/IP socket to the ControlLogix PLC. Every TCP/IP packet carries 4 values which together uniquely identify a connection:
- Destination IP (target device)
- Destination Port (target application within device)
- Source IP (return address to originator)
- Source Port (likely random port, originator is waiting for responses here)
So we start out with the 4-tuple DST=166.x.x.x : 44818 and SRC=10.9.92.1 : 22256. The 166.x.x.x IP is assigned by my cellular carrier. Port 44818 is ODVA's "well-known" port for Ethernet/IP. 10.9.92.1 is an internal Digi selected private IP. TCP port 22256 is the ephemeral (or random) port selected by RSLinx to listen for responses.
The first NAT effect is that the Digi corporate firewall changes the request to be DST=166.x.x.x : 44818 and SRC=66.77.x.x : 22256. My private IP of 10.9.92.1 is meaningless out in Qwest's or AT&T's networks, so something needs to swap it for a "real" world-unique IP leased by Digi. Our corporate NAT interface creates a record (with a lifetime of 5 minutes) that allows any responses to be correctly restored to 10.9.92.1.
The second NAT effect is when the Digi Connect WAN forwards to the ControlLogix at another private IP. So the 4-tuple now becomes DST=192.168.196.21 : 44818 and SRC=66.77.x.x : 22256. The ControlLogix thinks IP host 66.77.something is connected to it - not the real host IP of 10.9.92.1. Plus the ControlLogix has NO CLUE that RSLinx thinks the ControlLogix has IP 166.something.
Now, to send a response the ControlLogix issues a TCP/IP packet with the flipped 4-tuple of DST=66.77.x.x : 22256 and SRC=192.168.196.21 : 44818. The Digi Connect WAN restores (undoes) the NAT and changes this to DST=66.77.x.x : 22256 and SRC=166.x.x.x : 44818. After passing back through AT&T and Qwest, Digi's corporate NAT interface restores its own NAT and changes it back to DST=10.9.92.1 : 22256 and SRC=166.x.x.x : 44818.
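The same walk-through in Python, modelling each 4-tuple as a NamedTuple and using the (partly masked) address strings from the text. As in the example above, both NAT devices are assumed to leave the TCP ports unchanged; many real NAT routers also rewrite the source port, which this sketch ignores.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    dst_ip: str
    dst_port: int
    src_ip: str
    src_port: int


def reverse(f: Flow) -> Flow:
    """The 4-tuple a reply carries: source and destination swapped."""
    return Flow(dst_ip=f.src_ip, dst_port=f.src_port, src_ip=f.dst_ip, src_port=f.dst_port)


# RSLinx on the lab PC opens Ethernet/IP (port 44818) to the carrier-assigned IP.
sent = Flow("166.x.x.x", 44818, "10.9.92.1", 22256)

# 1) Digi corporate firewall: swap the private source IP for a public one (source NAT).
on_internet = sent._replace(src_ip="66.77.x.x")

# 2) Digi Connect WAN: port-forward to the PLC's private IP (destination NAT).
at_plc = on_internet._replace(dst_ip="192.168.196.21")

# The ControlLogix simply answers whoever appears to have called it.
plc_reply = reverse(at_plc)

# 3) Connect WAN undoes its destination NAT on the way back out.
reply_on_internet = plc_reply._replace(src_ip=on_internet.dst_ip)

# 4) Corporate firewall uses its NAT record to restore the private IP.
reply_at_pc = reply_on_internet._replace(dst_ip=sent.src_ip)

print(at_plc)
print(reply_at_pc)
```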
This understanding of NAT and IP is useful for understanding the capability and limitations of cellular access to certain devices with certain protocols. A future entry will cover setting up RSLinx Classic and using RSLogix 5000 to download over cellular to a L5555 processor.
Thursday, May 31, 2007
First, there are the people (usually those new to cellular) who claim that any day now the age of cheap, unlimited cellular data plans will arrive, so all my hard work to understand and reduce traffic is wasted effort. I especially hear this from European partners.
Then there are the other people ... people I know work with very large, very powerful end users who fail to get the cellular plans they desire. Things I hear:
- I have heard of customers who run pilot projects and are promised unlimited traffic, but when the time comes to sign the contract for 3000+ sites, the carrier decides that it cannot offer unlimited traffic ... period. Hmm, I guess this is the difference between the carrier's commissioned sales staff and the business managers who need to keep the cellular system profitable.
- I have worked with large customers who do manage to get "unlimited deals" for a modest-sized system - say 50 or 100 sites - but the carrier insists on adding 2 clauses: 1) the carrier has the right to artificially slow down the data communications (without detailing what this means), and 2) the carrier has the right to stop all of the customer's data traffic temporarily if the cell system gets busy (again, with no details defined). Hmm, I have to wonder what kind of control engineer agrees to such "clauses". You'd think getting one's data under control, to avoid the need for unlimited data, would be a wiser design.
- I have heard that, overall, the cellular carriers are starting to rethink the value of machine-to-machine data plans. Unlike DSL or cable, this is NOT an issue of bandwidth; it is an issue of the % of the time the device "hogs" 1 of N slots or resources on the tower forever. Imagine having 10 or 20 such devices squatting there, locking up that tower resource. So it is not even so much an issue of talking once every few minutes versus flat-out unlimited traffic. In either case 1 of N finite tower resources is tied up, so long-term the only "good" data plan may be for a data system using the cellular resource every few hours or a few times a day.
So overall, it looks like my efforts to understand and reduce traffic are useful.
Tuesday, May 29, 2007
Title: Connect Local and Remote Devices in Hazardous Environment SCADA Applications
Register here: WebEx Link
Date: May 31, 2007
Time: 11:00am CDT (central zone)
Duration: 1 hour
- Lynn Linse - Engineer, Digi International
- Deb Smith - Business Development, Digi International
What you will learn
This live webinar will illustrate the value of using TCP/IP & UDP/IP based communication over wired and wireless networks to monitor and manage local and remote devices. Topics include:
- Overview of requirements for robust, remote, outdoor communication devices
- Discussion of extended temperature, conformal coating, hazardous certs
- Comparing features of IT-grade network devices to Digi's Haz product line
- Extending IP networks to remote serial and Ethernet devices
- Moving serial communications through TCP/IP and UDP/IP
- Migrating from analog Telco/POTS lines to IP-based broadband
- Real-world usage examples
- Implementation benefits
- Real-time access to remote sites and process data without site visits
- Reduce repair/replacement costs due to less-than ideal environments
- Options to reduce installation costs with wireless
Questions? Contact Deb Smith at email@example.com or 952-912-3283.
Friday, May 04, 2007
One of the fun things about being involved in "multi-vendor" solutions is when you recognize moments of amazing sanity as they occur. One such moment of amazing sanity is occurring next week when ODVA (aka Rockwell / Allen-Bradley) and Modbus supporters (aka Schneider-Electric / Modicon / SquareD / Telemecanique) sit down to discuss how to integrate Modbus devices into the ODVA Ethernet/IP and CIP network systems. Of course there must be some interesting hidden politics behind this move - and I somewhat light-heartedly believe that perhaps French Schneider-Electric sees joining with the Americans (Rockwell/ODVA) as the lesser of two evils when compared to joining with the Germans (Siemens/PNO).
Check out: ODVA Call For Members: Modbus Integration JSIG. The kick-off meeting for the Modbus JSIG runs from Thursday, May 10, 11:00 AM to Friday, May 11, 04:00 PM.
Side-stepping the marketing fluff and platitudes of a brighter future such meetings evoke, small third-party suppliers and the folks on the plant floor can expect the following benefits. Regardless of the directly stated goals of ODVA, Rockwell, or Schneider-Electric, small vendors will implement solutions that include these abilities:
- Vendors making Ethernet Modbus/TCP products will have a simpler "first step" to adding full ODVA/CIP support without the somewhat overwhelming task of 100% conversion of a word-array device model into hundreds (or thousands) of CIP objects.
- ControlLogix PLC will be able to connect through Ethernet-to-Serial devices to multi-drops of Modbus/RTU slaves. For example a user with a dozen small Modbus/RTU PID loop controllers will be able to add an Ethernet-to-Serial device to read via Modbus and cyclically produce a small block of word data from each loop controller over Ethernet.
- HART, Bluetooth, ZigBee and other new technologies which offer Modbus interfaces will find an instant place as sensors and I/O within CIP and Rockwell systems.
- Since Siemens, GE-Fanuc, Omron, Mitsubishi and most major PLC brands offer some method to act as Modbus slaves, users with any of these PLC will be able to integrate them within the CIP Producer/Consumer system.
- I started working with ODVA Ethernet/IP almost 8 years ago and still, as of today, the legacy PLC5E, SLC5, and serial MicroLogix (the old PCCC-based PLC) don't have effective inclusion within CIP Producer/Consumer systems. Since the device model of PCCC PLC shares much in common with Modbus PLC, it is a very small enhancement to add similar support for AB PLC - perhaps AB will actually extend this in future firmware updates to Ethernet-based PLC. In fact, since my Digi One IAP code already allows Modbus masters to query DF1 and CSPv4 slaves as if they were Modbus slaves, as soon as Digi adds Ethernet-to-Modbus support per this JSIG's output, users of older AB PLC will gain access to CIP Producer/Consumer systems indirectly as honorary Modbus slaves.
- Today legacy Modbus and Modbus/TCP systems lack any simple form of multicast producer/consumer exchange. While the IDA protocol offers this, IDA is so many orders of magnitude more complex (and resource hungry) than simple Modbus as to become really an "unrelated" protocol. Any specification that defines a "server interface" naturally implies a corresponding "client interface". So although this ODVA JSIG is not planning to define how Modbus "peers" could use multicast to exchange cyclic data, the end result will be a fairly natural and multi-vendor method to do this. So while I doubt many pure Modbus/TCP products would implement ODVA protocols just to gain this multicast exchange, any products which add the CIP support anyway will naturally add the last few bits of code required to enable pure Modbus-to-Modbus multicast via the ODVA mechanism.
- Taking the above point to its natural conclusion means Modbus/TCP masters which implement the Modbus JSIG's "server" function will also gain a mechanism to access CIP Producer/Consumer systems. Even if the ODVA JSIG doesn't cover how to do this, natural methods will be inferred, produced, and copied by vendors to make this a fairly common new product feature.
Wednesday, April 25, 2007
PLC Protocol Example:
A simple but realistic SCADA scenario is to poll every 15 minutes and read 10 words of data and write 2 words of data. This commonly requires 1 Read command and 1 Write command (I'll ignore the rarely supported Modbus command that reads & writes within a single command.)
While there exist special SCADA protocols and special products that optimize remote traffic, I am not looking at those protocols at the moment. Instead, since cellular and satellite access to remote IP and Ethernet products has enabled people to use off-the-shelf PLC technology, I am looking at the more traditional PLC protocols. These are things which affect users when they apply an Ethernet design to an IP-based wide-area-network system.
I compare these 4 PLC protocols:
- AB/DF1 Radio Modem (RM) encapsulated in UDP/IP. DF1 RM is basically DF1 Full-Duplex with no ACK/NAK and is supported by the SLC5 and MicroLogix line.
- Modbus/RTU encapsulated in UDP/IP. Modbus/TCP within UDP/IP is roughly the same size.
- AB/CSPv4 in TCP/IP as supported by SLC5/05 and PLC5E MSG blocks.
- AB/Ethernet/IP as moved by ControlLogix Explicit MSG blocks to PCCC-based remote PLC. Note that Ethernet/IP "I/O Messaging" does NOT work through NAT'd wide-area-networks since the protocol embeds IP information within the data packets and is thus "broken" by NAT.
| Protocol | Transport | Bytes per 15-min poll | MB per month | Relative cost |
|---|---|---|---|---|
| Ethernet/IP | TCP/IP | 1202 | 3.46 | 100% |
| AB/CSPv4 | TCP/IP | 960 | 2.76 | 80% |
| Modbus/RTU | UDP/IP | 166 | 0.48 | 14% |
| DF1 Radio Modem | UDP/IP | 194 | 0.56 | 16% |
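The table's last two columns can be reproduced from the bytes per poll, assuming a 30-day month (96 polls per day, 2880 per month) and 1 MB = 1,000,000 bytes. A Python check:

```python
POLLS_PER_MONTH = 30 * 24 * 4   # one poll every 15 minutes for 30 days = 2880

BYTES_PER_POLL = {
    "Ethernet/IP": 1202,
    "AB/CSPv4": 960,
    "Modbus/RTU": 166,
    "DF1 Radio Modem": 194,
}


def monthly_mb(bytes_per_poll: int) -> float:
    """Decimal megabytes per month for a fixed number of bytes per poll."""
    return bytes_per_poll * POLLS_PER_MONTH / 1_000_000


baseline = BYTES_PER_POLL["Ethernet/IP"]
for name, per_poll in BYTES_PER_POLL.items():
    print(f"{name:16} {monthly_mb(per_poll):5.2f} MB  {per_poll / baseline:4.0%}")
```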
The two Rockwell "Ethernet" protocols cost a lot more to use in part because they force use of TCP/IP, and therefore suffer the repeated cost of TCP socket opening and closing, plus extra TCP acknowledgment overhead. They also suffer because they both involve connection registration and service functions that needlessly repeat every time the connection is reestablished. While the actual data packets of these protocols are roughly twice the size of the serial encapsulated protocols, the real burden they suffer is all the extra TCP/IP packets exchanged that do NOT directly involve field data update.
Both serial protocols, Modbus/RTU and DF1 Radio Modem, benefit from moving no IP packets that don't relate to the field data update - no TCP/IP open, close or acknowledgement, and no protocol "service function" overhead. Each moves just 1 read request and 1 read response, plus 1 write request and 1 write response.
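The post does not show how the 166-byte Modbus/RTU figure is built up. One breakdown that reproduces it exactly, assuming function code 03 (Read Holding Registers) for the read and 16 (Write Multiple Registers) for the write, with each frame in its own UDP datagram, is:

```python
UDP_IP_HEADER = 28            # 20-byte IPv4 + 8-byte UDP header per datagram
REGS_READ, REGS_WRITTEN = 10, 2

# Modbus/RTU frames: slave address (1) + function code (1) + fields + CRC (2)
read_request = 1 + 1 + 2 + 2 + 2                          # FC03: start address, quantity
read_response = 1 + 1 + 1 + 2 * REGS_READ + 2             # FC03: byte count, register data
write_request = 1 + 1 + 2 + 2 + 1 + 2 * REGS_WRITTEN + 2  # FC16: start, quantity, byte count, data
write_response = 1 + 1 + 2 + 2 + 2                        # FC16: echoes start and quantity

frames = [read_request, read_response, write_request, write_response]
per_poll = sum(frame + UDP_IP_HEADER for frame in frames)
print(frames, per_poll)   # [8, 25, 13, 8] 166
```

Only 54 of those 166 bytes are Modbus, and only 24 of them are field data; the rest is the per-datagram UDP/IP header.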
Discussion of Other PLC Protocols:
Most other PLC Ethernet protocols will either approach the costs of AB/CSPv4 - or won't work at all due to their use of direct "Ether-Types" and lack of IP compatibility. Most serial protocols will roughly either match the 2 shown here or be twice the cost if the protocol uses ACKs.
Modbus/ASCII will almost double the cost of Modbus/RTU since each data packet is roughly twice the size. But this wouldn't increase the IP overhead any.
Using DF1 Full-Duplex instead of Radio Modem would effectively double the cost over DF1 RM since DF1 Full-Duplex moves the protocol ACK/NAK, which doubles the IP header overhead also. Using DF1 Half-Duplex would triple or even quadruple the costs since HD not only moves protocol DF1 ACK/NAKs, but the ENQ/EOT polling overhead.
Most other protocols I am aware of - such as Omron Hostlink, GE-Fanuc SNPX, and Siemens PPI - would cost roughly 2 to 3 times more than Modbus/RTU or DF1 RM since they include protocol ACK, while a few even encode many parts of the message as ASCII or BCD form instead of as binary.
Thursday, April 19, 2007
For a simple SCADA-style example assume we need to read 10 words of data (20 bytes) and write 2 words (4 bytes) every time period. Obviously there would be simple optimizations to this, such as only writing data which changes or using PLC MSG blocks to push data from PLC to SCADA only when something changes. However my goal in this blog post isn't to "tweak" a solution to minimize cost, but to examine the protocol impact of using Rockwell CSPv4 over IP.
The table below shows the megabytes per month when polling once per second, per 5 seconds, per 1 minute, per 5 minutes, per 15 minutes, and per 1 hour. There are lots of variables considered ... and many more ignored. The traffic ranges from a worst case of 1005.0 MB for TCP/IP with larger header options polled once per second to a best case of 0.2 MB for UDP/IP polled once per hour. This assumes the use of CSPv4 submode 7 with local LSAP addressing, and ignores that the Rockwell PLC5E and SLC5/05 don't support CSPv4 within UDP/IP. Raw efficiency at moving the data bytes ranges from about 10% for UDP/IP to barely 1% for TCP/IP, which means most of what you are paying for is not actual, meaningful field data.
(Table image: http://iatips.com/blogimage/rockwell_cspv4_traffic.png)
Notes on the Table
Since this example reads and writes small amounts of data, it assumes a SLC5-style Protected Typed Read with 3-Address Fields and the corresponding SLC5 write.
The smaller 40-byte TCP/IP header has no options attached; the larger 52-byte TCP/IP header includes the RFC 1323 Timestamp and Window Scale TCP options. These appear to be the normal default for Linux, and they easily become enabled under Windows since all applications share a single setting in the Registry.
The two time columns "15 min (Alive)" and "1 hr (Alive)" assume a roughly 4 min 45 sec TCP keepalive to prevent the socket from closing. This avoids the extra open/close overhead in exchange for billable TCP Keepalive packets. Keep in mind this ALSO requires the PLC to be properly configured NOT to close idle sockets. By default, my SLC5/05 seems to close idle connections after a few minutes.
The two time columns "15 min (Cls)" and "1 hr (Cls)" assume the socket is closed after each data poll, so the TCP socket and CSPv4 session must be reopened for the next poll.
Of course the standard costs of using TCP/IP versus UDP/IP apply:
- TCP/IP uses larger headers, ranging from 40 to 52 bytes per packet, compared to UDP/IP's smaller 28-byte header.
- TCP/IP involves the TCP Acknowledgments, which may result in separate, billable 40 to 52 byte packets moving frequently without any meaningful field data.
- TCP/IP may require reopening a socket, costing 120 to 250 bytes per open, plus 160 to 400 bytes per close. Exact sizes are hard to predict since both opening and closing of sockets tend to be "pushed" and result in excess retransmissions and retries when network latency is high.
- TCP/IP over unknown 3rd party wide-area-network infrastructure requires at least 1 TCP packet to move every 4 minutes 45 seconds to maintain health. This means either a data packet or a TCP Keepalive with data.
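To put a rough number on the 4 minute 45 second rule for slow polls, here is a Python sketch that counts the keepalives needed between polls and their monthly byte cost. It assumes each keepalive costs one header-only probe plus one header-only ACK (40 bytes each without RFC 1323 options, 52 with) and a 30-day month; real stacks may differ, for example some send a 1-byte keepalive payload.

```python
import math

KEEPALIVE_INTERVAL_S = 4 * 60 + 45  # 285 s: max silent gap, per the rule above
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day billing month


def keepalives_between_polls(poll_interval_s: int) -> int:
    """Keepalives needed so no silent gap exceeds KEEPALIVE_INTERVAL_S."""
    # Split the poll interval into the fewest segments of <= 285 s;
    # each boundary between segments needs one keepalive.
    segments = math.ceil(poll_interval_s / KEEPALIVE_INTERVAL_S)
    return max(segments - 1, 0)


def keepalive_bytes_per_month(poll_interval_s: int, tcpip_header: int = 40) -> int:
    """Assumed cost per keepalive: one probe plus one ACK, both header-only."""
    per_gap = keepalives_between_polls(poll_interval_s) * 2 * tcpip_header
    polls_per_month = SECONDS_PER_MONTH // poll_interval_s
    return per_gap * polls_per_month
```

For example, a 15-minute poll needs 3 keepalives per gap and an hourly poll needs 12, so both end up paying for the same keepalive rate over a month.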
CSPv4 issues include:
- Rockwell PLC and software tools do NOT support use of UDP/IP - my tests with UDP/IP have to be conducted with the Digi One IAP which happily bridges CSPv4 between TCP and UDP (as well as to or from Ethernet/IP and DF1).
- CSPv4 requires the exchange of a pair of 28-byte negotiation TCP packets when a new TCP socket is opened, to inform the client (master) of the session handle assigned by the server (slave). This nearly doubles the overhead of an open-poll-close socket paradigm.
- The 28-byte CSPv4 header really contains little useful information; such excess bytes cost nothing tangible on Ethernet, but over cellular they cost real cash in the form of larger cell plans.
Your only effective solution at present is to carefully craft a set of MSG blocks to push data from the field in a report-by-exception paradigm. Of course you must also include safeguards within your PLC to prevent rapid, repeated MSG block triggers during a system failure, which could cost you thousands of dollars ($$$) in a few days.
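Such a safeguard would normally live in ladder logic, but the idea is easy to show as a Python sketch: allow an exception message only if a minimum interval has passed since the last one and a daily budget is not yet used up. The class name and the `min_interval_s` and `daily_limit` parameters are hypothetical tuning knobs for illustration, not settings of any Rockwell product.

```python
from dataclasses import dataclass, field
from typing import Optional

SECONDS_PER_DAY = 86400


@dataclass
class MsgThrottle:
    """Hypothetical guard against runaway report-by-exception MSG triggers."""

    min_interval_s: float = 60.0  # never send exceptions faster than this
    daily_limit: int = 200        # hard cap on exception messages per day
    _last_sent: Optional[float] = field(default=None, init=False, repr=False)
    _day: int = field(default=-1, init=False, repr=False)
    _sent_today: int = field(default=0, init=False, repr=False)

    def allow(self, now_s: float) -> bool:
        """Return True (and record the send) only if a message may go out now."""
        day = int(now_s // SECONDS_PER_DAY)
        if day != self._day:  # new day: reset the daily budget
            self._day, self._sent_today = day, 0
        if self._sent_today >= self.daily_limit:
            return False  # budget used up: suppress until tomorrow
        if self._last_sent is not None and now_s - self._last_sent < self.min_interval_s:
            return False  # too soon after the previous message
        self._last_sent = now_s
        self._sent_today += 1
        return True
```

With the defaults, a chattering input can trigger at most 200 billable messages a day, no matter how badly the field wiring misbehaves.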
Monday, April 16, 2007
I haven't looked over his code yet, plus all I have is VB 2003 .NET.
Hopefully Microsoft has STOPPED the old VB issue where each new rev of VB is neither 100% forward nor backward compatible ... one always needs to "tweak" a few lines to make the port work. I've used VB 1.0, 3.0, 4.0, 5.0, 6.0 and now VB 2003, and none of these have liked old code being pulled forward.
Monday, April 09, 2007
The Rockwell/AB SLC5/05 and PLC5E natively speak an older "unpublished" protocol named CSPv4, although most third party vendors call it either AB/Ethernet or AB/TCP. It moves only on TCP port 2222 - ODVA Ethernet/IP I/O Messaging is only on UDP port 2222, so they don't conflict. The protocol consists (normally) of a 3-part packet:
- 28-byte header
- 4 or 15-byte LSAP or end-point addressing packet
- PCCC message which is basically what DF1 documents as an Application Packet
- Rockwell tools and PLC only support use of TCP/IP and port 2222; this greatly limits use of CSPv4 in NAT'd networks since the remote NAT router can only forward TCP port 2222 to a single remote PLC.
- CSPv4 includes a single TCP packet exchange to "register a session" or connect. If you are polling faster than the PLC will hang up on you, then this is not important. However, if you poll slowly enough that a new TCP/IP socket must be opened for each poll, then even ignoring the TCP socket open/close overhead, this nearly doubles your traffic costs.
- In tests, a SLC5/05 seems effective at including the TCP ACK response to the host within the CSPv4 data response packet, so you only have to pay for one empty TCP ACK, which is the host's acknowledgment to the PLC for the response.
- TCP Keepalive could be an issue, since most hosts fail to issue it, and the SLC5/05 I've tested against either doesn't issue TCP keepalives at all or issues them very frequently.
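As a cross-check on the 3-part layout above, this Python sketch builds the PCCC part of an SLC-style Protected Typed Logical Read with three address fields (CMD 0x0F, FNC 0xA2) for 10 words of N7, then adds the 28-byte header and a 4-byte local LSAP only as byte counts. The PCCC field order follows the DF1 protocol manual as I understand it; the CSPv4 header and LSAP contents are deliberately not modeled, so this does not produce a sendable CSPv4 packet.

```python
import struct

CSPV4_HEADER_LEN = 28  # from the text; contents not modeled here
LOCAL_LSAP_LEN = 4     # "4 or 15-byte" addressing; local 4-byte form assumed


def pccc_typed_read(tns: int, file_num: int, element: int, words: int) -> bytes:
    """PCCC Protected Typed Logical Read, 3 address fields, integer (N) file."""
    return struct.pack(
        "<BBHBBBBBB",
        0x0F,       # CMD
        0x00,       # STS
        tns,        # TNS: transaction number, little-endian
        0xA2,       # FNC: protected typed logical read, 3 address fields
        words * 2,  # number of bytes to read
        file_num,   # file number (7 for N7)
        0x89,       # file type: integer
        element,    # element number
        0,          # sub-element
    )


def pccc_read_reply(tns: int, data: bytes) -> bytes:
    """Reply: CMD with the 0x40 reply bit set, STS, TNS, then the data bytes."""
    return struct.pack("<BBH", 0x4F, 0x00, tns) + data


def cspv4_size(pccc: bytes) -> int:
    """Total CSPv4 payload size: header + local LSAP + PCCC message."""
    return CSPV4_HEADER_LEN + LOCAL_LSAP_LEN + len(pccc)


request = pccc_typed_read(tns=1, file_num=7, element=0, words=10)
reply = pccc_read_reply(tns=1, data=bytes(20))
```

The request comes out at 42 bytes and the reply at 56 bytes, the same CSPv4 packet sizes counted in the March 14 byte-count table.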
Thursday, April 05, 2007
Your Second Step should be to set up a simple, isolated low-speed broadband link at work ... create your own DMZ lab.
Sigh - I waste so much time listening to customers complain about how difficult it is to get the IT department to give them custom firewall permissions. Since modern "Security" wisdom is to block everything until proven safe, I waste more time asking customers who complain that their Modbus/TCP or Rockwell access isn't working to first talk to their IT group and make sure unknown binary traffic isn't blocked by default. I waste yet more time when customers struggle for days and finally have to formally get someone in IT to help study the corporate firewall logs to see if any traffic is getting through at all. An interesting epiphany occurs when I suggest they just look into paying roughly $50 per month for a private connection for this. It is surprisingly cheap to do this and makes a lot of people's jobs 200% easier.
So far the feedback from customers has been quite positive, with IT departments overjoyed at the idea (slight exaggeration :-] lesser-of-two-evils may be a better term). This really makes sense; IT is charged with keeping the corporate system working and secure, so when you ask for yet another odd, unknown firewall hole to be opened, you ask them to risk their jobs. Plus trying to keep custom firewall settings updated for 50 different projects is an ongoing headache and ongoing risk for mistakes. I know that Digi's IT group is very satisfied with their policy of not offering custom firewall rules on the corporate LAN but instead helping teams set up such private connections in a safe, isolated manner.
Simple DMZ Lab Design
The simplest lab design is little more than a copy of what you have at home: a computer or two, an 8 or 24-port Ethernet switch, and a simple NAT router to "share" the internet connection with a dozen devices. This allows you to freely set up a few OPC servers and Master PLC to test access to remote cellular and other wide-area-network based systems.
- Locate an empty office or lab room for your new network. Perhaps your IT people should pull out or disable the corporate Ethernet in this room. You are going to create a small "DMZ": a small isolated network that has limited security consequences if you goof up and let a hacker inside. You'll want good security tools installed on any Windows and Linux computers used in here.
- Arrange for a low-speed business broadband link with one fixed IP address. 256Kbps is more than enough for general PLC/SCADA testing and should cost in the range of $35 to $50 per month. Yes, just $35-50 per month! I had one customer forced to pay his IT department $100 per month to open one TCP/IP hole in the corporate firewall!! Gee, he could install 2 DSL links for that. Now, be patient when you talk to your carrier, as they are geared to sell the expensive primary access lines used for all corporate traffic including servers. Keep stressing that you want a low-speed secondary line for use with some network testing and eventually you'll locate the low-cost plans you want.
- Set up a DNS name for your DMZ lab. Online dynamic DNS providers support user-selected DNS names for static IP addresses. I use dyndns.org for both my dynamic and static IP but there are many out there. You won't need any form of DDNS update client since your IP never changes and you enter the IP address manually when you create the name anyway.
- Unless you plan to implement large VPN systems, just buy a nice consumer-grade DSL/Cable Router ... the same kind you use at home is fine. If you plan to set up a serious VPN infrastructure, then you'll just need to bite the bullet (and suffer the learning curve) and buy a commercial IT-grade router with VPN server capability built in. Be warned that while many consumer-grade routers mention "VPN Support", they are in fact sub-optimized and documented only for home-office users who connect into a corporate Windows or Cisco VPN server. Normal human beings will find them nearly impossible to set up for anything else!
- Do you want more than one public IP address? You need to pay a monthly surcharge which varies greatly per carrier, but could be in the range of $25 per month for 8 IP addresses instead of just 1. Plus you will need a larger IT-grade router since the consumer-grade routers won't support more than 1 public IP address. Most users won't need more than 1 IP address. However, having more than one IP address is helpful if more than 1 team shares the lab; this prevents them from trying to set up conflicting router configurations. Also, a few extra IP addresses are helpful if you want to place a PLC "online" for your customers or sales force to access during customer-site demonstrations in the field.
- If you need to access your corporate network from your DMZ lab, then you need to arrange some rules with your IT people. Perhaps the rule is that using a notebook computer with 802.11 wireless to the corporate network is OK as long as the notebook is NEVER connected to the lab's Ethernet. Remember, since your lab has its own public IP address you can even use FTP or a VPN client to connect "legally" out your corporate network and back into your lab via the Internet.
Since Digi is basically a "communication company", the lab I get to use is much fancier. It has 32 public IP addresses and even limited secure access from the corporate LAN. Of course I have to share this lab with other teams, so I don't own all 32 IP addresses. As an example, here is how my lab is set up:
- Digi's IT group maintains a Cisco PIX router (a mid-range $800 model) that manages the 32 public IP addresses. Actually, this is NOT a complication since this router does not by default firewall any traffic; it merely distributes raw traffic based on public IP to one of many internal IP addresses in an organized manner. I was lucky enough to get in the lab early and to be assigned 2 of the 32 public IP addresses.
- My first IP address receives 100% raw internet traffic at an internal static IP I selected; so an external IP such as 70.x.x.140 forwards to my internal IP of 192.168.20.159. Since the goal of the lab is to avoid burdening IT with TCP/UDP port forwarding chores, one could place a consumer-grade DSL/Cable router at this IP. Placing 2 routers in series is NOT a problem - do a trace route to www.google.com and you'll see a dozen or more routers in series. Instead of a pure hardware box I have an Ubuntu Linux machine running firewall and router tools at this IP. This is where I forward Modbus/TCP to one PLC and Rockwell Ethernet/IP to another PLC. I prefer the Linux box to the $39 hardware box because it gives me a richer view of traffic in and out, plus I can run an Ethernet sniffer such as Wireshark to see a complete trace of the 2-way conversation taking place.
- For my second IP address I had Digi IT set up the PIX router to just forward a simple, safe list of Modbus, Rockwell, Digi, and other industrial protocol TCP and UDP ports. I normally have a Digi One IAP (an industrial-protocol aware Ethernet-to-serial device) at this IP address, but I can safely swap in a Windows machine when a test requires use of Windows tools.
Since I study cellular usage, I have a Digi Connect WAN providing an Ethernet-based cellular router in the DMZ lab. So I really have 3 potential routers to use - while it takes a bit of IP experience to not get confused, this allows me to have devices configured to selectively treat any 1 of the 3 routers as "the default gateway/router".
- For example, I can have a Master/Client device connect out to a cellular-based Slave/Server using a route such as Master => out PIX+DSL => in Cellular => Slave.
- For example, I can have a Master/Client device connect via cellular to a DSL-based Slave/Server using a route such as Master => out Cellular => in DSL+PIX => in Linux-Box => Slave.
- In both of these situations I can see BOTH ends of the conversation, which is a huge help in testing, timing, or troubleshooting new applications.
Tuesday, March 27, 2007
Homework - Work at Home
When engineers first launch into a cellular data pilot it can be a bit like Christmas with the excitement of new toys, future trends and being "on top of it". However, I encourage anyone interested in using cellular or satellite-based IP systems to do some home work first ... literally "work at home". You'll save lots of cash and avoid many headaches by learning the basics at home first.
Most of you have a cable or DSL router/modem at home, so start there. Take a PLC or controller home. If it has an Ethernet port you are all set; however if your device is RS-232 based, then beg, steal, borrow, or purchase a simple Device Server such as the Digi One IAP (fancier, Modbus and Rockwell protocol aware) or the Digi One SP (much cheaper but just a raw Ethernet-to-serial converter). Your goal is to connect from your office computer over the Internet to this device at home ... if you cannot succeed at this, then you won't succeed at cellular access either! But unlike with cellular, all of your trial-and-error over your Cable/DSL Router won't be costing you by the byte.
Just remember that your "Home Cable/DSL Terms and Service Agreement" likely forbids running "servers" so don't go and try to setup an e-commerce shop once you see how easy it is to access your home from the Internet.
Get to Know Your Cable/DSL Router Box
Hopefully you all have an external commercial router box that you either got from your ISP or bought at any big-box store for $39 to $59. If your computer connects directly into your modem or you were fooled into using Microsoft's "Internet Sharing" tool on one computer, save your sanity and go buy a cheap router box! For your $39-59 you get a 4-port switch, a professional stateful-firewall and NAT (more about that later), a wireless access point, and it all consumes maybe 8-10 watts of power so costs you a few $ a year to run. If for no other reason, you just don't want the mindless broadcasts and hacker probes taking a percentage of your home computer's bandwidth. For my VPN testing I have some Linux boxes up exposed like this and they see up to 50 broadcasts per second and a few dozen probes for open Windows and Unix services per hour. There is NO REASON to expose your home PC to this rubbish - use an external router box ... period.
Step 1: Learn how to log onto your Cable/DSL Router.
- Under Windows 2000 or newer, open a command window and type the command "ipconfig". You should be shown your computer's current IP Address and the Default Gateway, which is another name for your Cable/DSL router. Most likely the router has an IP such as 192.168.0.1 or 192.168.1.1.
- Confirm you can ping your Cable/DSL router at this IP.
- Open your web browser and browse to the address - as example type the URL "http://192.168.1.1". You should be asked for a user name and password.
- Check your router documentation or go online to the vendor and read the user guide. For example, at home I have an ActionTec router/wireless access point supplied by Qwest, and when it first arrived it had no user name and a password of "admin". This is actually not so insecure since by default you can ONLY access this web page from inside your firewall/router. But common sense says changing this name/password is wise.
- There is no way I can explain how all Cable/DSL routers work, but once you can log in you should be able to find a status web page which gives your currently assigned external IP address and 2 DNS addresses. This is how the world sees your home system - write this info down. For example, my home Cable/DSL router (as of today) has the temporary (dynamic) IP of 63.228.51.x.
So at this point, you know how to access your raw "face" exposed on the Internet. Now we want to give ourselves a nice, memory-friendly DNS name to represent that face.
- As mentioned above, my Qwest IP is dynamic and liable to change at any time. So while I could go to the office and try to point my OPC server or PLC software at 63.228.51.x, I can never be sure how long this will work. In reality it only changes every few months or if I power-cycle my router, but the solution to this problem is very easy so we should solve it rather than work around it.
- Sign up with one of the many free online Dynamic DNS providers - I use dyndns.org. The Digi Connect WAN (cellular router) family directly supports this, as do many LinkSys and DLink-class home products. In a nutshell, they allow you to create a domain name such as sillyjoe.gotdns.org or sammy345.dyndns.org and then a client tool on your home system automatically updates this DNS name every time your ISP changes your dynamic IP address.
- While the above service is free, you may want to pay the $10 or so per year for a minimum account. This makes the service more tolerant of errors on your part - for example many services automatically delete your free account if it is untouched for 45 days and so on.
- If you have a Windows computer, the easiest DDNS update client is just to download the Windows tool recommended by your Dynamic DNS provider. This client automatically monitors the Cable/DSL router's IP as it accesses the internet. If your IP has changed, it correctly updates the DDNS (dynamic DNS servers). I stress the word "correctly" since many external Cable/DSL Router boxes which support DynDns and such services come with bugs which cause your free service to be deleted within hours of setup. So if you choose to use your Cable/DSL Router to maintain your DDNS name, make sure you have the latest firmware upgrade on it!
- Within an hour of setup, anyone in the world should be able to ping your new DDNS name and get a response.
We are almost ready to try access - but if you point your OPC server at your DDNS name ... nothing will happen since your Cable/DSL Router does NOT understand Modbus or other industrial protocols. Remember, the IP your DDNS name represents is the IP address of your Cable/DSL Router and NOT the IP of your home computer nor is it the IP address of your PLC/controller device.
- Log back into your Cable/DSL Router and locate the setup for port forwarding. Some routers call it setup for applications and games. We need to configure the router to FORWARD specific TCP and UDP ports to Ethernet-based devices you have at home. If you don't know what that means, you are in for a tough time using wide-area-network technologies - I suggest you go to any bookstore and buy a book on basic networking that covers what TCP, UDP, IP and NAT are. This is really key to success in this area. You don't need to be an expert, but you do need to understand the basics!
- In summary, if you think of the IP address as being synonymous with the main phone number in a building (aka - how to telephone the building), then the TCP and UDP port numbers are synonymous with phone extensions within that building (aka - how to reach a certain department or service). So for example, a Modbus/TCP OPC server will connect to your router using the IP (main phone number) attached to your DDNS name, and then request a connection to TCP port 502 (the service). We need to configure the router to accept and forward the Modbus/TCP traffic to your Modbus PLC. So take the example of a Modbus/TCP device on my local network with the IP 192.168.1.105. We need the router (at say IP 63.228.51.x) to accept any incoming connection on TCP port 502 and forward the packets to the local Ethernet device at IP 192.168.1.105 TCP port 502. Since the PLC has a web server and most ISPs block access to home web servers, we'll tell the router to forward TCP port 8080 to local port 80. Depending on the brand of router you have the configuration can get fancier than that - but basically we'll end up with entries in the table looking something like:

| Incoming port | Service | Local IP | Local Port |
|---|---|---|---|
| 502 | Modbus/TCP | 192.168.1.105 | 502 |
| 8080 | Web (HTTP) | 192.168.1.105 | 80 |
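Conceptually the router's forwarding table is nothing more than a lookup from public port to a private IP and port. A small Python sketch of the two example rules described above; the `Forward` type and `forward_target` function are hypothetical illustrations, not any router's configuration API.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Forward(NamedTuple):
    service: str
    local_ip: str
    local_port: int


# Example rules from the text: Modbus/TCP passes straight through, and the
# PLC's web server is exposed on 8080 because many ISPs block inbound port 80.
PORT_FORWARDS: Dict[int, Forward] = {
    502: Forward("Modbus/TCP", "192.168.1.105", 502),
    8080: Forward("HTTP", "192.168.1.105", 80),
}


def forward_target(incoming_port: int) -> Optional[Tuple[str, int]]:
    """Where the router sends an inbound TCP connection, or None if dropped."""
    rule = PORT_FORWARDS.get(incoming_port)
    return (rule.local_ip, rule.local_port) if rule else None
```

Any port without a rule falls through to None, which is what a NAT router effectively does: unsolicited inbound traffic simply goes nowhere.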
Step 4: Get to Work Learning
That's it - at this point you should be able to use a Modbus/TCP OPC server to poll your PLC indirectly by polling your DDNS name on the standard TCP port 502. Pointing your browser to http://<your DDNS name>:8080 will pull up your PLC's web pages (the ":8080" tells the browser to connect to TCP port 8080 instead of the default 80).
Of course there is no security offered here - anyone in the world can access your PLC so this is just for educational purposes. Here are some common port numbers to use:
- Modbus/TCP uses TCP port 502
- Digi Ethernet-to-Serial products use ports like 2101, 2102, etc to access a serial device by raw TCP or UDP sockets.
- Digi RealPort uses TCP port 771 (TCP 1027 for SSL/TLS secure connection).
- Rockwell AB PLC5E and SLC5/05 use TCP 2222 for the older legacy CSPv4. This is often called AB/Ethernet or AB/TCP by 3rd party vendors
- Rockwell ControlLogix and ODVA Ethernet/IP use TCP port 44818, UDP port 44818, and UDP port 2222. But be warned Rockwell tools are very poorly designed for wide-area network use.
- Siemens S7 protocol uses TCP port 102
- GE SRTP uses TCP ports 18245 and 18246
- GE QuickPanels use TCP port 57176 for configuration
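Whichever of these ports you forward, it helps to confirm from the office that the port actually accepts a TCP connection before debugging the OPC server or PLC software. A minimal Python sketch follows; the 10-second default timeout is an assumption chosen to tolerate slow links. A successful connect only proves the forward works, not that the device behind it speaks the protocol you expect.

```python
import socket


def tcp_port_open(host: str, port: int, timeout_s: float = 10.0) -> bool:
    """True if a TCP connection to host:port completes within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
```

For example, `tcp_port_open("sillyjoe.gotdns.org", 502)` would test Modbus/TCP through the example DDNS name used earlier; substitute your own name and port.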
Wednesday, March 14, 2007
The Mystery 17% Cost Increase:
Last night I ran a test polling ten words once a minute from an Allen-Bradley SLC5/05C's N7 file over GSM. This is nothing exotic - I ran similar tests a few months ago and had preconceived ideas of what to expect ... beep ... wrong! In between Then and Now, some unknown application changed my Windows XP system registry, enabling the "RFC 1323 Timestamp and Window Scale TCP options". The end result was an unexpected 16.51% increase in data byte traffic with no perceived value.
I have no clue which tool did this, and unfortunately Windows (at least 2K and XP) uses a single global setting for the entire TCP stack. I could change it back ... but would that break this other mystery application? Will this other mystery application just change it back? Will I launch a mini cold-war race as this mystery application tries to keep RFC 1323 enabled and my test tools try to keep it disabled?
The Byte Counts with and without RFC1323:
Here is an exact accounting of the change in byte counts - remember, cellular is basically a mobile-IP tunnel which moves TCP/IP or UDP/IP as pure data payload. So you pay for both the IP and TCP headers, plus any data-less TCP Acknowledge or Keepalive packets.
I'll ignore the opening and closing of the socket, plus TCP Keepalive since I'm polling fairly steady-state once per minute. The PLC includes the TCP ACK in the response, so at least we avoid 1-of-2 data-less TCP Acknowledgments.
| | no RFC1323 | with RFC1323 |
|---|---|---|
| Request: IP header | 20 | 20 |
| Request: TCP header | 20 | 32 |
| Request: CSPv4 Packet | 42 | 42 |
| Response: IP header | 20 | 20 |
| Response: TCP header | 20 | 32 |
| Response: CSPv4 Packet | 56 | 56 |
| Client ACK: IP header | 20 | 20 |
| Client ACK: TCP header | 20 | 32 |
| Client ACK: (no data) | 0 | 0 |

| | no RFC1323 | with RFC1323 |
|---|---|---|
| Total Bytes per Poll | 218 | 254 |
| Total Bytes per Hour | 13,080 | 15,240 |
| Total Bytes per Day | 313,920 | 365,760 |
| Total Bytes per Month | 9,417,600 | 10,972,800 |
So this means a user doing 1 read of 10 words per minute would magically see a 16.51% increase in data traffic ... just because they (or the IT department or even Microsoft Windows Update) changed a hidden registry setting. This is yet another example of how hard it is to keep tight control on your cellular data costs, and it adds to my belief that using off-the-shelf host applications over cost-sensitive IP networks is a losing battle. At some point you'll need a tool or device which is 100% "under-control" when it comes to packet creation.
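The arithmetic behind the table is simple enough to script, which makes it easy to try other poll rates. A Python sketch using the header and CSPv4 payload sizes from the table; it assumes the PLC piggybacks its ACK on the response (as observed above) and a 30-day month, which is what the monthly totals imply.

```python
IP_HEADER = 20
# RFC 1323 timestamps add a 10-byte option padded to 12 bytes on every segment.
TCP_HEADER = {False: 20, True: 32}
CSPV4_REQUEST = 42   # CSPv4 request packet size from the table
CSPV4_RESPONSE = 56  # CSPv4 response packet size from the table


def bytes_per_poll(rfc1323: bool) -> int:
    """Request + response (ACK piggybacked) + one data-less client ACK."""
    per_packet_headers = IP_HEADER + TCP_HEADER[rfc1323]
    return 3 * per_packet_headers + CSPV4_REQUEST + CSPV4_RESPONSE


def bytes_per_month(rfc1323: bool, poll_interval_s: int = 60, days: int = 30) -> int:
    polls = days * 24 * 3600 // poll_interval_s
    return bytes_per_poll(rfc1323) * polls


base, with_opts = bytes_per_month(False), bytes_per_month(True)
increase_pct = 100 * (with_opts - base) / base
```

Changing `poll_interval_s` shows the same 36 extra bytes per poll scaling linearly with poll rate; the percentage increase stays the same regardless of interval.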
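Windows Registry Details:
Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value Name: Tcp1323Opts
Value Type: REG_DWORD (number, flags)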
Valid Range: 0, 1, 2, 3
- 0 (disable RFC 1323 options)
- 1 (window scaling enabled only)
- 2 (timestamps enabled only)
- 3 (both options enabled)
Description: This parameter controls the use of RFC 1323 Timestamp and Window Scale TCP options. Explicit settings for timestamps and window scaling are manipulated with flag bits. Bit 0 controls window scaling, and bit 1 controls timestamps.
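A small Python sketch that decodes the flag bits described above and estimates the per-segment header growth. The byte estimate is an assumption worth stating: window scaling is negotiated only in SYN segments, so only timestamps (when both ends agree to use them) add bytes to every data and ACK packet. On Windows the raw value could be read with Python's `winreg` module, which is not shown here.

```python
from typing import Dict


def decode_tcp1323opts(value: int) -> Dict[str, bool]:
    """Decode Tcp1323Opts: bit 0 = window scaling, bit 1 = timestamps."""
    if value not in (0, 1, 2, 3):
        raise ValueError(f"Tcp1323Opts must be 0-3, got {value}")
    return {
        "window_scaling": bool(value & 0x1),
        "timestamps": bool(value & 0x2),
    }


def extra_tcp_bytes_per_segment(value: int) -> int:
    """Assumed growth on established connections: 12 bytes if timestamps are on."""
    return 12 if decode_tcp1323opts(value)["timestamps"] else 0
```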
Friday, February 23, 2007
Modbus/TCP is inherently peer-to-peer
People using Modbus/TCP over Ethernet or IP are familiar with its ability to function as peer-to-peer. Most PLCs with Ethernet ports can function concurrently as a Modbus/TCP slave and Modbus/TCP master. So 2 PLCs can very easily connect - with 2 separate Master-Slave TCP connections - and share information. One TCP connection is a Master/Slave connection with the first PLC as Master and second PLC as Slave. The other TCP connection is a Master/Slave connection with the first PLC as Slave and second PLC as Master.
Technically, this is not Report-By-Exception in the true sense of a protocol. However, since the PLC-as-Master communication events can be triggered by field inputs, it has the same effect as writing information only upon exception or when change is relevant.
Modbus/RTU as peer-to-peer
This peer-to-peer trick is a bit harder to use with serial Modbus/RTU. A device with 2 serial ports can of course have 1 port configured as Master to issue remote reads and writes, while the 2nd port is configured as Slave to answer requests. When connected to a 2-serial-port Modbus IP-to-Serial bridge (such as the Digi One IAP), the 2-port serial RTU becomes a full Modbus/TCP peer, capable of operating fully peer-to-peer with other PLC and SCADA/OPC applications.
However, vendors aren't blind to the marketing aspect of "more hardware". While adding a 2nd serial port to a one-port RTU may only cost a few dollars, most likely the 2-port RTU is a much more powerful unit, so the actual end-user cost may go up hundreds of dollars. The same is true of Ethernet; while adding an Ethernet port may only cost a few dollars, users' expectations of Web Pages and fancy functions mean the Ethernet-enhanced device price may be $500 or more above that of a simple 1-serial-port RTU.
Fortunately, the Digi One IAP (as well as the PortServer family) allows Modbus/RTU slaves to use Report-By-Exception on the serial port. As long as only one serial slave is on each port, the Digi uses configured knowledge to "split" the single serial port conversation into two traditional Modbus/TCP connections. So traditional remote Modbus/TCP masters can query the serial slave RTU, completely unaware that on occasion the serial slave RTU wakes up and acts as a Master to write data during Exceptions. Somewhere, a traditional remote Modbus/TCP slave will receive Modbus/TCP messages from the serial RTU slave, completely unaware that when not busy reporting exceptions, the remote "Master" is really a passive Modbus/RTU slave.
This feature is ideal in wide-area-network situations where bandwidth is limited or data traffic is billed on volume of bytes moved. Many SCADA systems only need to check on remote status every few hours ... for example, lift pumps in a storm sewer system do absolutely nothing interesting for weeks or even months in the absence of rain. Even during a normal rain, checking on them every few hours is likely enough ... that is *IF* the remote lift pumps can send Report-By-Exception messages during system problems.
For example, we have one customer piloting use of Modbus Report-By-Exception over cellular data network. Their eventual target is to poll the remote sites once per day. They use a simple, single-port Modbus/RTU slave which combines I/O with an LCD and push buttons to make a simple, self contained "RTU" or Remote-Terminal-Unit in the truest sense of the word. Use of Modbus in UDP/IP and Report-By-Exception allows this customer to plan for $12 per month per site bills. If forced to poll continuously with Modbus over TCP/IP, they would need to pay $50 or more per site per month. With hundreds or thousands of sites, that is a huge cost savings and opportunity for better ROI (return on investment).
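To make the report-by-exception idea concrete, here is a Python sketch of what the exception side might send: when a value moves by at least a deadband, build a Modbus/TCP Write Multiple Registers (function 0x10) request. The `ExceptionReporter` class, the deadband rule, the register address, and the unit ID are hypothetical; the MBAP header and PDU layout follow the Modbus/TCP specification.

```python
import struct
from typing import List, Optional


def write_multiple_registers(tid: int, unit: int, start: int, values: List[int]) -> bytes:
    """Modbus/TCP request: MBAP header + function 0x10 PDU."""
    pdu = struct.pack(">BHHB", 0x10, start, len(values), 2 * len(values))
    pdu += struct.pack(f">{len(values)}H", *values)
    # MBAP: transaction id, protocol id (0), byte count of unit id + PDU, unit id
    mbap = struct.pack(">HHHB", tid, 0, len(pdu) + 1, unit)
    return mbap + pdu


class ExceptionReporter:
    """Hypothetical: report a register only when it changes by >= deadband."""

    def __init__(self, unit: int, start: int, deadband: int) -> None:
        self.unit, self.start, self.deadband = unit, start, deadband
        self.last_sent: Optional[int] = None
        self.tid = 0

    def update(self, value: int) -> Optional[bytes]:
        """Return a request frame to send, or None to stay silent."""
        if self.last_sent is not None and abs(value - self.last_sent) < self.deadband:
            return None  # no exception: send nothing, pay nothing
        self.last_sent = value
        self.tid = (self.tid + 1) & 0xFFFF  # 16-bit transaction id wraps
        return write_multiple_registers(self.tid, self.unit, self.start, [value])
```

Each exception for a single register costs a 15-byte Modbus/TCP message plus IP/TCP (or IP/UDP) headers, and a quiet site sends nothing at all between the infrequent scheduled polls.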
Here is a general discussion of how to design Modbus/RTU serial slaves and masters to gracefully handle Report-By-Exception:
Friday, February 09, 2007
Many newly written Ethernet-enabled applications incorrectly equate "Ethernet = Fast". They overlook that Ethernet is often just a path into other, slower IP-based networks. Worse, some well-meaning programmers set the response default to 250 milliseconds and limit the user configuration to a maximum of 5 seconds - I'd say so far about 20% of the applications I've had to help customers with limit Ethernet timeouts to 5 seconds or less.
But cellular networks have a high end-to-end latency, especially if the line has been idle for a few minutes. Normal slave response times will be near 2 seconds, with round-trip delays of 10 to 12 seconds common each day (see my entry on real world Modbus numbers). Interestingly enough, every cellular "expert" I talk to keeps correcting me that cellular latencies are in the 50 to 100 msec range and getting better every new "gen". Well, I guess my Saturn ION can do 400 miles-per-hour also ... if you drop it out of an airplane! Regardless of what these "experts" are smoking, my simple tests show otherwise where it really counts ... in actual real world tests run over the Internet to cellular-based IP devices.
Recommendation: IP applications should default to a 3 second response timeout. Applications must allow users to configure this timeout to be lower (perhaps to 250msec) and also higher to at least 60 seconds.
Impact: On Ethernet this should have no direct consequences since the timeout only has an effect if the remote is no longer available, in which case the remote is going 'offline' anyway. The minority of users who really want a 250 millisecond timeout can set it manually, while cellular users who want a more reasonable timeout of 15 seconds can set that too.
For cellular networks, the real problem with premature timeout is the customer has already paid for the request and very likely will also pay for the response - even if the response comes after the application gave up on the response and did a request retry. Assuming the user is polling the remote at a moderate pace to control costs, there is no harm in waiting longer for the response to maximize the value of the traffic paid for.
Another simple example is an application that sends a request, then times out twice and retries twice. How will the application react when it receives three responses at the same time? Remember, the first two requests probably were not lost; they still likely reached the remote device and created responses. Their responses may have just been delayed longer than expected. Since serial Modbus doesn't include enough information in a response to match it up to a request, this can cause serious misoperation of the system. Protocols including a sequence number should handle this more gracefully, but it will still be a waste of money.
We have also seen protocols which treat unexpected responses as a reason to abort and reset the communication channel, which further adds to cost. For example, we had one super headache with a big-name seller of "energy curtailment" systems. The end user insisted a 5 second timeout was the maximum they could tolerate (ie: wishful thinking - set a 5 second timeout regardless of reality). So let's just see what happens when we hit one of the rare but expected latencies over 10 seconds.
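A minimal Python sketch of that recommendation as a validated setting: 3 seconds by default, adjustable from 250 milliseconds up to 60 seconds. The class name and the hard 60-second ceiling are assumptions for illustration; the recommendation only asks for "at least 60 seconds", so an application could reasonably allow more.

```python
from dataclasses import dataclass

DEFAULT_TIMEOUT_S = 3.0
MIN_TIMEOUT_S = 0.25
MAX_TIMEOUT_S = 60.0  # assumption: the minimum ceiling the recommendation asks for


@dataclass(frozen=True)
class ResponseTimeout:
    """Hypothetical response-timeout setting: default 3 s, range 0.25-60 s."""

    seconds: float = DEFAULT_TIMEOUT_S

    def __post_init__(self) -> None:
        if not MIN_TIMEOUT_S <= self.seconds <= MAX_TIMEOUT_S:
            raise ValueError(
                f"timeout {self.seconds}s outside {MIN_TIMEOUT_S}-{MAX_TIMEOUT_S}s"
            )
```

The validated value can then be applied to a socket with `sock.settimeout(ResponseTimeout(15).seconds)`.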
- SCADA software sends out request sequence 74
- 5 seconds later, SCADA times out 74 and sends out 75
- 5 seconds later, SCADA times out 75 and sends out 76
- 1 second later - since TCP/IP is reliable - all three responses return
- SCADA is expecting response 76, but sees 74 ... Oh, big problem ... need to reset comm subsystem
- SCADA sends reset to remote RTU, expects response 1 but ... da da ... sees response 75 since they never flushed the old info and TCP/IP is reliable.
- SCADA sends a 2nd reset to remote RTU, expects response 1 but sees response 76 since they never flushed the old info and TCP/IP is reliable
- At this point, I hope you see that there are still 2 responses to the comms reset in the receive queue!
Anyway, whenever this reset "temper tantrum" occurred, it would take 10 to 15 minutes to get the connection back up. Of course, one problem was the stupid customer unwilling to set a correct timeout, but the SCADA software was also defective, since it wasn't smart enough to just discard old responses with timed-out sequence numbers. In the above example, life would have been fine and dandy had the SCADA system just discarded responses 74 and 75, since it expected 76.
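Discarding responses whose sequence numbers have already timed out takes only a few lines. In this Python sketch the receive queue holds hypothetical `(sequence, payload)` pairs, and sequence-number wraparound is ignored for simplicity; a real protocol would have to handle it.

```python
from collections import deque


def drain_matching(rx_queue: deque, expected_seq: int):
    """Pop responses until the one for expected_seq is found.

    Responses older than expected_seq answer requests that already timed
    out, so they are dropped instead of triggering a channel reset.
    Returns (matching payload or None, list of discarded sequence numbers).
    """
    discarded = []
    while rx_queue:
        seq, payload = rx_queue.popleft()
        if seq == expected_seq:
            return payload, discarded
        if seq < expected_seq:
            discarded.append(seq)  # stale: quietly throw it away
            continue
        # A sequence number from the "future" really is a protocol error.
        raise RuntimeError(f"unexpected response sequence {seq}")
    return None, discarded


# Usage: the three late responses from the timeline above arrive at once.
queue = deque([(74, "r74"), (75, "r75"), (76, "r76")])
print(drain_matching(queue, 76))  # ('r76', [74, 75])
```

With this logic the SCADA master never sends the comms reset, so the receive queue never fills with answers to resets it no longer expects.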
Wednesday, February 07, 2007
I have a DNP3 RTU up on my public cellular device, but need to confirm details of how the public can access it. I also had a discussion with the primary provider of DNP3 source code in the world, and we will be looking at putting a DNP3 slave simulator up via cellular. It would be really useful if this could expose some of the statistical & diagnostic info managed by the simulator. This would help software vendors fine-tune their software to handle the variable latency of cellular.
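As an illustration of how such statistics might be used, here is a Python sketch that turns measured response latencies into a suggested timeout: take a high percentile (nearest-rank method), multiply by a safety margin, and keep the result between a 3 s floor and a 60 s ceiling. The 99th percentile and the 1.5x margin are arbitrary assumptions, not values from any DNP3 product, and the sample latencies are made up for the example.

```python
import math


def nearest_rank_percentile(samples: list[float], pct: float) -> float:
    """Smallest sample such that at least pct% of samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def suggest_timeout(samples_s, pct=99.0, margin=1.5, floor_s=3.0, ceiling_s=60.0):
    """Hypothetical rule: margin x high-percentile latency, kept within bounds."""
    if not samples_s:
        return floor_s
    candidate = nearest_rank_percentile(samples_s, pct) * margin
    return min(max(candidate, floor_s), ceiling_s)


# Made-up cellular latencies (seconds): mostly fast, a few slow "unparking" delays.
latencies = [0.8] * 90 + [2.0] * 8 + [11.0, 12.0]
print(suggest_timeout(latencies))  # 16.5
```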
Thursday, February 01, 2007
My last post generated some interesting feedback, and I want to expand on one topic from it. For the last 15 years I've been involved in the "multi-vendor interface" business: linking multiple vendors' equipment by data comms. First I worked in RS-232 and 485, then fiber optics, then Ethernet, and now virtually every technology that moves TCP/IP.
From time to time I am contacted by some pretty desperate customers. For example, I had one customer who had piloted some Ethernet-based temperature sensors. Things worked fine in the lab with their lab computer, so they bought 50 ... only to find out they couldn't use them. It seems these sensors really were "just Ethernet": they talked by Ethernet broadcast and direct MAC-layer packets. They didn't support TCP/IP and therefore could NOT be routed by any standard network infrastructure. The user could not talk to any of the sensors they had intended to install in panels around the plant, because the "Computer Room" wasn't on the same physical Ethernet segment as the "floor". There was no way to send MAC-level broadcasts or unicasts between the two. This customer hoped I knew of some magic box to act as a gateway between TCP/IP nodes and pure Ethernet nodes; I didn't.
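The difference comes down to what each Ethernet frame carries. Bytes 12-13 of an Ethernet II header hold the EtherType; an IP router forwards only frames whose payload is an IP packet (EtherType 0x0800 for IPv4, 0x86DD for IPv6), and everything else stays on the local segment. A rough Python sketch of that check follows; the function name is hypothetical and 802.1Q VLAN tags are not handled.

```python
import struct

IP_ETHERTYPES = {0x0800, 0x86DD}  # IPv4, IPv6


def is_ip_routable(frame: bytes) -> bool:
    """Return True if an untagged Ethernet frame carries an IP packet.

    802.3 length values (below 0x0600) and non-IP EtherTypes both return
    False: an IP router has nothing it can forward in such frames.
    """
    if len(frame) < 14:
        raise ValueError("shorter than an Ethernet header")
    (type_or_length,) = struct.unpack_from("!H", frame, 12)
    return type_or_length in IP_ETHERTYPES


broadcast = b"\xff" * 6
sender = bytes.fromhex("001122334455")
ipv4_frame = broadcast + sender + b"\x08\x00" + bytes(46)
raw_l2_frame = broadcast + sender + b"\x88\xb5" + bytes(46)  # local experimental EtherType
print(is_ip_routable(ipv4_frame), is_ip_routable(raw_l2_frame))  # True False
```

A sensor that only ever emits frames like `raw_l2_frame` can be reached only by hosts on the same layer-2 segment, which is exactly the trap this customer fell into.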
So this brings me back to the concept of the true cost to implement "Ethernet". Customers who ask for Ethernet are not really asking for Ethernet hardware or an Ethernet media bus. They have the expectation that they can interface your "Ethernet Devices" with the wide variety of other equipment they have - including WiFi, routed Ethernet, fiber optics, wide-area networks, and so on. They also expect (at least in a future firmware rev) web pages for configuration, SNMP for remote management, strong encryption, and so on.
So the term "Ethernet" has taken on a life of its own; remember when 802.11 was called "Wireless Ethernet"? Well, there is absolutely NOTHING Ethernet about 802.11, yet it was a useful PR move to link the two. No doubt it helped spread the acceptance of WiFi, as we now call it. Interestingly enough, the current PCI versus PCI-Express adapters you buy for a PC use the same PR trick: linking a new, unknown technology to an old, established one that accomplishes the same function by very different means. Maybe Sony should have called Betamax "VHS-Max" instead ... but then I'm showing my age by even knowing that a consumer video standard other than VHS ever existed.
But back to Ethernet. If you are a small device maker and have yet to start making Ethernet-based products, just be aware that customers who ask for "Ethernet products" aren't really asking for ... err, products with Ethernet. They are asking for products that integrate into (at a minimum) the wide family of TCP/IP-based technologies out there. I am not even talking yet about whether you should support Modbus/TCP or ODVA EtherNet/IP or PROFINET. I am just saying customers will expect your "Ethernet products" to be able to hold a raw TCP/IP or UDP/IP conversation with all of the other equipment they are investing in daily.
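At its most basic, "holding a raw UDP/IP conversation" means something as plain as the exchange below. This Python sketch uses only the standard `socket` module: a stand-in "device" socket echoes one datagram back to a "host" socket over the loopback interface. Both roles are hypothetical; on a real network the device side would live in your product's firmware and the two ends would sit on different subnets.

```python
import socket


def udp_echo_round_trip(message: bytes, timeout_s: float = 3.0) -> bytes:
    """Send one datagram to a local 'device' socket and return its echo."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as device, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as host:
        device.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        device.settimeout(timeout_s)
        host.settimeout(timeout_s)

        host.sendto(message, device.getsockname())
        request, host_addr = device.recvfrom(1500)
        device.sendto(request, host_addr)  # reply to whatever address sent it

        reply, _ = host.recvfrom(1500)
        return reply


print(udp_echo_round_trip(b"status?"))  # b'status?'
```

Note that the device answers to the sender's address rather than a hard-coded peer, which is what lets the exchange survive NAT and routers in between.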
So the cost to add an Ethernet port is just a small part of your cost to "add Ethernet". That is why companies like Digi can sell Ethernet-to-Serial converters or sell "async Ethernet driver chips" like the Digi Connect ME, which links to your CPU's serial UART. These devices of course cost more than $9.95, or even the cost of a few new hardware chips, but that higher cost is paying for TCP/IP, web servers, SNMP servers, strong AES encryption and all of the other things your customers expect when they buy "Ethernet products".
So, to digress a bit: I suffer this "Oh, don't worry ... it's Ethernet" attitude on a daily basis. So far I have to say at least 95% of the off-the-shelf software applications I test that support TCP/IP don't work well over technologies other than direct Ethernet. This includes problems not only with extremely different media like satellite or cellular, but even with WiFi. So that is part of my mission in this blog: what you want is NOT to Ethernet-enable your products. Instead you need to "IP-enable" your products by way of an Ethernet interface.
Monday, January 29, 2007
The $9.95 Mindset:
Any detailed discussion about “special Ethernet for Industry” first starts with the fact that customers can buy 10/100 computer Ethernet adapters from any big-box store for $9.95. So users have the perception that the increment for Ethernet is small and cheap. While they may not expect you to sell Ethernet products for only $10 more than serial products, they won’t be happy to hear that your Ethernet product is $300 more. I will tie this together below, but the bottom line is that the closer you can match your Ethernet hardware to the market norm, the lower your overall costs will be.
Who Pays for Extra Software Work?
Of course, that $9.95 computer Ethernet card doesn’t include:
- Microsoft’s ROI on TCP/IP and network stack work
- the OPC server cost to add Ethernet drivers in place of serial drivers
- the cost for tool vendors (i.e., you) to rewrite serial-based tools as network-based tools.
What is the Market Supply Sweet-Spot?
Go online and look at the cost of hard drives – a 300GB (300,000MB) drive is in the $75 range, while an old 20MB drive (Meg, not Gig) costs about $140. We all understand this oddity – there is high demand for 300GB drives and virtually no demand for the old legacy 20MB drives needed for repair. Even trying to buy a 40GB (Gig) drive today is hard. The market has what people call a “sweet spot” – a range of product features and capacity that is the cheapest and easiest to buy. Product builders trying to use components that are better than (or even worse than) the market sweet-spot have disproportionately higher costs than builders using components in sync with the market sweet-spot.
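The arithmetic behind that oddity is worth spelling out. This short Python calculation uses only the two prices quoted above, with exact fractions to avoid floating-point rounding noise.

```python
from fractions import Fraction

# Prices and capacities quoted above: (price in USD, capacity in MB).
drives = {
    "300GB drive": (Fraction(75), 300_000),
    "20MB drive": (Fraction(140), 20),
}

cost_per_mb = {name: price / mb for name, (price, mb) in drives.items()}
for name, cost in cost_per_mb.items():
    print(f"{name}: ${float(cost):.5f} per MB")
# 300GB drive: $0.00025 per MB
# 20MB drive: $7.00000 per MB

ratio = cost_per_mb["20MB drive"] / cost_per_mb["300GB drive"]
print(f"The legacy drive costs {ratio}x more per MB")  # 28000x
```

In other words, stepping off the sweet spot multiplies the cost of each megabyte by four orders of magnitude, even though the old drive is far simpler technology.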
The same thing happens for Ethernet components – for example, buying magnetics rated at 1500 V isolation (the normal IEEE commercial spec) is very cheap, while trying to source magnetics with 2500 V isolation can cost an order of magnitude more. So while your company could define a number of electrical improvements for an “Industrial Ethernet” interface, you have to weigh this against the added cost and supply headaches of buying against the grain – of ignoring the gigantic “sweet-spot” for commercial-grade Ethernet components that enable creation of that $9.95 PC Ethernet adapter.
What is Your Manufacturing Sweet-Spot?
Just as the world market has a sweet-spot, so does your own in-house production; just ask your purchasing department. Adding Ethernet is NOT just the cost of adding a few new chips: the NIC, MAC/PHY, magnetics, and RJ45 connector. You may need to upgrade your whole basic hardware design from a simple 8-bit CPU with 64KByte of memory to a 16- or 32-bit CPU with several MByte of memory. For example, Digi’s basic Device Server platform has a 32-bit CPU, 4MB flash, and 8MB RAM. Few of our products really need this much horsepower, but putting, for example, 8MB of RAM into all products is cheaper, given purchasing logistics and reliability of supply, than buying a mix of 2, 4, and 8MB chips. In fact, today we are looking at the cost tradeoff of shifting the basic design from 8MB to 16, 32, or even 64MB. Yes, 16MB (or 64MB) will cost more than 8MB, but given that some products need 16MB (or 64MB), there are both tangible and intangible benefits to moving a larger volume of products up the curve to retain supply-chain advantages. This is especially true of FLASH and RAM chips, which frequently suffer feast-and-famine availability cycles.
All small companies quickly learn – often the hard way – that during market shortages, it is the small-volume purchases that get last delivery. During a chip famine, low-volume purchasers will NOT be able to buy sufficient chips at any price to maintain their production. The higher your volume of a part, the lower your price and, perhaps more importantly, the more reliable your supply. So when you start to add Ethernet products and reduce sales of non-Ethernet products, you may find you need to upgrade the CPU design of some of your non-Ethernet products to gain or retain reliability of parts supply.
How Robust is Commercial-Grade Ethernet?
So far I have been saying that trying to create special Ethernet hardware for industry may be costly and offer little real benefit. What's more, your average commercial-grade Ethernet is already very robust when compared to RS-232, RS-485 or USB serial. Ethernet uses a transformer-isolated signal with differential pairs, plus nice, low-level, hardware-supported error detection. Given the high signal frequency, low signal voltage and isolation transformer, trying to add extra surge protection greatly complicates product ground design and weakens the signal, shortening the supported cable length below the 100m that customers have etched in their minds. So trying to boost your Ethernet spec for an industrial design gives questionable gain for the extra cost and lost profits. Plus, your customers likely won’t perceive a market differentiation that they are willing to pay for if you say you have better isolation, etc.
A Note on Shielded Ethernet Cables:
Many industrial users start out assuming STP (Shielded Twisted Pair) is better than UTP (Unshielded Twisted Pair) for Ethernet. Oddly enough, STP has proven a bit like ABS brakes in private automobiles: despite lots of hoopla about saving lives when the US government forced ABS brakes into cars, insurance industry records continue to show it has had no measurable impact on real-world road deaths. It seems that while an expert driver can be helped immensely by ABS, your average idiot or careless driver still reacts to skidding situations in ways ABS brakes cannot fix.
The same appears true for STP cables and Ethernet. I have seen many discussions where industrial users tried STP cables and found the system only worked reliably when they laid temporary UTP cables across the weld-shop floor! I suspect the main problem is that traditional IT groups have used and measured STP success in terms of preventing Ethernet cable emissions from affecting other equipment. This is not the same as using STP to prevent external interference from affecting the Ethernet signal. So, even setting aside the old truisms that a floating shield is worse than no shield, and that a shield grounded at both ends creates a ground loop that is also worse than no shield, it appears that only experts with a very detailed system design get STP Ethernet working better than UTP. My recommendation is to use optical fiber whenever you really worry about noise interfering with UTP Ethernet.
Vibration and RJ45:
Field tests of RJ45 connectors have shown them to perform very badly in areas of high vibration. This is actually very easy to see for yourself: take any RJ45 connector with pins facing down and wiggle it up and down. What happens? That little finger-catch / lock acts as a pivot point, and you are actually scrubbing the gold-flash contacts of the connector against the socket contacts. Metal against metal; guess the result. Tests on industrial robot arms have shown even high-quality gold-plated RJ45 connectors self-destruct in months or even weeks. If you expect vibration, better look for alternative connectors, such as any of the many (way too many) IP67 locking designs.
Industry and CAT 5, 5e, and 6:
Another interesting twist in the commercial evolution of Ethernet is that tests of bulk cable show CAT 5 is likely the best for industrial use where the noise rejection properties of the twisted pair (differential signal) are desired. This is because, so I have been told, one of the tradeoffs IEEE allowed for CAT 5e and 6 is less consistency within the wires of a pair. After all, few Ethernet systems ever see serious external interference, so things which improved speed outweighed things which reduced noise rejection in abnormally high noise conditions. Several large automation companies tried to bring up the idea of a CAT 5i with IEEE, which emphasized better noise rejection and special jacket plastic; however, it appears it went nowhere. If the big computer, networking, and cable vendors don't see the value, it cannot happen through IEEE.
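A toy model shows why consistency within a pair matters. A differential receiver outputs the difference between the two wires, so interference that couples equally onto both cancels; if the coupling differs, part of it survives. This Python sketch assumes the mismatch is a simple fractional imbalance between the two wires, a deliberate oversimplification of real cable behavior and not a model of any CAT 5, 5e, or 6 specification.

```python
def differential_output(signal: float, noise: float, imbalance: float) -> float:
    """Receiver output for one sample under a simplified coupling model.

    The transmitter drives +signal/2 on wire A and -signal/2 on wire B.
    Interference couples onto A as noise*(1 + imbalance/2) and onto B as
    noise*(1 - imbalance/2). Subtracting B from A leaves
    signal + noise*imbalance.
    """
    wire_a = signal / 2 + noise * (1 + imbalance / 2)
    wire_b = -signal / 2 + noise * (1 - imbalance / 2)
    return wire_a - wire_b


# Interference five times larger than the signal itself:
for imbalance in (0.0, 0.02, 0.10):
    out = differential_output(signal=1.0, noise=5.0, imbalance=imbalance)
    print(f"imbalance {imbalance:.2f}: output {out:.2f} (ideal 1.00)")
```

In this model a perfectly balanced pair rejects the interference completely, while a 10% imbalance lets half the signal's amplitude in as error, which is why looser pair consistency only starts to hurt once the external noise is large.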