Most cloud applications, from Gmail to Salesforce, implement what we can call a business data model, where applications work with data stored in a database. Translated to cloud computing, that means the data sits somewhere in the cloud, and we access it through a web page. For example, when we hit the Send button in Gmail, the text we’ve written goes into the cloud, gets stored there, and a copy gets sent along to the recipient. To read an email that someone has sent to us, we click on the title and receive a copy in our browser. The data is stored in the cloud, and retrieved as necessary. If it takes a half-second or so for the email to open, or even a couple of seconds, nobody really minds.
A real-time system needs to respond much more quickly, and it needs to be automatic. For example, a process control system in a factory can send out data 10 times per second, or faster, and there may be hundreds of sensors, motors, and other instruments sending data simultaneously. An engineer on the receiving end doesn’t want to click a button to get the latest figures. He needs the data to arrive automatically, and as close to real time as possible.
To make it automatic, there are two options. Using “polling”, the engineer’s system would automatically send a message every so often asking, “What are the new values?” and the cloud system would respond with the latest values. But the overhead in time and bandwidth of sending the question and processing the result rules out this option for all but very slowly changing data. You can imagine how much processing power would be wasted receiving and answering that same question, over and over, 10 times per second or more.
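To make the cost of polling concrete, here is a minimal sketch in Python. It assumes a hypothetical in-process `CloudServer` class standing in for the cloud system, and a made-up data point called `motor_speed`; no real network or cloud API is involved. It simply counts how many requests the server has to answer against how many of those answers carried anything new.

```python
class CloudServer:
    """Hypothetical stand-in for the cloud system: keeps the latest value per point."""

    def __init__(self):
        self.values = {}
        self.requests_handled = 0

    def write(self, name, value):
        # A sensor or instrument updates a data point.
        self.values[name] = value

    def read_all(self):
        # Every poll costs the server one request, whether or not anything changed.
        self.requests_handled += 1
        return dict(self.values)


def poll(server, cycles):
    """Ask "What are the new values?" `cycles` times; count replies that were new."""
    last_seen = None
    useful = 0
    for _ in range(cycles):
        snapshot = server.read_all()
        if snapshot != last_seen:
            useful += 1
            last_seen = snapshot
    return useful


server = CloudServer()
server.write("motor_speed", 1200)

# One second of polling at 10 times per second, with no change in the data.
useful_replies = poll(server, cycles=10)
print(server.requests_handled, useful_replies)
```

In this sketch the server answers ten requests, but only the first reply tells the engineer’s system anything it did not already know.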
A better option is “publish/subscribe”, where the engineer’s system sends this request just once: “Tell me any new value whenever it changes.” From then on, the cloud system would just “push” the newly updated values to the engineer’s system. This option significantly reduces both the amount of network traffic and the processing time on the cloud server.
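A publish/subscribe exchange can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real cloud service: the `Broker` class, its `subscribe` and `publish` methods, and the `motor_speed` point are all hypothetical names, and the “push” is just a local callback instead of a network message.

```python
from collections import defaultdict


class Broker:
    """Hypothetical stand-in for the cloud system in a publish/subscribe model."""

    def __init__(self):
        self.values = {}
        self.subscribers = defaultdict(list)
        self.messages_sent = 0

    def subscribe(self, name, callback):
        # The one-time request: "Tell me any new value whenever it changes."
        self.subscribers[name].append(callback)

    def publish(self, name, value):
        # Ignore writes that do not change the value, so nothing is pushed.
        if name in self.values and self.values[name] == value:
            return
        self.values[name] = value
        for callback in self.subscribers[name]:
            self.messages_sent += 1
            callback(name, value)


received = []
broker = Broker()
broker.subscribe("motor_speed", lambda name, value: received.append((name, value)))

# Five readings arrive from the sensor, but the value only changes twice.
for reading in [1200, 1200, 1200, 1250, 1250]:
    broker.publish("motor_speed", reading)

print(received)
print(broker.messages_sent)
```

The subscriber asks once and then receives only the two values that actually changed. The three repeated readings generate no traffic to the subscriber at all.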
How the data is maintained on the cloud server is also an issue. A data communication system following a non-polling, publish/subscribe model will slow to a crawl using the kind of database commonly found in most business systems. A relational database is just not designed to handle such a high volume of reads and writes. In addition, the data structures that such a database creates are not necessary. They take too much time to build, and they use too much bandwidth to transmit. A real-time system typically receives and retransmits data through an in-memory database in a very simple format to keep latency as low as possible.
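As a rough illustration of what “in-memory” and “very simple format” might look like, here is a small Python sketch. Everything in it is an assumption made for the example: the `PointStore` class, the `tank_level` point, and the comma-separated `name,value,timestamp` encoding are hypothetical and do not describe any particular product.

```python
import time


class PointStore:
    """Hypothetical in-memory store: one latest (value, timestamp) pair per point."""

    def __init__(self):
        self._points = {}

    def write(self, name, value, timestamp=None):
        # Overwrite in place: no tables, indexes, or history to maintain.
        if timestamp is None:
            timestamp = time.time()
        self._points[name] = (value, timestamp)

    def encode(self, name):
        # Assumed wire format for this sketch: a single "name,value,timestamp" line.
        value, timestamp = self._points[name]
        return f"{name},{value},{timestamp}"

    def __len__(self):
        return len(self._points)


store = PointStore()
store.write("tank_level", 72.0, timestamp=999.0)
store.write("tank_level", 73.5, timestamp=1000.0)  # replaces the earlier value

message = store.encode("tank_level")
print(message)
```

Each write is a single dictionary update, and each outgoing message is one short line of text, which is the kind of economy the paragraph above is describing.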
Implemented properly, a publish/subscribe data communication system can keep data transmission latency at or near the latency of the network itself. This is essential if we want to get anywhere near the goal of establishing real-time data communication in the cloud. However, the task of pushing data to the cloud and then on to the client requires a unique approach to the client/server relationship in a web-based system. We’ll talk about that next week.