Dark Clouds
Cloud Computing: Threat or Menace?
I did some sustainability consulting recently for a major computer company. We spent the day building a better understanding of their energy and material footprint; during the latter part of the afternoon, we zeroed in on testing the sustainability of their current business strategies. It turned out that, like many big computer industry players, this company is making its play in the "cloud computing" field.
("Cloud computing," for those of you not up on industry jargon, refers to a "a style of computing in which resources are provided “as a service” over the Internet to users who need not have knowledge of, expertise in, or control over the technology infrastructure." The canonical example would be Google Docs, fully-functional office apps delivered entirely via one's web browser.)
Lots of big companies are hot for cloud computing right now, whether to sell more servers, capture more customers, or outsource more of their own support. But there's a problem. As the company I was working with started to detail their (public) cloud computing ideas, I was struck by the degree to which cloud computing represents a technical strategy that's the very opposite of resilient -- dangerously so. I'll explain why in the extended entry.
But before I do so, I should say this: a resilient cloud is certainly possible, but it would mean setting aside some of the cherished elements of the cloud vision. Distributed, individual systems would remain the primary tool of interaction with one's information. Data would live both locally and on the cloud, with updates happening in real time if possible, delayed if necessary, but always invisibly. All cloud content should be in open formats, so that alternative tools can be used as desired or needed. Ideally, a personal system should be able to replicate data to multiple distinct clouds, to avoid monoculture and single-point-of-failure problems. This version of the cloud is less a primary source for computing services, and more a fail-safe repository. If my personal system fails, all of my data remains available and accessible via the cloud; if the cloud fails, all of my data remains available and accessible via my personal system.
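To make that architecture concrete, here's a minimal sketch in Python of a local-first, multi-cloud document. Every name in it (CloudBackend, ResilientDocument, and so on) is invented for illustration; a real implementation would talk to actual storage services rather than in-memory stand-ins.

    import json
    from pathlib import Path

    class CloudBackend:
        """Stand-in for any remote store; a real one would speak HTTP."""
        def __init__(self, name):
            self.name = name
            self.store = {}        # in-memory substitute for the remote service
            self.available = True  # flip to False to simulate an outage

        def put(self, doc_id, payload):
            if not self.available:
                raise ConnectionError(self.name + " is unreachable")
            self.store[doc_id] = payload

    class ResilientDocument:
        def __init__(self, doc_id, local_dir, clouds):
            self.doc_id = doc_id
            self.path = Path(local_dir) / (doc_id + ".json")  # open format, local first
            self.clouds = clouds
            self.pending = set()  # clouds still waiting for the latest copy

        def save(self, content):
            # 1. The local write always succeeds, network or no network.
            self.path.write_text(json.dumps({"id": self.doc_id, "content": content}))
            # 2. Mark every cloud stale, then replicate invisibly.
            self.pending = {c.name for c in self.clouds}
            self.sync()

        def sync(self):
            payload = self.path.read_text()
            for cloud in self.clouds:
                if cloud.name in self.pending:
                    try:
                        cloud.put(self.doc_id, payload)
                        self.pending.discard(cloud.name)  # replicated
                    except ConnectionError:
                        pass  # stays queued; retried on the next sync()

The point of the pattern: losing any one cloud costs nothing but redundancy, and losing the personal system costs nothing but the time to re-download from whichever clouds last synced.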
This version of cloud computing is certainly possible, but is not where the industry is heading. And that's a problem.
For big computer companies, the cloud computing model breathes new life into the centralized server markets that were once their bread-and-butter, markets that offer high profits on sales and service contracts. Cloud computing doesn't just use a server to store and transfer files; it uses the servers to do the hard computing work, too, in principle making your personal machine little more than a fancy dumb terminal. Companies that already have significant server capacity and bandwidth, such as Amazon and Google, love the idea because it offers them more ways to lock users into proprietary formats and utilities. For many of the corporate users looking at cloud services, that's a worthwhile trade-off to avoid continuously expanding IT expenditures. Let the cloud companies worry about the software and hardware upgrades; all we need to handle are the dumb terminals.
Cost-effective, perhaps. But by no means resilient.
Recall that the core premise of a resilience strategy is that failure happens, and that the precise mode of failure can't necessarily be predicted. Resilience demands that we prepare for unexpected problems so as to minimize actual disruption -- minimize in terms of time, but particularly in terms of how widespread the disruption may be.
Resilience design principles include: Diversity (or avoidance of monocultures); Redundancy; Decentralization; Transparency; Collaboration; Graceful Failure; Minimal Footprint; Flexibility; Openness; Reversibility; and Foresight. As per Jim Moore's comments on this post, we should add "Spare Capacity" to the list.
How does cloud computing match up?
On the positive side, the standard (Google Apps) model for cloud computing does well with collaboration, reversibility, and (arguably) spare capacity. While collaboration and reversibility could likely be replicated with standard desktop software, they're intrinsic to the cloud approach, and they're fundamental to its appeal.
Conversely, cloud computing clearly falls well short in terms of diversity, decentralization, graceful failure, and flexibility; one might also include redundancy, transparency, and openness on the negative list.
Here's where we get to the heart of the problem. Centralization is the core of the cloud computing model, meaning that anything that takes down the centralized service -- network failures, massive malware hit, denial-of-service attack, and so forth -- affects everyone who uses that service. When the documents and the tools both live in the cloud, there's no way for someone to continue working in this failure state. If users don't have their own personal backups (and alternative apps), they're stuck.
Similarly, if a bug affects the cloud application, everyone who uses that application is hurt by it. As cloud applications and services become more sophisticated (well beyond word processors and spreadsheets), pulling up an alternative system to manipulate the same data becomes far more difficult -- especially if the failed cloud application limits access to stored content.
Flexibility suffers when one is limited to just the applications available on the cloud. That's not much of a worry right now, when most cloud computing takes place via normal laptops and desktop computers, able to load and run any kind of application. It's a greater problem in the future envisioned by many cloud proponents, where people carry systems that provide little more than cloud access.
There's also the issue of how well the model fares when network access is spotty or degraded: a purely cloud-based tool simply stops working, while a local-first client can keep retrying in the background.
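The standard coping technique is to treat every network operation as optional and retry it quietly. Here's a rough Python sketch, assuming a sync_once callable like the sync() method in the earlier example; the function name and retry policy are mine, purely for illustration:

    import random
    import time

    def sync_with_backoff(sync_once, max_tries=5):
        # `sync_once` is any callable that raises ConnectionError on failure.
        delay = 1.0
        for _ in range(max_tries):
            try:
                sync_once()
                return True                          # replicated successfully
            except ConnectionError:
                time.sleep(delay + random.random())  # jitter avoids synchronized retries
                delay *= 2                           # back off: 1s, 2s, 4s, ...
        return False  # give up for now; the local copy remains fully usable

A cloud-only client has no equivalent move: when the connection degrades, there's nothing local left to fall back on.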
In short, the cloud computing model envisioned by many tech pundits (and tech companies) is a wonderful system when it works, and a nightmare when it fails. And the more people who come to depend upon it, the bigger the nightmare. For an individual, a crashed laptop and a crashed cloud may be initially indistinguishable, but the former only afflicts one person and one point of access to information. If a cloud system locks up, potentially millions of people lose access.
So what does all of this mean?
My take is that cloud computing, for all of its apparent (and supposed) benefits, stands to lose legitimacy and support (financial and otherwise) when the first big, millions-of-people-affecting failure hits. Companies that tie themselves too closely to this particular model, as either service providers or customers, could be in real trouble. Conversely, if the big failure hits before the cloud has swept up lots of users and visibility, it could be the signal to shift towards a more resilient model.
I would love to use the resilient cloud described above, and I suspect I'm not alone. But who's going to provide it?
Comments
Interesting, but I'm not going to break into a cold sweat just yet.
Client lock-in has always been the aim of most commercial 'closed' systems. It just doesn't work in a common environment. People are slowly waking up to this. (Are you aware that a recent ruling requires that software patents meet a much more stringent test?)
Cloud-only devices would need to be *really* cheap and portable to win over lite PCs and USB sticks.
Centralisation would be a resilience issue if the 'server' were a single physical unit in a single location. There is no reason for this to be so. (Twenty or so years ago, distributed file systems under POSIX actually catered for it.)
But, yes. Location! Location! Location!
Many years ago, I worked on a telecommunications system that touted itself as having triple redundancy backup.
All well and good, until I spent an evening doing an on-site upgrade... and there were the 'triple redundant' systems, sitting in a row, on a flimsy shelf supported by a couple of bits of 4x4!
(A nation's communications hung in the balance while I overcame the inner imps...)
Posted by: Tony Fisk | January 19, 2009 7:03 PM
I wasn't just referring to physical security, of course. Distributed denial-of-service attacks don't care where you're locating your servers.
Posted by: Jamais Cascio | January 19, 2009 7:12 PM
Jamais, speaking of DoS attacks: is there anything we can do with the protocol itself to handle noise in the system? Put an eCondom on the end of it?
Posted by: Michael P. Gusek | January 19, 2009 8:02 PM
Great article. Most people would look at this from the perspective of customers or service providers. You've added a system point of view. But from the customer's side, there are still many legitimate reasons for cloud computing. The IT expenditure of building and maintaining their own applications is very high. They may also find themselves lacking the expertise compared to more sophisticated service providers. It's like generating electricity for yourself: why tackle the engineering challenge of building a reliable energy supply from solar panels and generators, when you can just buy it from the grid more cheaply? The system-wide downside, of course, is that there are blackouts.
Posted by: Wai Yip Tung | January 20, 2009 4:01 PM
Interesting article, Jamais! Do you know about the Google Apps Offline feature? I think that this feature helps Google to follow the decentralization and redundancy principles.
Posted by: Bayram | January 21, 2009 8:50 AM