The industry is clearly moving quickly towards Software as a Service (SaaS) applications. Rather than building monolithic chunks of code, new applications are often constructed by combining a variety of platform services, which are themselves usually delivered as Platform as a Service (PaaS) offerings.
Any application layers that you build on top of these services, though, are only as good as the services underneath. And that's where things can go very wrong, very quickly.
I was at a software house recently where the managers I talked to said they couldn't ever be offline for more than about four hours. In fact, anything more than two hours would be a problem. The IT people at the same place told me that backups of their primary database were taking over eight hours, and that restores would take longer. I'm often left wondering if these groups of people within the same company even talk to each other. Clearly there was an expectation gap.
I was also recently working at another ISV (Independent Software Vendor) that was looking to provide a SaaS offering, and was offering its end customers an SLA (Service Level Agreement) that said they'd always be back up and running within four hours. But the data centers that they were depending upon had an SLA showing that the loss of a region could involve an outage of one full week. (And the customer data could not go to another region.)
What makes this worse is the current trend for many of these services to be unreachable by phone, or by any other channel suited to something urgent.
One of our suppliers had a major outage last week because one of their own suppliers (NameCheap) had decided (incorrectly) to disable their DNS entry because of spam reports. So our supplier was offline, and could do nothing except email NameCheap's support team and hope they would respond soon. They told us that there was no way for them to call NameCheap.
Even if that's true, NameCheap isn't alone in this. It's a common trend. If you are building any sort of SaaS offering, though, you need to realize that you are only as good as your weakest link (or in this case, your weakest SLA).
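To make that weakest-link check concrete, here is a minimal sketch in Python. The names (Dependency, find_sla_gaps, shortest_defensible_promise_hours) are hypothetical and exist only for this illustration. The figures are borrowed from the two stories above and are combined into one list only as an example. It assumes every listed dependency sits on the critical path, so you can't be back up before the slowest of them has recovered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """One thing your service cannot run without (hypothetical model)."""
    name: str
    worst_case_recovery_hours: float


def find_sla_gaps(promised_recovery_hours, dependencies):
    """Return the dependencies whose worst-case recovery exceeds the promise."""
    return [
        d for d in dependencies
        if d.worst_case_recovery_hours > promised_recovery_hours
    ]


def shortest_defensible_promise_hours(dependencies):
    """The slowest dependency sets a floor under any recovery promise.

    Assumes every dependency is required, so recovery cannot finish
    before the slowest one has recovered.
    """
    return max((d.worst_case_recovery_hours for d in dependencies), default=0.0)


# Illustrative figures only, taken from the anecdotes above.
dependencies = [
    # Backups took over eight hours and restores would be longer,
    # so 8.0 is an optimistic lower estimate.
    Dependency("primary database restore", 8.0),
    # The data center SLA allowed a one-week outage on loss of a region.
    Dependency("data center region loss", 7 * 24.0),
]

promised_hours = 4.0
gaps = find_sla_gaps(promised_hours, dependencies)
floor_hours = shortest_defensible_promise_hours(dependencies)

for d in gaps:
    print(f"{d.name}: {d.worst_case_recovery_hours:g}h exceeds the {promised_hours:g}h promise")
print(f"Shortest promise these dependencies can support: {floor_hours:g}h")
```

The point of the sketch isn't the arithmetic, which is trivial. It's that somebody needs to write the list down and compare it against what is being promised to customers.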