At SQL Down Under, we've been working quite a lot over the past year with customers who are moving some of their applications to cloud-based systems, mostly to Windows Azure. One message that I often hear about using Windows Azure SQL Database (WASD) is that all you need to do is point your application’s connection string to the cloud and all will be good. While there are occasional cases where that is true, it generally isn’t going to give you a great outcome. To get a great outcome, you will usually need to look at how your application has been designed.
Here are the most common issues that I see:
1. Latency cannot be ignored. If you have ever run an application locally that’s connected to a database hosted anywhere else, you’ll know that network delays can have a real impact. In Australia, we are eagerly awaiting the availability of local Azure datacenters, as they will make a big difference to latency. But even when local datacenters are available, your customers might not all be in the same region.
When you are choosing a datacenter, it’s important to check the latency that you will experience. The easiest way to do that is to connect to a database in each datacenter using SQL Server Management Studio (SSMS), enable client statistics (from the Query menu, choose Include Client Statistics), and then execute a query that has almost no execution time, such as SELECT 1.
When the query is complete, on the Client Statistics tab, look at the value for “Wait time on server replies”. Make sure that you test this against other datacenters, not just the ones that seem geographically closest to you. The latency depends upon the distance, the route, and the Internet Service Provider (ISP). Last year I performed some testing in Brisbane (Australia), and the lowest latency was from the Singapore datacenter. However, testing from Auckland (New Zealand) showed the lowest latency when using a US-based datacenter, for the ISP that my customer was connected with.
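The same measurement can be scripted so that several datacenters are compared under identical conditions. The sketch below is a minimal illustration, assuming a Python DB-API cursor; it uses the standard library's sqlite3 in-memory database only as a stand-in so that it runs anywhere, and the helper name measure_round_trip is hypothetical. Against WASD you would pass a cursor from whichever SQL Server driver you use.

```python
import sqlite3
import statistics
import time


def measure_round_trip(cursor, query="SELECT 1", samples=20):
    """Return the median wall-clock time, in milliseconds, of a trivial query.

    The query does almost no work, so when the database is remote the
    timing is dominated by the network round trip.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        cursor.execute(query)
        cursor.fetchall()  # make sure the result has actually come back
        timings.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to one-off network spikes than the mean.
    return statistics.median(timings)


if __name__ == "__main__":
    # Stand-in only: a local in-memory database has essentially no latency.
    conn = sqlite3.connect(":memory:")
    print(f"median round trip: {measure_round_trip(conn.cursor()):.3f} ms")
    conn.close()
```

Running the same function against a connection to each candidate datacenter, from the locations your users actually connect from, gives numbers that are directly comparable.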
2. Round trips must be avoided. When using a remote datacenter rather than a database on your local network, it’s really important that you do as much work as possible in each call to the database engine. For example, last year I was working at a site where I noticed that an application was making 90,000 remote procedure calls between when the application started up and when the first window appeared for the user. It’s a tribute to SQL Server that this was pretty quick (around 30 seconds) on the local network. However, if the connection string for that application was pointed to a database server with 100ms latency, the first screen of the application would take about two and a half hours to appear!
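The estimate is simply the number of round trips multiplied by the latency of each one, ignoring any time the server spends doing actual work. A tiny helper (the name is hypothetical) makes it easy to plug in your own numbers:

```python
def estimated_wait_seconds(round_trips, latency_ms):
    """Time spent purely waiting on the network, ignoring server work."""
    return round_trips * latency_ms / 1000.0


if __name__ == "__main__":
    seconds = estimated_wait_seconds(90_000, 100)
    print(f"{seconds:,.0f} s = {seconds / 3600:.1f} hours")
```

For 90,000 calls at 100ms this prints 9,000 s = 2.5 hours, and that is network wait alone, before any of the local execution time is added.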
Years ago, I used to see the same issue with developers building web applications. Often, they would develop the applications on the same machine that was hosting both the web server and the database engine. The applications appeared to be fast, but once deployed, they could not cope with the user load. The best trick I ever found for that was to make the developers connect to the database via a 64 kbit/s dial-up modem. They then realized where the delays were occurring in their code. If the application ran OK on a dial-up link, it then ran wonderfully on a “real” network. Based on that logic, it might be interesting to get your developers to work against a datacenter that has really high latency from their location. Alternatively, consider using a tool that lets you insert latency into network connections during development.
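If a network-level latency tool isn't available, a rough substitute is to add the delay in the data-access layer itself. This is a hypothetical development aid, not an existing library: DelayedCursor wraps any DB-API cursor, sleeps before every execute call to imitate a slow link, and counts the calls so that chatty code shows up immediately. sqlite3 is used only so the example runs without a server.

```python
import sqlite3
import time


class DelayedCursor:
    """Wrap a DB-API cursor and add an artificial delay to every execute().

    Hypothetical helper for development only: it simulates a high-latency
    link and counts round trips so chatty data-access code stands out.
    """

    def __init__(self, cursor, delay_ms=100):
        self._cursor = cursor
        self._delay = delay_ms / 1000.0
        self.round_trips = 0

    def execute(self, *args, **kwargs):
        self.round_trips += 1
        time.sleep(self._delay)  # pretend the request crossed a slow network
        return self._cursor.execute(*args, **kwargs)

    def __getattr__(self, name):
        # Everything else (fetchall, description, ...) goes to the real cursor.
        return getattr(self._cursor, name)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    cur = DelayedCursor(conn.cursor(), delay_ms=50)
    for i in range(3):
        cur.execute("SELECT ?", (i,))
    print(f"{cur.round_trips} round trips")
    conn.close()
```

Because the delay is charged per call, code that does the same work in fewer calls gets proportionally faster, which is exactly the behavior you will see against a remote datacenter.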
The most important lesson, however, is to work out how to reduce the number of round trips to the database engine. For example, rather than making one call to start a transaction, another call to write an invoice header row, five more calls to write the invoice detail lines, and yet another call to commit the transaction, use techniques like table-valued parameters to send the entire invoice in a single call, so that your transactions never span network calls. Even better, see if you can send multiple invoices at once where that makes sense.
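The difference in round trips can be made concrete with a small sketch. It uses a fake connection that only counts calls, so it runs without a server. The stored procedure names dbo.InsertInvoiceHeader, dbo.InsertInvoiceLine and dbo.SaveInvoice are hypothetical; in a real application the detail lines would travel to the single procedure as a table-valued parameter, and how you pass one depends on your driver.

```python
class CountingConnection:
    """Stand-in for a database connection that only counts round trips."""

    def __init__(self):
        self.round_trips = 0

    def execute(self, sql, *params):
        self.round_trips += 1  # each call would be one network round trip


def save_invoice_chatty(conn, header, lines):
    # One call per step: begin, header, each detail line, commit.
    conn.execute("BEGIN TRANSACTION")
    conn.execute("EXEC dbo.InsertInvoiceHeader ?", header)
    for line in lines:
        conn.execute("EXEC dbo.InsertInvoiceLine ?", line)
    conn.execute("COMMIT TRANSACTION")


def save_invoice_batched(conn, header, lines):
    # One call: the procedure receives the header plus all lines and runs
    # its own transaction on the server, so no transaction spans the network.
    conn.execute("EXEC dbo.SaveInvoice ?, ?", header, list(lines))


if __name__ == "__main__":
    header = ("INV-001", "Example customer")
    lines = [(n, f"item {n}", 1) for n in range(1, 6)]
    for save in (save_invoice_chatty, save_invoice_batched):
        conn = CountingConnection()
        save(conn, header, lines)
        print(f"{save.__name__}: {conn.round_trips} round trip(s)")
```

Run as a script, the chatty version reports eight round trips and the batched version one. On a 100ms link that is the difference between roughly 800ms and 100ms of network wait for a single five-line invoice.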
3. Code compatibility needs to be considered. While WASD is largely compatible with SQL Server, there are some differences, so you need to check your code. For example, WASD requires every table to have a clustered index before data can be inserted into it.
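One check that's easy to automate before migrating is finding tables with no clustered index (heaps). A sketch, assuming a DB-API cursor connected to the on-premises SQL Server database (for example via pyodbc). The catalog query is ordinary T-SQL against sys.tables, sys.schemas and sys.indexes, where a heap is represented by a row with index_id = 0; the Python helper names are hypothetical. It covers only this one difference, not compatibility in general.

```python
# T-SQL catalog query: in sys.indexes, a table without a clustered index
# (a heap) has a row with index_id = 0.
FIND_HEAPS_SQL = """
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
INNER JOIN sys.indexes AS i ON i.object_id = t.object_id
WHERE i.index_id = 0
ORDER BY s.name, t.name;
"""


def heaps_from_rows(rows):
    """Format (schema_name, table_name) rows as two-part names."""
    return [f"{schema}.{table}" for schema, table in rows]


def find_heaps(cursor):
    """Run FIND_HEAPS_SQL on the source database and list the heaps."""
    cursor.execute(FIND_HEAPS_SQL)
    return heaps_from_rows(cursor.fetchall())
```

Each name returned needs a clustered index (often a clustered primary key) added before its data can be loaded into WASD.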
4. Backup and recovery are still important. The team at Microsoft do a great job of looking after your data. I have visited a lot of companies over the years, and one of the concerns that I hear expressed about cloud-based systems is how well the data will be managed. Ironically, from what I’ve seen and experienced of the Azure processes, even on their worst day they’ll do a better job than almost any company that I visit. However, that doesn’t mean you can stop being concerned with availability and recovery. You still need processes that perform periodic backups, and you still need recovery plans.
5. Data and schema migration might need to be done differently. I recently took a small on-premises database, about 14MB in size, scripted out the schema and the data, and wanted to move it to the cloud. If I had just pointed the script at WASD, it would have taken several hours to run. I found similar issues with SQL Server Integration Services (SSIS). Once again, latency was the killer. However, exporting the database to a BACPAC in Azure Storage and then importing it using the Azure portal took a total of about 2 minutes!
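When you do have to move data with scripts, the round-trip lesson still applies: send many rows per call rather than one INSERT per row. A minimal sketch, assuming a DB-API driver that uses ? placeholders (such as pyodbc); batched_inserts is a hypothetical helper, and the table and column names are illustrative. Batches are sized to respect SQL Server's limit of 1,000 rows in a single VALUES clause and to stay comfortably below its limit of 2,100 parameters per request.

```python
MAX_ROWS_PER_VALUES = 1000  # SQL Server limit for one VALUES row constructor
MAX_PARAMETERS = 2000       # kept below SQL Server's 2,100-parameter limit


def batched_inserts(table, columns, rows):
    """Yield (sql, params) pairs that insert rows in as few calls as possible.

    table and columns are interpolated into the SQL text, so they must come
    from trusted metadata, never from user input.
    """
    per_row = len(columns)
    batch_size = min(MAX_ROWS_PER_VALUES, MAX_PARAMETERS // per_row)
    column_list = ", ".join(columns)
    row_placeholder = "(" + ", ".join("?" * per_row) + ")"
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ", ".join([row_placeholder] * len(batch))
        sql = f"INSERT INTO {table} ({column_list}) VALUES {values};"
        params = [value for row in batch for value in row]  # flatten rows
        yield sql, params
```

Usage would look like `for sql, params in batched_inserts("dbo.Customers", ["Id", "Name"], rows): cursor.execute(sql, params)`. For a whole database, though, the BACPAC route is still likely to be far quicker.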
While your existing skills are important when working in these environments, and in many cases will still work, there are often quicker, more effective ways to get things done. So keep learning! You might also find some interesting insights in the SQL Down Under podcast (show 51) that I did with Conor Cunningham from the SQL product team.
8 thoughts on “Is there more to using SQL in Azure than redirecting your connection string?”
#6. Error handling
Do yourself a favour and implement the Transient Fault Handling Application Block (http://msdn.microsoft.com/en-us/library/hh680934(v=PandP.50).aspx) in both your on-premises and cloud solutions.
In the cloud, you should expect connections to be dropped and commands to need retrying.
On-premises, it can help your application survive a connection reset after a cluster failover or an AlwaysOn Availability Group failover.
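The Transient Fault Handling Application Block is a .NET library, but the core idea is easy to sketch. The code below is an illustrative retry loop with exponential backoff and jitter, not the block's API: TransientError and with_retries are hypothetical names, and deciding which of your driver's errors count as transient (dropped connections, throttling, a failover in progress) is an assumption left to you.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying. Map the relevant
    driver errors onto this before calling with_retries()."""


def with_retries(operation, attempts=5, base_delay=0.5, max_delay=8.0,
                 sleep=time.sleep):
    """Call operation(), retrying transient failures with backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: let the caller handle it
            # Exponential backoff with jitter, so that many clients that lost
            # their connections at the same moment don't all retry together.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

Wrap a complete unit of work (open the connection, run the batch, commit) rather than a single statement, so that a retry never resumes a half-finished transaction.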
Hi Grant, while I agree that it's needed, I don't see fault tolerance as just an Azure-related issue. I regularly visit organizations where they've implemented highly-available systems (and spent a fortune on them), yet as soon as they fail over (as they're designed to do), most of the applications in the building break. That's not a good story on or off premises.
+1 to GrantH. My company moved from on-site hosting to Azure last year without implementing the Transient Fault Handling Application Block, and there are days when we are bombarded by error emails due to dropped connections. To my knowledge, we never once had that problem when we were hosting the database.
To be honest, when I saw this article, I assumed fault handling would be at least in the top two. While I understand Greg's point that this isn't specific to SQL Azure, in my experience it's certainly a bigger concern when going off-site, so it becomes even more pertinent to ensure that the code base can handle it.
My understanding is that ADO.NET in .NET 4.5.1 has been updated to handle transient faults somewhat transparently.
Yes, I'm keen to do some testing on that. It's called "Idle Connection Resiliency" in the documentation. While that will no doubt help to keep a connection alive, there are many other aspects to achieving reliable transaction processing.
Nice work, Greg. Very good advice.
I stopped reading when I saw you rip off WASD in a non-keyboard related way.
What a bizarre comment. How can anyone "rip off" any acronym? Most acronyms are overloaded. My guess is that in just two years from now, WASD will be associated with "Windows Azure SQL Database" more than it will be associated with WASD keyboards.