Automated backups for Windows Azure SQL Database

One of the questions that I've often been asked is about how you can backup databases in Windows Azure SQL Database.

What we have had access to was the ability to export a database to a BACPAC. A BACPAC is basically just a zip file that contains a bunch of metadata along with a set of bcp files for each of the tables in the database. Each table in the database is exported one after the other, so this does not produce a transactionally-consistent backup at a specific point in time. To get a transactionally-consistent copy, you need a database that isn't in use.

The easiest way to get a database that isn't in use is to use CREATE DATABASE AS COPY OF. This creates a new database as a transactionally-consistent copy of the database that you are copying. You can then use the export options to get a consistent BACPAC created.

Previously, I've had to automate this process by myself. Given there was also no SQL Agent in Azure, I used a job in my on-premises SQL Server to do this, using a linked server configuration.

Now there's a much simpler way. Windows Azure SQL Database now supports an automated export function. On the Configuration tab for the database, you need to enable the Automated Export function. You can configure how often the operation is performed for you, and which storage account will be used for the backups.

It's important to consider the cost impacts of this as well. You are charged for how ever many databases are on your server on a given day. So if you enable a daily backup, you will double your database costs. Do not schedule the backups just before midnight UTC, as that could cause you to have three databases each day instead of one.

This is a much needed addition to the capabilities. Scott Guthrie also posted about some other notable changes today, including a preview of a new premium offering for SQL Database. In addition to the Web and Business editions, there will now be a Premium edition that has reserved (rather than shared) resources. You can read about it all in Scott's post here:

Should IT professionals learn to type? – Investing in yourself

Why learn?

I was listening today to one of Scott Hanselman’s awesome podcasts with Miguel de Icaza and during the show they discussed things that you really should invest in. One list that came up was a good bed, a good chair, and to learn to type. Most of us spend 1/3 of the day in a bed and 1/3 of the day in a chair, so they seem like no-brainers. Typing is an interesting addition to that list.

Back when I was in school, in those sexist old days, typing wasn’t something that the boys learned. It was something that was taught to the girls, and mostly to those that weren’t heading in an academic direction. Up until about 1990 I was a pretty fast two-finger typist (probably more like a four-finger typist). I could cut code pretty quickly and I wasn’t too concerned about it. Occasionally though, I kept feeling that I could do much better if I learned to type. Eventually I convinced myself that if a sizeable percentage of the population could do it, surely I could learn.

Is it hard to learn?

At that time, some pretty decent typing programs for the PC had started to appear. I decided to try to learn to type and chose TypeQuick. It’s interesting (and good for them) to see that they still exist today. So many applications from those days have long since gone. I particularly liked the way that it constantly pushed you just a little bit faster than what you were achieving. I haven’t tried the program since then but I hope it’s just improved over time. But regardless, there are many of these types of programs around.

But then the real problems began. Most of these types of programs assume that you can spend an hour a day learning to type, and in a few weeks you’ll be able to do so. The implicit assumption though is that’s the only time that you’ll be typing during that time. The hassle with this is that if you type for a living (by cutting code or interacting with IT systems), spending an hour a day doing it the right way, then the rest of the day doing it the wrong way isn’t a formula for success.

I decided that I needed to break that impasse. Early one fine Sunday morning, I started doing the course and I kept going until I had pretty much finished the course, at around midnight. By then I could barely move my hands at all but I had learned all the basics.

What I hadn’t figured on though was that the hardest part was to come. During the next four weeks or so, I had tight deadlines that I knew I could meet by typing my old way, and yet I had to force myself to type correctly, even though it was at a fraction of the speed that I could do the wrong way.

I did force myself to put up with the stress and after about four weeks I was back to about the same typing speed. The brilliant part is that I was then able to keep getting faster and faster. I’d probably now average about three or four times faster than what I could do in my heyday of four-finger typing.

What difference does it make?

Clearly if you are writing a bunch of code, you can do so much more productively but there are also many subtle improvements in what you achieve, not just how quickly:

  • You are much more likely to use clear (but often longer) names for objects
  • You are much more likely to write documentation while coding (Miguel noted that people who can’t type tend to always put off tasks like documentation and I have to say that I’ve seen the same)
  • You are much more prepared to throw away code and replace it with better quality code (I’ve noticed that people who don’t type are much more likely to cling onto their first attempts at any piece of code)
  • While it’s possibly an even more minor point, you will simply seem more professional to those that you work with. Professionals invest time in skills that matter in their profession. If you’re reading this, chances are that typing is in that category for you.

I’m really surprised that typing isn’t considered a core skill for IT people today. Perhaps there will be a time in the future when it won’t matter but it does matter today. Curiously though, it’s often viewed as a less-important skill to master. But those who have done so will all tell you how important it is.

When I worked at a university, I was pushing for students to learn typing in the first two weeks of their Computer Science degrees. The idea was almost mocked by most lecturers as typing wasn’t something that was taught by universities. Yet the same university struggled to have enough resources for students to use, and the resources they had were tied up by people that took forever to type anything. The real irony is that in most courses, there isn’t much other useful work that the students could do in the first week or two anyway.

So, if you have made it to this point in the blog, and you can’t yet type, I hope that you’ll consider it.

SQL Server and Programming Frameworks

I have days where I can’t decide if I’m frustrated or sad about how I see SQL Server being used by applications, or if I’m happy that this keeps us in ongoing work.

Today I’ve been looking at a system that’s having performance issues. There are three key applications on the system. Each comes from a different vendor and when I look at how each one of them interacts with the server, it’s really frustrating. I’ve come to the conclusion that it’s mostly due to application frameworks that are being used. Here’s an example:

System A that I’m working with (name hidden to protect the guilty) never just issues a query to SQL Server. What it does instead is:

1. Issues a SET FMTONLY ON, then sends the command, then SET FMTONLY OFF
2. Creates a static cursor based on the command
3. Fetches from the cursor
4. De-allocates the cursor

This simply isn’t sensible. Apart from just slowing down the data access immensely, here are my issues with this:

SET FMTONLY ON returns the columns and data types for every code path through the code that you submit to it. That works ok for simple commands but is fragile for anything else. Regardless, I can only presume that it’s performing step #1 as a way of creating an empty dataset/recordset/rowset in the programming language, without having to specify the columns and data types. This is concerning for multiple reasons. First is that SET FMTONLY ON is now deprecated but more importantly, it means that the system is being constantly asked to evaluate something that isn’t actually changing. On the system I was monitoring, the application was asking SQL Server to do this over 60,000 times per hour, yet the answer isn’t going to change. And it is probably occurring just to simplify the creation of the rowset in code (i.e. a one-off one-time effort).

Creating a static cursor involves executing the query and materializing a copy of all of the data that was retrieved into tempdb. For almost all of the queries in question, tempdb has no need to be involved at all. The fetch command then retrieves the data from tempdb. The de-allocation then removes the copy of the data and the cursor structure from tempdb.

It’s a tribute to the product that it performs as well as it does, given how it’s often used or (more likely), abused.

Is there more to using SQL in Azure than redirecting your connection string?

At SQL Down Under, we've been working quite a lot over the past year with customers that are moving some of their applications to cloud-based systems, and mostly on Windows Azure. One message that I often hear about using Windows Azure SQL Database (WASD) is that all you need to do is point your application’s connection string to the cloud and all will be good. While there are occasional cases where that is true, that generally isn’t going to give you a great outcome. To really get a great outcome, you generally will need to check out how your application has been designed.

Here are the most common issues that I see:

1. Latency cannot be ignored. If you have ever run an application locally that’s connected to a database hosted anywhere else, you’ll realize that network delays can impact you. In Australia, we are eagerly awaiting the availability of local Azure datacenters as they will make a big difference regarding latency. But even when local datacenters are available, your customers might not all be in the same region.

When you are choosing a datacenter to use, it’s important to check the latency that you are experiencing. The easiest way to do that is to connect to a database in each datacenter using SQL Server Management Studio (SSMS), enable client statistics (from the Query menu, choose Include Client Statistics), then execute a query that will have almost no execution time, such as:


When the query is complete, on the Client Statistics tab, look at the value for “Wait time on server replies”. Make sure that you test this against other datacenters, not just the ones that seem geographically closest to you. The latency depends upon the distance, the route, and the Internet Service Provider (ISP). Last year I performed some testing in Brisbane (Australia), and the lowest latency was from the Singapore datacenter. However, testing from Auckland (New Zealand) showed the lowest latency when using a US-based datacenter, for the ISP that my customer was connected with.

2. Round Trips must be avoided. When using a remote datacenter rather than a database on your local network, it’s really important that you achieve as much work as possible in each call to the database engine. For example, last year I was working at a site where I noticed that an application was making 90,000 remote procedure calls between when the application started up and when the first window appeared for the user. It’s a tribute to SQL Server that this was pretty quick (around 30 seconds) on the local network. However, if the connection string for that application was pointed to a database server with 100ms latency, the first screen of the application would take nearly 3 hours to appear!

Years ago, I used to see the same issue with developers building web applications. Often, they would be developing the applications on the same machine that was hosting both the web server and the database engine. The applications appeared to be fast, but when deployed they could not cope with the user load. The best trick I ever found for that was to make the developers connect to the database via a dial-up 64KB/sec modem. They then realized where the delays were occurring in their code. If the application ran OK on a dial-up link, it then ran wonderfully on a “real” network. Based on that logic, it might be interesting to get your developers to work against a datacenter that has really high latency from the location they are working at. Alternately, consider using a tool that allows you to insert latency into network connections during development.

The most important lesson, however, is to work out how to reduce the number of round-trips to the database engine. For example, rather than making a call to start a transaction, another call to write an invoice header row, five more calls to write the invoice detail lines, and yet another call to commit the transaction, use techniques like table variables to let you send the entire invoice in a single call, and to avoid having your transactions ever spanning the network calls. Even better, see if you can send multiple invoices at once where that makes sense.

3. Code Compatibility needs to be considered. While WASD offers good capability with SQL Server, there are some differences. You need to check that your code is compatible. For example, WASD requires a clustered primary key on tables.

4. Backup and recovery are still important. The team at Microsoft do a great job of looking after your data. I have visited a lot of companies over the years. One of the concerns that I hear expressed about cloud-based systems is about how well the data will be managed. Ironically, from what I’ve seen and experienced of the Azure processes, on their worst day they’ll do a better job than almost any company that I visit. However, that doesn’t mean that you don’t need to be concerned with availability and recovery. You still need to put processes in place to perform periodic backups, and to have recovery plans in place.

5. Data and schema migration might need to be done differently. I recently took a small on-premises database that was about 14MB in size, scripted out the schema and the data and wanted to move it to the cloud. If I just pointed the script to WASD, it would have taken several hours to run. I found similar issues with using SQL Server Integration Services (SSIS). Once again, latency was the killer. However, exporting the database to a BACPAC in Azure Storage and then importing the database using the Azure portal took a total of about 2 minutes!

While your existing skills are important when working in these environments and in many cases will still work, there are often quicker more effective new ways to get things done. So keep learning! You might also find some interesting insights in the SQL Down Under podcast (show 51) that I did with Conor Cunningham from the SQL product team.

SQL Down Under podcast 60 with SQL Server MVP Adam Machanic

I posted another podcast over the weekend. Late last week, I managed to get a show recorded with Adam Machanic. Adam's always fascinating. In this show, he's talking about what he's found regarding increasing query performance using parallelism. Late in the show, he gives his thoughts on a number of topics related to the upcoming SQL Server 2014.


The show is online now: