In the sessions that I was attending in Redmond yesterday, I was interested to see the presenter discussing the idea of having individual database files in the cloud as some sort of bizarre option that’s now available in SQL Server 2014. I don’t see it as bizarre at all. However, I see two distinct use cases that make complete sense to me:
Overcoming Drive Limitations in Azure-based SQL Virtual Machines
One of the current limitations of running SQL Server in Azure virtual machines is that you can only attach two additional data drives per CPU core, and each of those drives is up to 1 TB in size. For example, if you spin up a two core VM, you can attach 4 data drives.
By allocating URLs for data drives, you can overcome this limitation. In a recent whitepaper that I was a reviewer on, we discussed the IOPs and size limitations of Azure storage based drives, both at the individual drive level and the storage account level. Please read that whitepaper for more info on those characteristics that remain unchanged. A core takeaway though is that you shouldn’t mix storage accounts for files that make up a single database, to make sure that you maintain write ordering. You could use different storage accounts (to overcome single account limits) for different databases though.
Tiered Storage for Partitioned Tables in On-Premises Databases
I commonly implement partitioned tables for sliding window scenarios. For example, a customer might want to keep 7 years of data online at any time. If table partitioning isn’t familiar to you, please read another whitepaper that I was a reviewer on. Even though it was created for SQL Server 2008, it’s pretty much still spot on.
One way that I use partitioned tables is to keep active data in the current partition, and to keep older data in separate partitions, often one per month. This usually allows me to have the older data in read-only filegroups. This gives you a great situation where your application just sees a normal table, but most of the table is read-only. There are several advantages of this, some of which are:
– While locks are still acquired on the read-only data, hints like NOLOCK are then safe to use on that data.
– You won’t have a query accidentally overwrite your older data.
– The older data could be stored on slower media.
– The older data could have a higher compression rate. For example, you could use ROW compression for the active data, and PAGE compression for the older data.
– SQL Server doesn’t need to run recovery on read-only filegroups.
Using data files in Azure storage could make really good sense for some of this older data. You could store your old read-only data in storage files and keep it available. So your current data would be fast to access but your older data would still be present, but slower to access.
I could see a situation where you might never need to remove your old data.
One other question that comes up in this situation relates to how backups work. Clearly you don’t want to be pulling all your data out of the cloud when you perform a backup, particularly if you are going to also backup to the cloud. However, piecemeal backups in SQL Server are perfect for this. If you just store your read-only filegroups and archive data in Azure storage files, you can safely perform a backup with READ_WRITE_FILEGROUPS_ONLY.
All in all, I see that some of these new hybrid scenarios could be quite compelling, and not bizarre at all.