New SQL Server Spatial Course

I’ve always thought that the support for spatial data types that was added in SQL Server 2008 was one of the best parts of that release.

In SQL Server 2012, a great set of enhancements was made to the spatial support.

We’ve had lots of interest lately in SQL Server spatial, so we’ve been working on a new course that targets it specifically. We’re pleased to have now added a brand new one-day SQL Server Spatial Core Skills course to our list of regularly scheduled courses. It covers a wide variety of topics, from understanding the data types to loading data, geolocation, and indexing. It also covers the enhancements that were made in 2012.
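As a small taste of what the spatial types make possible, here’s a sketch (the coordinates are approximate, purely for illustration) that uses the geography type to compute the distance between two cities:

```sql
-- geography::Point takes (latitude, longitude, SRID); 4326 is WGS 84.
DECLARE @Melbourne geography = geography::Point(-37.8136, 144.9631, 4326);
DECLARE @Sydney geography = geography::Point(-33.8688, 151.2093, 4326);

-- STDistance returns metres for this SRID, so divide to get kilometres.
SELECT @Melbourne.STDistance(@Sydney) / 1000.0 AS DistanceKm;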

The first public schedule for this course is April 14th in Melbourne. Details are here:

Fix: Incorrect status output in the SSIS Data Quality transform

I hear comments all the time from people who say it’s not worth posting details on the Connect website, as they don’t get actioned anyway. While it can seem a frustrating process, and often takes quite a while, improvements do come from there.

At a SQL Saturday event in Brisbane, I was speaking to an attendee (Ian Roberts) who was complaining that DQS was returning what he perceived as the wrong status. More importantly, the results from using the DQS client differed from using the SSIS transform. He described it this way:

He set up a domain for Australian States and one for Australian suburbs, based on files that he had downloaded from the Australia Post website. He then created a Data Quality Project to test this. In that tooling, the results were as expected: Hobarrt got transformed to Hobart; Victoria got transformed to VIC; Melburn, New York and NWS all got tagged as New as they couldn't be found. With the Record Status, New seemed to override Correct, and Corrected seemed to override them all.

However, in the SSIS DQS Cleansing transform, since Correct overrides New, unless every field was in error, the record would end up with a Correct status.

The problem with this is that in the SSIS Conditional Split transform, you should be able to test just the Record_Status to determine what to do with the record, rather than looking at the status of each field. Having to check the status of each individual field would make the Conditional Split transform unwieldy.

The other real issue is that the two tools should provide the same outcome. You should be able to look at the Record_Status, direct correct and incorrect records to their appropriate destinations, then deal with the incorrect ones after the fact. Having the field status in the output would make application error reporting more meaningful, as you could nominate the actual field in error. The main issue is that Correct gazumps New in SSIS, but New gazumps Correct in the DQS Client. That’s not appropriate.

In one of my podcasts for SQL Down Under, I interviewed Gadi Peleg from the DQS team, and Gadi agreed that it looked incorrect. He agreed that the output should be:

•    If one of the values is Invalid, the record is Invalid
•    If one of the values is Corrected, the record is Corrected
•    If one of the values is New, the record is New
•    Only if all values are Correct is the record Correct
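That agreed precedence could be sketched in T-SQL like this. The table and column names (dbo.CleansedAddresses, StateStatus, SuburbStatus) are hypothetical, purely to illustrate the rule; the resulting Record_Status is then the single thing a Conditional Split needs to test:

```sql
-- The record status takes the highest-precedence field status:
-- Invalid > Corrected > New > Correct.
SELECT RecordID,
       CASE WHEN N'Invalid'   IN (StateStatus, SuburbStatus) THEN N'Invalid'
            WHEN N'Corrected' IN (StateStatus, SuburbStatus) THEN N'Corrected'
            WHEN N'New'       IN (StateStatus, SuburbStatus) THEN N'New'
            ELSE N'Correct'
       END AS Record_Status
FROM dbo.CleansedAddresses;
```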

Gadi encouraged me to log it on the Connect website.

Today I got an email to say it’s been fixed. That’s a great outcome, although the update doesn’t indicate in which version or service pack the change will apply:

Recent thoughts on learning Chinese

Over the last few days, I’ve been getting a lot of questions again about what I’ve found works and doesn’t work with learning Chinese.

Here is a summary of things I’ve tried and my thoughts on them:

1. Local classes

My wife and I attended some local classes in the first year I tried to learn. That was only about two hours per week, and while it was interesting, it was way too little to be very meaningful.

2. ChinesePod

This is a great podcast. They concentrate on spoken Chinese and much of it is good fun. I really liked the older material with Ken and Jenny but the new hosts are getting there. They have different levels. Newcomers might find much of the Newbies level quite challenging at first, but it’s important to persist with it. The step up to Elementary is substantial but not too scary. The step up to Intermediate though, is really too big a jump. I now find Elementary too easy but Intermediate is often too hard. I can’t comment on higher levels.

The accents on ChinesePod are pretty clean but some people complain that it’s a bit too Shanghai-ish. It seems pretty good but it’s not as clean as material that I’ve used from Beijing. You can really tell the difference.

ChinesePod also have lessons that you can do daily or weekly with teachers. These are not inexpensive but I think they’d be good value for those that want to progress quickly. Having someone to talk to for 20 mins a day, 5 days a week should help enormously, and allows for the teacher to provide one-on-one help in your weaker areas. I’ve done a demo session and I might take this up soon.

3. Rosetta Stone

This is probably the best known of the traditional language learning systems. For most languages, there are 3 levels. For Chinese, there are 5 levels. Each level consists of 4 units. I’m currently half a unit from completing the 5 levels. My in-laws (Chinese) say I’ve learned an enormous amount by doing this material. I’m sure that’s true but I have to say I’ve found it very, very frustrating at times.

A surprise for most people is that all spoken and written words in the course are in your target language. Read that last statement carefully. They argue that children learn by seeing. While this concept mostly works, I could not have learned using Rosetta Stone without having Google Translate and Pleco open the whole time. There were way too many subtleties that I could not have picked up from the pictures alone. I have had many moments of “how the @#$@#$@ could I have known that???” when working with Rosetta Stone.

When I signed up, they also had a Studio product that gave you online access for about $60 each six months. That was amazing value and one of the main reasons that I used it. It let you book into online classes that were conducted directly from Beijing. There was no limit to the number of classes you could book into. Clearly this was a problem for them so about a year ago, they dramatically changed the rules to where the Studio product was no longer worthwhile in my opinion, for many, many reasons. That was a real pity as it was the best part of the product and they broke it.

4. Pleco

This is by far the best app I’ve found for using on my iPad. It helps me translate Chinese to/from English in many useful ways. I use it every day. One of the reasons that I can’t move to a Windows Phone is that it’s not available on that platform and I wouldn’t want to live without Pleco.

5. Serge Melnyk’s Podcast

This is an odd one. Serge is a Russian (yes, you read that correctly) who has a podcast on learning Chinese. I’ve come to really like Serge and have signed up to his site so that I could download all the content. If you have travelling time when you can listen to podcasts, it’s well worthwhile, although the accent is quite bizarre.

6. HSK Review

This is a great app for learning the Chinese characters that are required for the HSK exams. I hope to do at least HSK 1 and HSK 2 exams this year, perhaps HSK 3 if I’m lucky.

7. AnkiApp

This is in many ways an even better app for learning HSK based characters. It keeps track of what you are getting right and wrong, and how hard you find each character. It works out what you need to work on and provides great review drills for you based on that feedback. It also then assesses what level it thinks you are at for each of the levels.

8. Google Translate

This is invaluable and much superior translation-wise to the Bing Translator (sorry Microsoft). The pinyin input editor doesn’t work on IE11 (sorry again Microsoft) but no surprise, works like a charm on Chrome. Many of the translations are “interesting” so I’m glad I have a bunch of patient Chinese speaking buddies on Facebook that help me out where I just can’t work out what something really means.

9. PeraPera Chinese Popup Dictionary

This is an add-on for Chrome (and for Firefox) that has quickly become one of my favourite things. You just click a button in the top right of the browser when you want to use it, and you just point at words or phrases in Chinese that you don’t understand and it tells you about them. Awesome idea.

10. FluentU

This is another excellent service that I’ve signed up to. It was about $19 a month when I signed up, and it produces a bunch of content each month. I particularly like the way that different levels are supported, and the way that the content is often quite current, is presented in a natural way, and shows characters, pinyin, and English together.


1. Pinyin

I made the mistake of spending too long with pinyin visible when I was first learning. It can become a crutch for you, and eventually you need to learn to read Chinese characters. In Rosetta Stone, you can choose what’s displayed. For the last year or so, I changed to only having Chinese characters displayed. It was VERY painful at first but I’m so glad I persevered with it. I now only turn on the pinyin display (in addition to the characters) when I don’t understand something. It’s important to force yourself to read.

Learning to pronounce pinyin correctly is also a challenge. There is no simple story here. You just need to learn the basic rules on how words sound, and after a while you will see “xin” but “shin” will come out of your mouth.

Focus a lot on listening. Some sounds take a while to get used to. Good examples are the difference between the “u” in “qu” and “zhu” or the sounds that we really don’t have in English like “nv” (The v is really a u with an umlaut). Other challenges are getting used to “c” sounding more like “ts” in words like “cai”, etc. So, listen a lot.

2. Movies and Video

Watching movies is obviously good but it’s a real challenge. At first, I struggled to find many Chinese movies that weren’t endless martial arts or medieval warrior films.

My Spanish friends told me they learned English by watching movies in English, but with English subtitles. This is important because you don’t just want to know what something means; when learning, you also want to know what they actually said. The problem with Chinese is that you have the added difficulty of not being able to read the characters, and certainly not at the speed that they are displayed at. (And I might be getting old, but they are really small too.)

I’m told that Karaoke is good for learning. I haven’t tried it. Many people swear by learning languages by singing songs in the target language.

There is a world of good content on YouTube. I really like many of the game shows. You might as well find something entertaining while you’re learning.

3. Growing up with Chinese

CCTV has a series called “Growing up with Chinese” that is targeted at teenagers wanting to learn. There are 100 episodes that are available for download. I watch those while trying to get some exercise on a treadmill. They start out very basic but I think they move a little too fast into more complex topics. However, the host Charlotte is adorable, and the more of these things you learn from the better.


The biggest hint though is just to get out there and try to talk to native speakers. They will correct you much more quickly than you’ll ever work it out yourself. I see this every day. For example, we were at my in-laws’ for lunch. I said to my mother-in-law “我很高兴我不需要药”, and she looked puzzled, then said “我很高兴我不需要吃药”. Note that I’d left out a word that helped a lot with what I meant. Only natives can do this for you.

With all the formal types of courses too, you'll quickly find that what they teach really is quite formal and you'll come to a sad reality (at some point) where you find that's just not what people say. That's why it's so important to get to common spoken language pretty quickly. For example, people often learn "ni hao" as the first phrase for "hello". After a while, you'll realise that it's not common for people to say that. "nin hao" which is more formal is pretty common in some situations though. A better example is to ask yourself how often you'd say a phrase like "good morning" or "good day". Most Australians would be more likely to say just "morning" rather than "good morning". Similarly, while you'll learn "zao shang hao", you'll often hear people just say "zao". For "bus" the courses will teach "gong gong qi che" but as an example, my brother in law says "gong che". There is no answer to this apart from talking to natives.

One source of natives that I’ve found useful is local meetup groups. There are lots of Chinese who want to learn English and are happy to chat to you back/forth with both languages, plus native speakers who are just happy to help because they get to meet new people.

Anyway, that’s it for now. If I think of more, I’ll add them later. Good luck!

Reliably Dropping a Database in a T-SQL Script is Too Hard

Recently I’ve been working on a system where I had to create a test script that reliably recreated a database each time it was executed. I’ve done this many times before and thought I knew how but I was wrong. It turns out that reliably dropping a database in a script that you run from SQL Server Management Studio (SSMS) is harder than it looks. (It’s also harder than it should be).

In Books Online (BOL), the article for DROP DATABASE says:

“You cannot drop a database currently being used. This means open for reading or writing by any user. To remove users from the database, use ALTER DATABASE to set the database to SINGLE_USER.”

Based on this, I had been using the following style of script to drop and recreate a database:

USE master;
GO

IF EXISTS (SELECT 1 FROM sys.databases WHERE name = N'Blah')
BEGIN
    ALTER DATABASE Blah SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE Blah;
END;
GO

However, what I’ve found is that when I execute this from SSMS, it doesn’t always work. Fairly randomly (and hard to reproduce), it fails with an error message telling me that the DROP failed because the database was in use. But isn’t that the whole purpose of setting it to single user?

The problem seems to be that although the database gets set to single user, another connection could be made to it before the DROP DATABASE statement occurs. Setting the database to SINGLE_USER isn’t enough as the current database context for the script is the master database, not the database in question.

A number of users and fellow MVPs have told me that they’ve experienced the same issue. What we suspect is causing this is the Intellisense system within SSMS connecting to the database to check the syntax of objects later in the same script. The problem only really seems to happen in SSMS but it could happen elsewhere.

A great suggestion from Rob Farley was to set the database OFFLINE instead of SINGLE_USER. While that would stop the other connection, the problem is that when you DROP a database that is OFFLINE, the files for the database don’t get deleted. You’d then need to have some additional script (nasty) to delete the files as well after the database was dropped.
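Rob’s workaround, as a sketch (Blah is a placeholder database name), would look something like this; the comment notes the catch:

```sql
-- Take the database offline, rolling back other connections,
-- then drop it. Note: the data and log files are NOT deleted
-- when an offline database is dropped, so they would need to
-- be cleaned up separately afterwards (the nasty part).
ALTER DATABASE Blah SET OFFLINE WITH ROLLBACK IMMEDIATE;
DROP DATABASE Blah;
```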

What is really needed to get around this race condition is:


If you agree, I’ve created a Connect item here that you can vote on:

(Update 8 Sep 2019): As all the Connect items got nuked again, that link doesn't work anymore. I did just find it in the new UserVoice system under a different name here:


SQL Server 2014 Hybrid: Storing data files in Azure storage – Bizarre or not?

In the sessions that I was attending in Redmond yesterday, I was interested to see the presenter discussing the idea of having individual database files in the cloud as some sort of bizarre option that’s now available in SQL Server 2014. I don’t see it as bizarre at all. In fact, I see two distinct use cases that make complete sense to me:

Overcoming Drive Limitations in Azure-based SQL Virtual Machines

One of the current limitations of running SQL Server in Azure virtual machines is that you can only attach two additional data drives per CPU core, each of up to 1 TB in size. For example, if you spin up a two-core VM, you can attach four data drives.

By allocating URLs for data files instead, you can overcome this limitation. In a recent whitepaper that I was a reviewer on, we discussed the IOPS and size limitations of Azure storage based drives, both at the individual drive level and at the storage account level. Please read that whitepaper for more info on those characteristics, which remain unchanged. A core takeaway though is that you shouldn’t mix storage accounts for files that make up a single database, to make sure that you maintain write ordering. You could use different storage accounts (to overcome single account limits) for different databases though.
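As a sketch of how this looks in SQL Server 2014 (the account, container, and database names here are placeholders, and a credential holding a shared access signature for the container is assumed to already exist):

```sql
-- Assumes a server credential matching the container URL exists,
-- holding a Shared Access Signature with read/write/list rights.
CREATE DATABASE HybridSales
ON (NAME = HybridSales_data,
    FILENAME = 'https://myaccount.blob.core.windows.net/data/HybridSales_data.mdf')
LOG ON (NAME = HybridSales_log,
    FILENAME = 'https://myaccount.blob.core.windows.net/data/HybridSales_log.ldf');
```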

Tiered Storage for Partitioned Tables in On-Premises Databases

I commonly implement partitioned tables for sliding window scenarios. For example, a customer might want to keep 7 years of data online at any time. If table partitioning isn’t familiar to you, please read another whitepaper that I was a reviewer on. Even though it was created for SQL Server 2008, it’s pretty much still spot on.

One way that I use partitioned tables is to keep active data in the current partition, and to keep older data in separate partitions, often one per month. This usually allows me to have the older data in read-only filegroups. This gives you a great situation where your application just sees a normal table, but most of the table is read-only. There are several advantages of this, some of which are:

– While locks are still acquired on the read-only data, hints like NOLOCK are then safe to use on that data.

– You won’t have a query accidentally overwrite your older data.

– The older data could be stored on slower media.

– The older data could have a higher compression rate. For example, you could use ROW compression for the active data, and PAGE compression for the older data.

– SQL Server doesn’t need to run recovery on read-only filegroups.
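Two of those points can be sketched in T-SQL as follows (all object names and the partition number are hypothetical). Note the order: the partition needs to be rebuilt with the heavier compression before its filegroup is marked read-only:

```sql
-- Use heavier compression on an older partition (say partition 3)
-- while the active partition stays ROW compressed.
ALTER TABLE dbo.Transactions
REBUILD PARTITION = 3
WITH (DATA_COMPRESSION = PAGE);

-- Then mark the filegroup holding the older partitions read-only.
ALTER DATABASE Sales MODIFY FILEGROUP Archive2013 READ_ONLY;
```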

Using data files in Azure storage could make really good sense for some of this older data. You could store your old read-only data in Azure storage files and keep it available. Your current data would be fast to access, while your older data would still be present, just slower to access.

I could see a situation where you might never need to remove your old data.

One other question that comes up in this situation relates to how backups work. Clearly you don’t want to be pulling all your data out of the cloud when you perform a backup, particularly if you are going to back up to the cloud as well. However, piecemeal backups in SQL Server are perfect for this. If you just store your read-only filegroups and archive data in Azure storage files, you can safely perform a backup with the READ_WRITE_FILEGROUPS option.
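A minimal sketch of such a backup (the database name and path are placeholders); the READ_WRITE_FILEGROUPS option backs up the primary and read-write filegroups while skipping the read-only archive filegroups:

```sql
-- Back up only the read-write filegroups; read-only archive
-- filegroups (e.g. those held in Azure storage) are skipped.
BACKUP DATABASE Sales
READ_WRITE_FILEGROUPS
TO DISK = N'S:\Backups\Sales_ReadWrite.bak';
```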

All in all, I see that some of these new hybrid scenarios could be quite compelling, and not bizarre at all.

More updates to Azure: Which changes relate to SQL people?

The Azure team keep rolling out improvements at an amazing pace. Scott Guthrie posted recently about the latest set of changes. Here are the ones that I think are most important for SQL people:

Import/Export Hard Drives

Even though Azure storage accounts can hold a very large amount of data, one of the big questions has always been about how to get that data into the storage account in the first place. That question is now answered because the new Windows Azure Import/Export service lets you ship hard drives directly to the Azure team for uploading. The reverse is also available. If you export a large amount of data to a storage account, you can move it onto a hard drive and have the drive sent to you.

Currently the service uses FedEx for transport. If you are concerned about the security of data in transit, BitLocker encryption is also supported in both directions.

HDInsight Out of Preview Mode

HDInsight has made its way out of preview and into general availability. It allows you to process large volumes of data (in Azure Storage) using Apache Hadoop tools (including Pig and Hive).

HDInsight gives you a way of rapidly (and economically) spinning up a Hadoop cluster when you need one, and shutting it down again when you are done.

Virtual Machine Gallery

As the number of virtual machine images has continued to grow, it’s been getting harder and harder to find the image that you want when creating a new virtual machine. UI changes have been made to make this easier.

One of the discussions that the team has been having lately is around how long images should be kept. For example, once SQL Server 2014 ships, should images of SQL Server 2008 R2, or SQL Server 2008 also be kept? I’d like to see them all kept but I’m not sure how much ongoing maintenance that is for the team. Also, if they are all kept, should they only keep the version with the operating system that was the most current at the time? If anyone has strong opinions on this, I’d love to hear them and I’ll pass them on.

General Changes

You can now configure endpoint security using PowerShell.

You can now configure alerts related to billing. You can get an email when your bill exceeds a given threshold. One thing I’d really like to see (which isn’t there yet) is the ability to simply have your bill emailed to you each month, even when it’s prepaid. Does anyone else feel the need for that?

For further details on areas that I haven’t mentioned, you’ll find Scott’s post here:

Technet Live: What's new in SQL Server 2014

Hi Folks,

If you want to get your head around the core improvements coming in the SQL Server 2014 wave, I'm presenting a Technet Live session on November 29th (Australian time).

Registration is here:

I'd love to see you online. 

SDU Podcast #61: Dr David DeWitt

I had the distinct honour (honor) this week of recording a new SQL Down Under podcast with Dr David DeWitt from the Jim Gray Systems Lab at Microsoft. Anyone that has been to a PASS summit over the last few years would know that David seems to have single-handedly converted morning keynotes into must-attend events.

In this podcast, David explains his role, gives a philosophical view on where we're at in the database industry and discusses core enhancements in SQL Server 2014.

You'll find it here:


Webinar: SQL Server Compression Technologies

A while back, we changed the format of our monthly SQL PASS meetings to a virtual format for most meetings, as it makes it easier for a lot of people to attend.

Tomorrow (lunch time, Melbourne time), I'm delivering another one on compression technologies in SQL Server. In this session, we'll take a tour through vardecimal in 2005, then on to row and page compression in 2008, then xVelocity based compression in 2012, and finally look at what 2014 offers in this regard.

We have a limit on the number of attendees so please don't register if you can't make it but if you can, we'd love to see you online.

Latest Azure updates related to SQL (and Oracle)

Azure updates continue to come thick and fast.

Scott Guthrie has posted again today about a number of enhancements just released. From these, I think the following are the most interesting for SQL Server devotees:

1. I posted last week about the new A5 size Azure VMs. These will be a good option for many SQL VMs, particularly those based on tabular data models.

2. Microsoft continues their partnership with Oracle by providing VM templates with Oracle software preinstalled. This is by far the quickest way to try any of this software, particularly for SQL Server folk that just want to try something on Oracle versions. Versions supported are:

  • Weblogic Server 12c and 11g
  • Database 12c and 11g
  • Java Platform Standard Edition 6

3. You can now also modify VMs that are stopped, for operations like adding disks.

4. A single Azure subscription can now deal with multiple Active Directories. This is really helpful if you need to extend your on-premises AD to the cloud, particularly if you're starting to investigate the hybrid SQL Server scenarios such as using Azure for a DR site with Availability Group replicas.

5. Subscription suspension no longer deletes virtual machines either.

This is another great set of updates. I'm truly impressed by the speed that these are appearing at.