Azure SQL Database now has an improved STRING_SPLIT!

I get pretty excited about new T-SQL enhancements. Back in 2016, I was so pleased to see Microsoft finally add a string split option to T-SQL, but my enthusiasm was limited when I saw how it was implemented. Now that's mostly fixed!

While it was possible to build functions that did string splitting just like I wanted, the problem was that no matter how you implemented them, they were just too slow. Most importantly, they were much slower than a built-in function would be.

The Issues

SQL Server 2016 allowed us to write code like this:

That code would return this output:
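As a minimal sketch of that kind of call (the sample string is an assumption chosen purely for illustration), something like this works on SQL Server 2016 or later, with the result described in the trailing comment:

```sql
-- Split a comma-delimited string into one row per token.
SELECT value
FROM STRING_SPLIT(N'apple,banana,cherry', N',');

-- Returns three rows in a single column named value:
-- apple, banana and cherry (with no guaranteed row order)
```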

STRING_SPLIT is a table-valued function. It took two parameters:

  • The string to be split (Unicode or not)
  • The separator

It returned a set of rows with a single column called value, and it was fast.

So what were the problems?

  • The separator could only be a single character
  • Often you need to trim the values being returned. (That can obviously be done outside this function using TRIM, but it'd be nice if there was an option to do it in the function.)
  • But the biggest issue of all was that a table-valued function doesn't have an inbuilt order for returning rows, so you didn't know which of the tokens extracted from the string was which.

Now a TVF returns a table, and tables don't have built-in orders either. Nor do views. But what was needed was another column that told you which row was which, so that you could then use it for ordering when needed.

Azure SQL Database

STRING_SPLIT in Azure SQL Database now has an optional parameter to enable an ordinal (i.e. a position).

That extra parameter is a bit or an int that says if an ordinal value should be added to the output.

So this code:

Now returns:
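A minimal sketch of the new form, using a made-up sample string, passes 1 as the third argument and gets back an extra column named ordinal:

```sql
-- The third argument enables the ordinal column.
SELECT value, ordinal
FROM STRING_SPLIT(N'apple,banana,cherry', N',', 1)
ORDER BY ordinal;

-- value   ordinal
-- apple   1
-- banana  2
-- cherry  3
```

Ordering by the ordinal column is what finally makes the row order dependable.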

I love working with Azure SQL Database for many reasons, but one of those is that you get new language extensions before other parts of SQL Server. I've been using this feature since the day that I first heard it existed, and it's working great. Now the updated documentation is online as well.

It also means that I can obtain any specific token that I want. For example, get me the third token in the string:

And that returns:
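As a sketch of both the query and its result (again, the sample string is just an assumption):

```sql
-- Pick out only the third token by filtering on the ordinal column.
SELECT value
FROM STRING_SPLIT(N'apple,banana,cherry', N',', 1)
WHERE ordinal = 3;

-- Returns a single row: cherry
```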

This is a much-needed enhancement.

SQL: Why use Change Data Capture for Azure SQL Database?

I often need to capture the changes from one database into another. The most common reason is that I'm wanting to bring changes from a transactional system across into a data warehouse that's part of a BI setup.

So which technology is best to use for this?

That's not a trivial question to answer but here are some thoughts:

Replication?

Unfortunately, this one's not available with Azure SQL DB as the source as yet. Azure SQL DB can be a subscriber in Transactional Replication, and we often use it this way. If we have an on-premises SQL Server, one of our favourite ways to get data into the cloud is by using Transactional Replication. (If you need to get your head around Replication with SQL Server, just head to our course here).

There are many advantages to replication, including the lack of impact on the source system. However, Azure SQL DB can't currently be a publisher, so it doesn't help here.

And other forms of replication aren't really useful here, or an available option. So if the source DB is an Azure SQL DB, we need to find something else.

Azure SQL Data Sync

Azure SQL Data Sync is an odd technology. It basically grew out of Merge Replication based ideas. It's not built on Merge Replication, but it's very similar in concept. It was in a preview state so long, and the team had so long since stopped posting information about it, that most of us never thought it would ever reach GA.

You create a setup similar to this:

The sync metadata lives in a DB in Azure, and a copy of the DB that you want to sync is created as an Azure SQL DB. The Azure Data Sync engine then synchronizes the data between the HUB and the other DBs. If any of the DBs are on-premises, then an on-premises agent does the work.

Azure Data Sync (like Merge Replication) is trigger-based. Triggers are used to capture the changes ready for synchronization.

I wasn't a fan of Merge, and I can't say I'm a great fan of Azure SQL Data Sync. While it's conceptually simple, you would not want to use it for anything except very low volume applications.

Change Tracking

Change Tracking is another technology that's come directly from SQL Server land. When it's enabled, a set of change tracking tables are created. As data is changed in the tables of interest, changes are recorded in the change tracking tables.

One positive aspect of Change Tracking is that it isn't based on triggers and it outperforms trigger-based solutions. There are two downsides:

  • The changes are written synchronously, and in the context of the transaction that writes the change to the tracked table. This can impact the performance of the changes to the tracked table i.e. usually two writes are happening for each one that would have happened.
  • You don't get to see all the changes, and not in the order that they happened. Change Tracking lets you know which rows have changed, based upon the table's primary key. (You can also ask to have a summary of which columns were changed.) This can be a challenge for dealing with referential integrity, and other issues.
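To make that concrete, here is a minimal sketch of enabling Change Tracking and asking which rows have changed. The table dbo.Orders, its primary key column OrderID, and the retention settings are all assumptions for illustration:

```sql
-- Turn on Change Tracking for the current database.
ALTER DATABASE CURRENT
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
GO

-- Track a table (it must have a primary key) and ask for the column summary.
-- dbo.Orders is a hypothetical table with a primary key of OrderID.
ALTER TABLE dbo.Orders
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);
GO

-- Later: which rows have changed since the version we last synchronized?
DECLARE @LastSyncVersion bigint = 0;  -- assumed: saved from the previous sync

SELECT CT.OrderID, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_VERSION
FROM CHANGETABLE(CHANGES dbo.Orders, @LastSyncVersion) AS CT;

-- Save this value to use as @LastSyncVersion next time.
SELECT CHANGE_TRACKING_CURRENT_VERSION() AS CurrentVersion;
```

Note that you get one row per changed primary key, with the net operation, rather than a full history of every change.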

Queues (and Service Broker)

Another interesting option is to write to a queue. With an on-premises SQL Server, we can use Service Broker. If you haven't seen Service Broker, it's a transacted queue that lives inside the database. (To learn about this, look here).

With SQL CLR code or with External Activation for Service Broker, we could write to other types of queue like RabbitMQ.

At the current time, Azure SQL Database doesn't support writing to external queues. However, I do expect to see this change, as so many people have voted to have this capability added.

Change Data Capture

Change Data Capture (CDC) is another technology direct from SQL Server land. CDC is based on reading changes from a database's transaction log.

When you use it with SQL Server, it shares the same transaction log reader that Transactional Replication (TR) does. If you enable either CDC or TR, a log reader is started. If you have both enabled, they use a single log reader.

A key upside of using a log reader is that it doesn't slow down the initial updates to the table being tracked. The changes are read asynchronously, separately.
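For a feel of what that looks like in T-SQL, here is a minimal sketch of enabling CDC and reading the captured changes. The table dbo.Orders is a hypothetical example, and dbo_Orders is the default capture instance name that results from it:

```sql
-- Enable CDC for the current database.
EXEC sys.sp_cdc_enable_db;
GO

-- Enable CDC for one table (dbo.Orders is an assumed example table).
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'Orders',
    @role_name = NULL;  -- no gating role for reading the changes
GO

-- Read all the changes captured so far for that capture instance.
DECLARE @FromLSN binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Orders');
DECLARE @ToLSN binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@FromLSN, @ToLSN, N'all');
```

Unlike Change Tracking, each individual change appears as its own row, in log order.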

Until recently, though, you could not use CDC with Azure SQL Database. The log reader agent ran from within SQL Server Agent, and with Azure SQL Database, you didn't have a SQL Server Agent.

The product team have recently done the work to make CDC work with Azure SQL Database. It is an interesting option for extracting changes from a database, so this is the first blog post in a series of posts about using CDC with Azure SQL Database. Links to other posts will be added here as they are available:

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database


ADF: Use MSIs not SQL Logins whenever possible

Azure Data Factory (ADF) is great for moving data around. It often needs to connect to SQL databases. If you're creating linked services for this, please try to avoid using SQL logins. Don't use usernames and passwords for authentication if you can avoid them.
 
Azure SQL Database lets you create a user from a data factory's managed service identity (MSI). MSIs are a special type of Service Principal (SP). They provide an identity where you don't need to manage the ID or any sort of password. Azure does that for you.
 
Each ADF exposes an Azure-based identity on its property page. Here is the main menu for my BeanPerfection data factory:

[Image: Data factory properties page]

You might have guessed you should go to the Managed identities menu option, but that is for User Managed identities. I'll write more about them another day. The identity we're interested in is a System Managed Identity.

On this properties page, it's actually called the Managed Identity Application ID. Like all other types of Service Principal, Managed Service Identities have an ApplicationID.

Creating a user in Azure SQL Database

I create a user in an Azure SQL Database with the FROM EXTERNAL PROVIDER option. This says that Azure SQL isn't performing the authentication.

The user name is the full name of the ADF. In Azure SQL Database, we can just use the name of the ADF instead of the identity. Azure SQL Database will look up the identity for us.
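As a minimal sketch of those statements, using the data factory name and role name that appear in this post (creating the role here is an assumption; it may already exist in your database):

```sql
-- Run this in the target database, connected with an Azure AD account.

-- Assumption: the role does not exist yet.
CREATE ROLE datafactory_access;

-- The user name is the (lower-case) name of the data factory.
CREATE USER [beanperfectiondatafactory] FROM EXTERNAL PROVIDER;

ALTER ROLE datafactory_access ADD MEMBER [beanperfectiondatafactory];
```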

I always make this data factory name lower-case. Once the user is created, I add it to whatever role in the database makes sense. In my example above, I used the role name datafactory_access but there's nothing special about that name.

You need to decide which role to add it to based upon the tasks that the data factory needs to perform. While it's temptingly easy to just add it to db_owner, try to resist that. If in doubt, create a role that's used for data factory access and grant it permissions as they are needed.

Must execute using an AAD-Based Connection

If you just try to execute the statements above, you might find that you get an error message saying:

Msg 33159, Level 16, State 1, Line 8
Principal 'beanperfectiondatafactory' could not be created. Only connections established with Active Directory accounts can create other Active Directory users.

You cannot create any type of Azure AD-based user in an Azure SQL Database if your connection was authenticated as a SQL login. You must use a connection that was itself made using Azure AD authentication.

I find that the easiest way to do that is to make sure I have an Azure Active Directory Admin assigned for my Azure SQL Server, and then just execute the code right in the Azure Portal. I use the Query Editor tab in the main menu for the Azure SQL Database and connect as that administrator.

 

SQL: Fix: The parameters supplied for the procedure "sp_set_firewall_rule" are not valid.

I often use the procedure sp_set_firewall_rule to set firewall rules for Azure SQL Server. (There's a similar call to set the firewall for databases). The other day though, I got an error that had me puzzled:

I also tried it with named parameters and got the same error.

When I looked at my previous scripts, I realised that I had used a Unicode string for the first parameter previously.

Solution

I changed 'TestRule' to N'TestRule' and it worked fine.
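As a sketch of the working call with named parameters (the IP addresses are assumed placeholder values from a documentation range):

```sql
-- Run in the master database of the Azure SQL logical server.
-- Passing a non-Unicode name such as 'TestRule' instead is what produced
-- the "parameters supplied ... are not valid" error.
EXEC sp_set_firewall_rule
    @name = N'TestRule',
    @start_ip_address = '203.0.113.10',
    @end_ip_address = '203.0.113.10';
```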

I’ve not seen a procedure before that wants a Unicode string that won’t accept an ASCII string. For example, this works just fine:
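As one illustration of the normal behaviour, a hypothetical user-defined procedure (dbo.ShowRuleName is a made-up name) with an nvarchar parameter happily accepts a non-Unicode literal:

```sql
-- A hypothetical procedure with a Unicode string parameter.
CREATE OR ALTER PROCEDURE dbo.ShowRuleName
    @RuleName nvarchar(128)
AS
BEGIN
    SELECT @RuleName AS RuleName;
END;
GO

-- The varchar literal is implicitly converted to nvarchar.
EXEC dbo.ShowRuleName 'TestRule';
```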

Apparently strict data type checking has always been a feature of extended stored procedures and this one checks specifically for a Unicode string. I really don't use extended stored procedures much anyway.

What threw me as well is that I couldn't find it in the list of system stored procedures. It's not there because it's actually an extended stored procedure. These used to mostly have xp prefixes, not sp prefixes, and are a good example of why I don't love using prefixes like that.

I really wish though, that this procedure had a better error message. While the current one is strictly correct, it is not actually all that helpful.

 

New Online Course Released: Advanced T-SQL for Developers and DBAs

I'm really pleased to let you know that our latest online on-demand course is now released:
 
 
To celebrate the release, you can get 25% off the pricing until Aug 14th by using coupon code ATSRELEASE
 
We've had so many requests from customers to bring this course to our online platform. It was always one of our most popular in-person courses, and it's now released and fully updated.
 
The course includes the instruction plus quizzes and the same hands-on labs that we use in the in-person courses.
We've made a big effort with this course to make it really easy for you to do the labs. The labs only require you to have a fairly recent version of SQL Server Management Studio (SSMS) installed to complete them. You don't need to install anything else. We've provided the required databases online ready for you to connect to.
 
Not on the latest version of SQL Server? Not a problem either. Unlike most other courses, our courses always cover at least all the supported versions of SQL Server and show you what's changed between versions.
 
I hope you enjoy it.

Reliably dropping a SQL Server database if it exists

I often need to write scripts that drop and recreate databases. The hard part of that has always been reliably dropping a database if it already exists. And no, you wouldn't think that would be hard, but it is.

Built in Command

T-SQL has a built-in command for this.
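For a database named Sales (the example name used in this post), that command, available from SQL Server 2016 onwards, looks like this:

```sql
-- Drops the database only if it exists; no error if it does not.
DROP DATABASE IF EXISTS Sales;
```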

You'd hope that would work, but it doesn't. I wish it did. The problem is that it will fail if anyone is connected to the DB. And to check if anyone is connected, you first need to check if the DB exists, so it makes the whole "IF EXISTS" part that was added to this command completely pointless.

Worse, if you have separate code to kick everyone off first, you always have a chance of a race condition, between when you kick everyone off, and when you execute the command.

Nearly OK

Years back, the Microsoft docs library said to drop a database like this:
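The pattern was roughly along these lines. Treat this as a sketch of the approach rather than the exact documented listing, with Sales as an example database name:

```sql
USE master;
GO

IF EXISTS (SELECT 1 FROM sys.databases WHERE name = N'Sales')
BEGIN
    -- Kick everyone else off, rolling back their open transactions.
    ALTER DATABASE Sales SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    DROP DATABASE Sales;
END;
```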

This was promising, but unfortunately, it has an issue as well. Because you were in the master database when you issued the ALTER, you don't know that you are the single user. So, periodically, that would fail too.

Best Workaround

Over the last few days, we've had a discussion on an MVP list about how to work around this. Many thanks to Paul White, Erland Sommarskog, and Simon Sabin for contributing to it.

The best outcome I have right now is to use this:
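A minimal sketch of that approach, again assuming a database named Sales:

```sql
DECLARE @SQL nvarchar(max);

IF EXISTS (SELECT 1 FROM sys.databases WHERE name = N'Sales')
BEGIN
    -- Dynamic SQL, because USE Sales won't compile if Sales doesn't exist.
    SET @SQL = N'
        USE Sales;
        -- Issued from inside Sales, so this session becomes the single user.
        ALTER DATABASE Sales SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
        USE master;
        DROP DATABASE Sales;';
    EXEC (@SQL);
END;

-- Protection: anything that runs next by mistake lands in tempdb.
USE tempdb;
```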

To get the DROP to work properly, you need to execute the ALTER DATABASE from within the target database. That way, you end up being the single user, and even though you then execute a change to master, you hold the required session lock on the DB, and then the drop works as expected.

Because you can't have a USE Sales in the script if the Sales DB doesn't exist, this unfortunately has to be done in dynamic SQL code, where it is only executed if the DB does exist.

The last change to tempdb is just protection, if I have a script that then wants to create the DB and change to using it. If that goes wrong, I want to end up creating things in tempdb, not somewhere else like master.

What I wanted

What I've been asking for, and for a very long time, is this:

The ROLLBACK IMMEDIATE needs to be on the DROP DATABASE command, not on a separate ALTER command. Hopefully one day we'll get this.

Book: Implementing Power BI in the Enterprise

It's been a while coming, but my latest book is now out. Implementing Power BI in the Enterprise is now available in both paperback and eBook. The eBook versions are available in all Amazon stores, and also through most book distributors through Ingram Spark distribution.

I've had a few people ask about DRM-free ePub and PDF versions. While the Kindle version on Amazon is their normal DRM setup, you can purchase the DRM free version directly from us here:

https://sqldownunder.thrivecart.com/implementing-power-bi-ent-ebook/

It contains both the ePub and PDF versions.

Book Details

Power BI is an amazing tool. It's so easy to get started with and to develop a proof of concept. Enterprises want more than that. They need to create analytics using professional techniques.

There are many ways that you can do this but in this book, I've described how I implement these projects. And it's gone well for many years over many projects.

If you want a book on building better visualizations in Power BI, this is not the book for you.

Instead, this book will teach you about architecture, identity and security, building a supporting data warehouse, using DevOps and project management tools, learning to use Azure Data Factory and source control with your projects.

It also describes how I implement projects for clients with differing levels of cloud tolerance, from the cloud natives, to cloud friendlies, to cloud conservatives, and to those clients who are not cloud friendly at all.

I also had a few people ask about the table of contents. The chapters are here:

  • Power BI Cloud Implementation Models
  • Other Tools That I Often Use
  • Working with Identity
  • Do you need a Data Warehouse?
  • Implementing the Data Model Schema
  • Implementing the Analytics Schema
  • Using DevOps for Project Management and Deployment
  • Staging, Loading and Transforming Data
  • Implementing ELT and Processing
  • Implementing the Tabular Model
  • Using Advanced Tabular Model Techniques
  • Connecting Power BI and Creating Reports

I hope you enjoy it.

MVP Challenge: Data and AI plus some online exams

If you follow anyone that's part of Microsoft's MVP program, you might have heard there has been a global cloud skills challenge happening lately: #TheMVPChallenge.

There were three challenges that each of us could complete:

  • Azure Data & AI Challenge
  • Dynamics 365/Power Platform Challenge
  • Microsoft 365 Challenge

You can imagine which one I chose to complete. I did the Azure Data and AI challenge. Data and AI are a pretty common grouping.

I would have liked to do the other two challenges as well. Power Platform is obviously interesting to me, but I've looked at Dynamics over the years, and it's just not for me. The Microsoft 365 aspects aren't really my territory either, but it might have been interesting to see what I could have learned if I'd done that challenge, given most of us use those products every single day.

Data and AI Challenge

The name of this challenge was odd. There really weren't any data topics. It was all AI, and I don't mind that.

Some years back, I did the full Microsoft Professional Program for Data Science, and I also did the full Microsoft Professional Program for AI. Those offerings are now gone, and I have to say that I was quite sad when they disappeared. They really covered the topics in good detail. I also work with many of the Cognitive Services regularly, so this wasn't a new area for me.

I was really interested to see what Microsoft Learn now offered as a mechanism for free learning on these topics.

In the challenge, we needed to complete 42 modules. The depth really isn't there any more but for introductory-level material, what is provided is quite excellent. And did I mention that it's free?

Did I learn any new things while doing the challenge? Yes, a few. I think that any time you go back over an area, you pick up something that you've missed before. Or perhaps there's something you've forgotten because it didn't seem useful to you at the time, and now you realize that it is quite useful.

I'm already using some of the concepts that I picked up while doing the challenge, even though it was introductory level content.

MVP Community

I was really pleased to see how many of the local (and remote) MVP community took part. It's easy to start these types of challenges and to lose focus and stop.

In particular, I loved the way that our CPM Shiva Ford and other MVPs egged each other on to make sure we completed the challenge.

Exams

There was no requirement to do any exams related to this. I had a real interest, though, in knowing what was in all the "Fundamentals" exams. I wouldn't normally have taken them as they don't count towards certifications, so I decided to do a few exams during the month.

First I took the Azure Fundamentals exam AZ-900.

I thought the exam was OK, not too difficult and should be attainable for most people starting out with Azure. One thing I didn't like was that they spent so much time examining whether or not particular services were Platform as a Service (PaaS) or Infrastructure as a Service (IaaS) offerings. For many services, that's straightforward, but for services like storage, there are aspects of both. Regardless, that's not the sort of thing people should be all that focussed on. They should understand the difference, and that's enough.

Next I took the Azure Data Fundamentals exam DP-900. This had a little more substance to it. I thought it was a reasonable exam and covered many areas of data. The balance was a bit different to what I would have hoped for, but still OK. I did find questions, though, that were just simply wrong, where the author clearly might have read about a concept, but really didn't understand it. In hindsight, I should have taken the time to comment on the really problematic questions but I had a bunch of exams that day, and was already tired from fitting in the exams.

Next I took the Azure Administrator exam AZ-104. Now this exam was way more challenging than I expected. I wouldn't say that any of the concepts were all that difficult if you've worked with Azure for any length of time, but the way the questions were phrased made them more challenging than necessary. I was also surprised by the amount of focus on networking, when you consider all the tasks that an Azure administrator needs to handle.

I was glad to have taken AZ-104 though, as I had previously passed the Designing and Implementing Microsoft DevOps Solutions exam AZ-400. I'd been meaning to take AZ-104 for ages, as that's what I still needed for the Microsoft Certified: DevOps Engineer Expert certification.

Upcoming

Now that I'm back into taking some exams again, I plan to do the other fundamentals exams (AI, and Power Platform) soon to see what's in those. Then I'll do the other data-related exams to complete those certifications.

Online Exams

I'll finish this post with a few comments about the online exam mechanism.

I didn't love it but it's workable.

I found the process pretty random. With my first exam, I tried to follow their instructions perfectly. When it got to my turn, a chat window opened but the text box where I could type wasn't enabled, so I couldn't respond to the monitor person. When I didn't respond, they told me they had to put me back in the queue again. Then that happened yet again. I was starting to think I wasn't going to be able to do the exam.

On the third time round the queue, I was able to type into the text box. The person then told me they couldn't see my video. But my video was displayed on the screen in their application. So clearly the application could see me. No idea why the monitor person couldn't. I started to explain that in the chat text box and then they just suddenly started the exam.

The other bizarre part is that they tell you to put your phone out of reach, and that if you leave the video area, you'll fail. So I put my phone in another room. And then a later screen tells you that if they need to contact you, they'll call you on your phone.

For the second exam, again I went round the queue two times. When the person came online in the chat, he told me I wasn't allowed to wear headphones. The bizarre thing is that I thought that would be best, and I did the first exam with them on. I took them off and he was happy.

The third exam started OK, but then I got pinged for turning my head. I have a tendency, when sitting and pondering a question, to turn my head, and perhaps even lean back and look up. As soon as I did that, I had the monitor person warn me that if I did that again, I'd fail.

I'm glad there is a way to do these exams online, but I'd really like to see the experience improved. At least with these notes, I hope it will help you if you haven't been doing any exams this way.


How to kill off the Camtasia 2021 Launcher Pop Up

Camtasia is one of my favourite products. I use it regularly. I've been so excited to start to get to use Camtasia 2021 that was released just recently. It's a nice step up from an already great product.

But what I didn't like after upgrading, is that every time I started Camtasia, instead of the "normal" editing screen, I got a cutesy little popup that asked me what I wanted to do with the product today.

I'm not a fan of the popup; I'd rather the product just opened into a blank new project like it used to. The popup really just slows me down.

So I asked the TechSmith people on Twitter and they came to my rescue!

Here's the registry key that you need to modify to make this go away:

HKEY_CURRENT_USER\SOFTWARE\TechSmith\Camtasia Studio\21.0\Camtasia Studio\21.0\ShowLauncherAtStartup

If that's a 1, then you get the launcher. If you change it to a zero, it's gone.

Hope that helps someone.

SQL Interview: #14: Set operations using EXCEPT

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Medium

Question:

UNION and UNION ALL are commonly used to combine two sets of rows into a single set of rows.

EXCEPT is another set operator.

Can you explain what it does?

Answer:

EXCEPT returns the distinct rows from the first set of rows that do not also appear in the second set. In other words, it removes any rows from the first set if the same rows appear in the second set.

For example, in the code below:
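A minimal, self-contained sketch of such a query follows. The table variables, the TradingName column, and the sample names are all assumptions for illustration:

```sql
-- Hypothetical sample data held in table variables.
DECLARE @Customers TABLE (TradingName nvarchar(50));
DECLARE @Suppliers TABLE (TradingName nvarchar(50));

INSERT @Customers (TradingName)
VALUES (N'Acme'), (N'Bean Perfection'), (N'Contoso');

INSERT @Suppliers (TradingName)
VALUES (N'Contoso'), (N'Fabrikam');

-- Customer trading names, minus any that are also supplier trading names.
SELECT TradingName FROM @Customers
EXCEPT
SELECT TradingName FROM @Suppliers;

-- Returns Acme and Bean Perfection (row order is not guaranteed).
```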

The query returns all the Trading Names for customers unless a supplier also has that same name.

In other database engines (e.g. Oracle), this operator is called MINUS.