SQL Down Under show 90 with guest Joe Sack discussing Azure SQL Database and Copilot is now published!

Joe Sack is an old friend. (I've known him a long time; he's not actually old.) He's always fascinating to chat with, so I was very pleased to have him on another SQL Down Under podcast today. His last appearance was back in 2017.

Joe is a Senior Product Manager at Microsoft and is working on the integration of data and AI in SQL Copilots. The aim is to use AI and natural language processing to make database management simpler, whether you've been doing it for years or are just getting started. The experiences he discusses in the show include the Azure Copilot integration and natural language to SQL.

I hope you enjoy it. You'll find this show (and previous shows) here: https://podcast.sqldownunder.com/

SQL Down Under show 89 with guest Erin Stellato discussing SQL Server and data-related tools is now published!

It was another bunch of fun recording a SQL Down Under show today, and it's now published!

This time I had the pleasure of discussing SQL Server and other data-related tools with Erin Stellato.

Erin is a Senior Program Manager at Microsoft and works directly with the tools that I wanted to discuss. I've known Erin quite a while, and she's always interesting to hear from.

I hope you enjoy it. You'll find this show (and previous shows) here: https://podcast.sqldownunder.com/

SQL Down Under show 88 with guest Angela Henry discussing data types in SQL Server is now published!

I really enjoyed recording today's SQL Down Under show and it's now published!

This time I had a great conversation with fellow Microsoft Data Platform MVP Angela Henry.

Angela is a principal consultant at Fortified and, on LinkedIn, describes herself as a data mover and shaper.

Angela has a particular fondness for the Microsoft BI Stack and Databricks.

You'll find Angela online as @sqlswimmer.

I hope you enjoy it. You'll find this show (and previous shows) here: https://podcast.sqldownunder.com/

SQL: Suggestion for SSMS -> Save as table

I often look at the results of a query in SSMS and want to save them off somewhere, and what I really want is a table. To do that, at present, I need to:

  • Right-click and use Save Results As to go to a CSV
  • Use the flat file import wizard (or something) to import the CSV

Now obviously, in some cases, if it was a SELECT query, I could add an INTO clause and just run the query again, but there are many, many cases where I want to save the outcome of another type of query. It could also be that I just can't run the query again for whatever reason.
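For the cases where a SELECT re-run is possible, that INTO workaround looks something like this (the table and column names here are just placeholders):

SELECT c.CustomerID, c.CustomerName
INTO dbo.SavedResults          -- creates dbo.SavedResults and copies the rows into it
FROM dbo.Customers AS c
WHERE c.IsActive = 1;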

CSV Pain

Going via a CSV has a whole host of other problems. We all know that outputting into a CSV and loading it again can involve a world of pain.

A real benefit of going direct would be the data typing. SSMS already knows the data types of the results, so it would avoid the entire mess of having to change column data types, etc. when importing CSVs. That part is painful.

Multi-Server Queries

Where I've really come across this lately is with multi-server queries. In those cases, apart from configuring a whole load of linked servers that I don't want, there really isn't another good option beyond the CSV method.

What's needed?

I'd love to see an option in the Results grid in SSMS, in the Save To area, that lets you save the results to a table.

Agree? If so, you know the drill, vote once, vote often:

https://feedback.azure.com/d365community/idea/18481740-8aec-ee11-a73d-000d3adc65a4

 

SQL: Understanding Change Data Capture for Azure SQL Database – Part 5 – Accessing CDC from another database

This is the final part of a series on using Change Data Capture with Azure SQL Database, and it looks at accessing the change data capture data from another database.

Scenario

I often use Azure SQL Database as a data warehouse. It's a perfectly fine database for that. But the challenge with Azure SQL Database is always about how to access data in other databases.

Many people don't seem to realize that you can use External Tables (and External Data Sources) to read data in one Azure SQL Database from another Azure SQL Database.

In an earlier post, I talked about how you can access the list of logins in the master database from within an Azure SQL Database. I'd seen far too many people who were told that wasn't possible.

In that post, I showed how to set up a database scoped credential, an external data source, and an external table, so I won't repeat that detail here. Please refer to that post for specifics.

With Change Data Capture (CDC), what I'm often needing to do is this:

Two databases with arrow linking them

I have a source database (probably an OLTP database) and have enabled CDC on some tables. But I want to drag the data changes across to a staging area in the data warehouse database.

I wish the product team had made this easy but, at present, it isn't.

The Challenge

I can create an external table in the data warehouse database that points to a table or a view in the source database. What I want to access, though, is a table-valued function like the one I described in part 3:

cdc.fn_cdc_get_net_changes_dbo_NewEmployees

It requires parameters though, most importantly a @FromLSN and a @ToLSN, and I can't pass parameters to an external table. (I wish I could).

So how do I create a way to read from these functions, when I can't pass parameters, and I don't want all the data since the beginning of time?

There are two ways to tackle this.

Approach 1

In the past, here's what I've been doing instead:

Step 1: Local table

I create a local table, let's call it DataLoad.SourceConfigurations, and it has at least two columns for FromLSN and ToLSN. (Note these will be binary columns).
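As a rough sketch (the key column is my own addition, and the sizes are just assumptions; it also assumes the DataLoad schema already exists):

CREATE TABLE DataLoad.SourceConfigurations
(
    SourceTableName sysname NOT NULL PRIMARY KEY,   -- one row per tracked source table (assumed key)
    FromLSN binary(10) NULL,                        -- CDC LSNs are 10 byte binary values
    ToLSN binary(10) NULL
);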

Step 2: External Table in the Source Database

I then create an external table in the source database that points to the table I just created in the data warehouse database. Let's call it DW.SourceConfigurations.
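Sketched out, and assuming there's already an external data source in the source database (I'll call it DWDatabase) that points back at the data warehouse database, set up as in that earlier post:

CREATE EXTERNAL TABLE DW.SourceConfigurations
(
    SourceTableName sysname NOT NULL,
    FromLSN binary(10) NULL,
    ToLSN binary(10) NULL
)
WITH (DATA_SOURCE = DWDatabase,
      SCHEMA_NAME = 'DataLoad',
      TABLE_NAME = 'SourceConfigurations');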

Step 3: View in the Source Database

I next create a view in the source database that reads the FromLSN and ToLSN values from the DW.SourceConfigurations external table and uses them to call the CDC table-valued function. Let's call the view cdc.get_required_changes. I'll leave it in the cdc schema along with the standard functions. (Obviously I could have put it elsewhere).
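A minimal sketch of the view, assuming there's a single relevant row in DW.SourceConfigurations, and using the net changes function from part 3:

CREATE VIEW cdc.get_required_changes
AS
SELECT nc.*
FROM DW.SourceConfigurations AS sc
CROSS APPLY cdc.fn_cdc_get_net_changes_dbo_NewEmployees(sc.FromLSN, sc.ToLSN, 'all') AS nc;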

The beauty of this is that I can now control what's returned by the cdc.get_required_changes view by updating the values in the DataLoad.SourceConfigurations table in the data warehouse database.

Step 4: External Table in the Data Warehouse Database

The final setup step is then to create an external table, let's call it DataLoad.RequiredChanges, in the data warehouse database, that points to the cdc.get_required_changes view in the source database.
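Roughly like this, where the captured columns are just my assumptions about the NewEmployees table from part 3, and SourceDatabase is an external data source in the data warehouse database that points at the source database:

CREATE EXTERNAL TABLE DataLoad.RequiredChanges
(
    [__$start_lsn] binary(10) NULL,          -- CDC metadata columns returned by the function
    [__$operation] int NULL,
    [__$update_mask] varbinary(128) NULL,
    NewEmployeeID int NULL,                  -- assumed captured columns
    FullName nvarchar(100) NULL
)
WITH (DATA_SOURCE = SourceDatabase,
      SCHEMA_NAME = 'cdc',
      TABLE_NAME = 'get_required_changes');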

End Result

Once we've done this, we have a workable solution. I can process changes from the source database in the data warehouse database almost as though they were local CDC changes.

I just put the LSN values that I want into DataLoad.SourceConfigurations, and then read rows from DataLoad.RequiredChanges.
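So each load cycle ends up looking something like this (the WHERE clause assumes the key column from step 1, and the LSN values come from wherever you track progress):

DECLARE @FromLSN binary(10), @ToLSN binary(10);
-- set @FromLSN and @ToLSN from your own progress tracking, then:

UPDATE DataLoad.SourceConfigurations
SET FromLSN = @FromLSN,
    ToLSN = @ToLSN
WHERE SourceTableName = 'dbo.NewEmployees';

SELECT *
FROM DataLoad.RequiredChanges;   -- rows are read from the CDC function in the source database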

Approach 2

A more recent, and simpler approach is to wrap the call to the CDC functions in a stored procedure.

We can now call a remote stored procedure by using

sp_execute_remote

This option appeared well after the other options were available. We'll still need a local table to store where we are up to, but we can just pass the values directly to the stored procedure, which can make the SELECT statement against the CDC function.
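As a sketch (the wrapper procedure name is my own invention, and SourceDatabase is again an external data source in the data warehouse database pointing at the source database):

-- In the source database: wrap the CDC function in a procedure
CREATE PROCEDURE DataLoad.GetRequiredChanges
    @FromLSN binary(10),
    @ToLSN binary(10)
AS
BEGIN
    SELECT *
    FROM cdc.fn_cdc_get_net_changes_dbo_NewEmployees(@FromLSN, @ToLSN, 'all');
END;

-- In the data warehouse database: call it remotely
DECLARE @FromLSN binary(10), @ToLSN binary(10);
-- set these from your local tracking table, then:
EXEC sp_execute_remote
    @data_source_name = N'SourceDatabase',
    @stmt = N'EXEC DataLoad.GetRequiredChanges @FromLSN, @ToLSN',
    @params = N'@FromLSN binary(10), @ToLSN binary(10)',
    @FromLSN = @FromLSN,
    @ToLSN = @ToLSN;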

I hope this helps someone get started with this trickier aspect of using CDC in Azure SQL Database.

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database

SQL Server Locking, Blocking, and Deadlocks for Developers and DBAs Course Released

The latest online on-demand course that I've been working on is now out the door. SQL Server Locking, Blocking, and Deadlocks for Developers and DBAs is available at our site: SQL Server Locking, Blocking and Deadlocks for Developers and DBAs (sqldownunder.com)

It's designed to let you learn to find and fix locking issues, and to avoid them in the first place.

To celebrate the release, use coupon code LOCKINGRELEASE until January 20th to get 25% off the price.

Why this course?

I often get called to customer sites and they tell me that they have blocking issues. But what they call blocking issues isn't always accurate. They might mean locks that are held for a long time, or they might mean deadlocks. These are really common issues for customers.

  • Have you been working with SQL Server or Azure SQL Database and having issues with blocking and deadlocks?
  • Would you like to really understand how SQL Server manages transactions and locks?
  • Not sure what isolation levels are about, and which ones you should be using?
  • Don't know your app locks from your version stores?
  • Do you know the basics but think it's time to extend your knowledge?
  • Do you want to learn from an expert?

If any of these apply to you, this course is for you! And as well as detailed instruction, the course also offers optional practical exercises and quizzes to reinforce your learning.

SQL Server Locking, Blocking and Deadlocks for Developers and DBAs (sqldownunder.com)

SQL Down Under show 84 with guest Rob Sewell discussing SQL Server command line utilities is now published!

I know it's been a while, but there's a new SQL Down Under show published!

While I've been busy with https://cosmosdownunder.com and https://fabricdownunder.com, I haven't forgotten the SQL Down Under shows. It's time for some more SQL Server love.

I had the great pleasure yesterday of recording a podcast with one of my UK-based friends, Rob Sewell. Rob's one of the rare breed who are both Data Platform MVPs and Azure MVPs.

In the show, we discuss command line utilities for SQL Server, including the latest go-sqlcmd offering. We also include a shout out to the upcoming SQL Bits conference.

I hope you enjoy it. You'll find this show (and previous shows) here: https://podcast.sqldownunder.com/

SQL: Even more details on finding rows that have changed using HASHBYTES and FOR JSON PATH

In a previous post, I wrote about how to create a hash of all the columns in a table, by using FOR JSON PATH and HASHBYTES. This is incredibly useful if you need to check if incoming data is different to existing table data.

The code that I suggested (based on WideWorldImporters) was as follows:
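The shape of it (shown here against Sales.Orders purely as an example table) was a CROSS APPLY of FOR JSON PATH, fed into HASHBYTES:

SELECT o.OrderID,
       HASHBYTES('SHA2_256', oj.OrderJSON) AS HashedData
FROM Sales.Orders AS o
CROSS APPLY (SELECT o.* FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) AS oj(OrderJSON);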

The challenge with that code, though, is that for the existing table data, the hash is best calculated when the data is stored, rather than every time it's queried. And bonus points if you then create an index that holds just the key for matching, plus an included column for the HashedData. With a bit of careful work, you can get an efficient join happening to find differences.

The Challenge

The challenge with this is that you might have a column in the data that should NOT be checked. Perhaps you've stored a column called HashedData in the table, and it shouldn't be included in the hash calculations. Or perhaps there's an identity column that you really need to exclude.

The problem with the code above is that there isn't an easy way to exclude a column.

Making a Wish

What I really wish SQL Server had is a way to exclude a specific column or columns from a SELECT list.

When you are writing utility code, it would be incredibly useful to be able to say just:
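Something along these lines (the table name is just for illustration, and this isn't valid T-SQL today):

SELECT * EXCLUDE (HashValue)
FROM dbo.Customers;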

I'm not a big Snowflake fan, but this is an example of something that Snowflake does better than SQL Server's T-SQL.

If I had that, I could have just excluded it in the CROSS APPLY above.

JSON_MODIFY to the Rescue

But there is a way around this. Since SQL Server 2016, we've had JSON_MODIFY.

And I can use that to modify the data before hashing it. So to exclude a column called HashValue, I can write:
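Building on the same example as above, and assuming the table has a HashValue column that shouldn't contribute to the hash:

SELECT o.OrderID,
       HASHBYTES('SHA2_256',
           JSON_MODIFY(oj.OrderJSON, '$.HashValue', NULL)) AS HashedData
FROM Sales.Orders AS o
CROSS APPLY (SELECT o.* FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) AS oj(OrderJSON);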

With JSON_MODIFY, we can choose a JSON property to remove by setting it to NULL.

This makes it far easier to write generic code, without the need to list all the columns (as you would with CONCAT or CONCAT_WS), and without the need for dynamic SQL.

SQL: Understanding Change Data Capture for Azure SQL Database – Part 4 – Azure SQL Database Service Level Objectives

This is part 4 of a series on working with change data capture (CDC) in Azure SQL Database. This part discusses the service level objectives required to use it.

When I was reading the documentation for CDC in Azure SQL Database, I kept coming across a mention that it required at least an S3 service level objective (SLO), if you were using a DTU-based database.

I really hoped that wasn't the case.

I was left saying "please say it ain't so!".

Testing

I thought it was time to find out what actually happens when you try it.

I started by creating a database with the lowest SLO (Basic). Now I know these databases don't hold enough data to be very meaningful, but I wanted to know what would happen.

I didn't get far.

I created the table from the scripts in part 3 of this series, and then tried to do the first enable of CDC.
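The first enable is the standard database-level call:

EXEC sys.sp_cdc_enable_db;
-- this is the call that failed on the Basic database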

To say that's disappointing is an understatement. But it makes it clear that Basic, S0, S1, and S2 aren't going to let you enable it.

What about Scaling?

Then I wondered what would happen if I enabled it on an S3, and then tried to scale down.

So I scaled the DB to S3 ok, and enabled CDC as in part 3. All worked as expected.

I then scaled the DB back to S2 using the Azure Portal. And alas, that failed too.

I suppose I'm not surprised, but I had hoped it might at least have automatically disabled CDC if it really had to.

It's a Mistake

I think these licensing restrictions for CDC in Azure SQL DB are a mistake.


Back in SQL Server 2016 SP1, I was one of the main instigators in getting the product team to make sure that the programming surface of the database worked across all the editions (Enterprise, Standard, Developer, Express) wherever possible.

This restriction on CDC goes directly against that ethos. It harks back to large single-database thinking, not cloudy thinking.

I understand that it's probably needed for performance in production scenarios, but what I'm always trying to tell the team is that production scenarios aren't the only scenarios.

Developers also need to write code. We should let them write and test functionality. It can be clear to them that it won't work the same as on a production instance, but that's no different to many other aspects of the system. A Basic SLO database won't run a complex query over a lot of data either, but we don't expect it to.

But it's important to let them write their code and make sure it basically functions, no matter what level of SLO they're working with. If you require them all to have an expensive database, just to test code or concepts, you greatly limit how many developers will use the features. 


Having those types of restrictions also restricts the cloudiness of the solution, in that you couldn't scale up/down across a range of SLOs. We often take S7 or S8 databases and temporarily "park" them at S2 while not in use. This type of restriction kills those options as well, and makes the overall solution less cloudy.

And One More Part to This Series

That's service level objectives for the use of CDC in Azure SQL DB. We'll move to accessing CDC data from another Azure SQL DB next.

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database

 

SQL Question: ROUND() didn't go to the same school I did?

I had an email from a friend today where he asked what was wrong with SQL Server, and wondered if ROUND() didn't go to the same school that he did.

The example was:
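It boiled down to rounding a value that had been written as a string, something like:

SELECT ROUND('94.55', 1) AS RoundedValue;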

The output was:
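94.5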

What was puzzling him is why SQL Server's ROUND was converting 94.55 to 94.5 and not 94.6. Now writing numbers as strings is problematic to start with, but that aside, he was puzzled by the output.

Rounding

There isn't just one "correct" way to do rounding. Here is info on it: https://en.wikipedia.org/wiki/Rounding

The T-SQL ROUND() function does apply "standard" mathematical rounding: if the next digit is 5 or above, the value rounds up; if it's less than 5, it rounds down. So you'd expect 94.55 to become 94.6, not 94.5.

It's worth noting that not all languages do this. Some languages implement Bankers' Rounding, which was designed to avoid the skew you can get when there are a lot of .5 values. For example, 2.5 + 3.5 + 4.5 would become 3 + 4 + 5 = 12 if you rounded each value to an integer first. With Bankers' Rounding, .5 values round to the nearest even number: 0.5 goes to 0, 1.5 goes to 2, 2.5 also goes to 2, 3.5 goes to 4, and so on. So 2.5 + 3.5 + 4.5 becomes 2 + 4 + 4, which is 10, and closer to the underlying total of 10.5.

Older versions of VB, etc. used Bankers' Rounding, but that's not what the T-SQL ROUND() function does.
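If you want to check that on your own system, a quick test makes it obvious (the explicit casts are just there to give the values room to round up):

SELECT ROUND(CAST(0.5 AS decimal(3,1)), 0) AS A,
       ROUND(CAST(1.5 AS decimal(3,1)), 0) AS B,
       ROUND(CAST(2.5 AS decimal(3,1)), 0) AS C,
       ROUND(CAST(3.5 AS decimal(3,1)), 0) AS D;
-- returns 1.0, 2.0, 3.0, 4.0 i.e. every .5 rounds away from zero, not to the nearest even number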

So what's up with ROUND() ?

In that case, what's the problem with his original query?

The issue is data types. He didn't provide decimal values to round; he provided strings. That means they have to be converted to another data type before they can be rounded. Which one?

If I execute this code, you might see the problem:
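Here, SQL_VARIANT_PROPERTY is one way to expose the base type of the value that the implicit conversion produces:

SELECT ROUND('94.55', 1) AS RoundedValue,
       SQL_VARIANT_PROPERTY(ROUND('94.55', 1), 'BaseType') AS BaseType;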

as it returns this:
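94.5          float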

Note that the value is a float. And as I've talked about on this blog many times, floats are evil for business applications, and using them is a common mistake.

To see the difference, instead of using the implicit conversion, try this:
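For example, with an explicit cast to decimal:

SELECT ROUND(CAST('94.55' AS decimal(4,2)), 1) AS RoundedValue,
       SQL_VARIANT_PROPERTY(ROUND(CAST('94.55' AS decimal(4,2)), 1), 'BaseType') AS BaseType;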

And that returns this output:
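94.60         decimal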

Note that the rounded value is the expected value, and the output data type is the expected decimal type.

Finally, also keep in mind that when SQL Server Management Studio shows you a float value, it rounds it before showing it to you. So a value like 9.9999999999999999 might appear as 10.0, but it's not 10.0.

Hope this helps someone.