SQL Question: ROUND() didn't go to the same school I did?

I had an email from a friend today where he asked what was wrong with SQL Server, and wondered if ROUND() didn't go to the same school that he did.

The example was:
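A minimal sketch of that kind of call, with the value passed as a string literal:

SELECT ROUND('94.55', 1);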

The output was 94.5.

What was puzzling him was why SQL Server's ROUND() converted 94.55 to 94.5 rather than to 94.6. Writing numbers as strings is problematic to start with, but that aside, he was puzzled by the output.

Rounding

There isn't just one "correct" way to do rounding. Here is info on it: https://en.wikipedia.org/wiki/Rounding

The T-SQL ROUND() function does apply "standard" mathematical rounding: if the next digit is 5 or higher, the value rounds up; if it's less than 5, it rounds down. So you'd expect 94.55 rounded to one decimal place to be 94.6, not 94.5.

It's worth noting that not all languages do this. Some languages implement Banker's Rounding instead. That was designed to avoid the skew you can get when you round a whole lot of .5 values. For example, 2.5 + 3.5 + 4.5 would become 3 + 4 + 5 = 12 if you rounded each value to an integer first. With Banker's Rounding, .5 values round to the nearest even digit: 0.5 goes to 0, 1.5 goes to 2, 2.5 also goes to 2, 3.5 goes to 4, and so on. So 2.5 + 3.5 + 4.5 would become 2 + 4 + 4, which is 10, and much closer to the underlying total of 10.5.

Older versions of VB, etc. used Banker's Rounding, but that's not what the T-SQL ROUND() function does.

So what's up with ROUND() ?

In that case, what's the problem with his original query?

The issue is data types. He didn't provide decimal values to round, he provided strings. That means they have to go to another data type before they are rounded. Which one?

If I execute this code, you might see the problem:
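A sketch of that kind of check, using SQL_VARIANT_PROPERTY to report the base type that the expression ends up with:

SELECT SQL_VARIANT_PROPERTY(ROUND('94.55', 1), 'BaseType') AS RoundedBaseType;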

It reports that the base type is float. And as I've talked about on this blog many times, floats are evil for business applications, and using them is a common mistake.

To see the difference, instead of using the implicit conversion, try this:
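For example, converting the string to decimal explicitly before rounding (a sketch; any suitable decimal precision will do):

SELECT ROUND(CAST('94.55' AS decimal(18, 2)), 1) AS RoundedValue;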

And that returns 94.60.

The rounded value is now the expected value, and the output data type is the expected decimal type.

Finally, also keep in mind that when SQL Server Management Studio shows you a float value, it rounds it before it shows it to you. So a value like 9.9999999999999999 might appear as 10.0, but it's not 10.0.

Hope this helps someone.

Book Review: Leap First by Seth Godin

I'm a fan of Seth Godin, and have always enjoyed listening to him. Recently I finished listening to Leap First on Audible. 
 
When I started listening to this book, I didn't quite know what to make of it. I always enjoy Seth's anecdotes, but at first the book seemed more like a series of them than an in-depth treatment of the topic, and it seemed to lack a continuous train of thought. I felt like I was listening to a collection of anecdotes from Seth, rather than a "real" book.
 
It's brief: only 2 hours 6 minutes long. I started to wonder if Seth had felt the need to push out another book, when one wasn't needed.

But Later

I was wrong.

As the book continued, a real train of thought did emerge to tie the thoughts together. I ended up quite enjoying it. The book is a recording of parts of a session that he was delivering. The audience was aspiring entrepreneurs and others. And Seth is a great public speaker.

 
I loved it when he called us all out on procrastination, and our failure to "leap". Other people's reviews were strong too: "Highly recommended for anyone who might be stuck or trying to find meaningful work." This is where the brevity of the book might well be a good thing. I can imagine people listening to it many times, particularly when they feel the need for a "pick me up" in their work.
 
I loved the stories of authors reading the one-star reviews of their work, and the analysis of why they do that. I've seen that many times and he's dead right.

Summary

A brief but interesting book. Don't let the feeling that you are listening to a collection of anecdotes put you off. A message does shine through as the book continues.

7 out of 10.

 

Book Review: The Incredible Unlikeliness of Being

Ever since I watched Professor Alice Roberts' series on travelling through Egypt by train, I've been quite a fan of her work. (It also means I probably came to know about her later than I should have.) Since then, I've been working through a number of her books. On Audible, I just finished listening to The Incredible Unlikeliness of Being.

Amongst many other things, Alice is an English biological anthropologist. She also worked as a doctor in the National Health Service in Wales for a while, but she left clinical medicine to focus on anatomy.

This is an amazingly powerful book.

I can't describe just how many things I learned when listening to it. Alice shows a mastery of so many disciplines (like anatomy, genetics, biology, evolution, and more) and combines them to tell a detailed story of the evolution of the human body. She fills in so many gaps that I didn't understand about why things in the body are the way they are, and importantly, how they came to be that way.

Alice works through each of the major systems of the body, one by one, and describes so much about how they function in humans, and how they function in other animals, particularly those that share a common ancestor with us in the tree of life.

When I was at high school, biology was considered a softer science than chemistry, physics, mathematics, etc. In fact, in an indication of the appalling gender divide back then, at our school biology was taught over at the girls' high school, and boys who wanted to study it needed to go there. Worse still, because it didn't rate as well as the other sciences, it put students in a less competitive position for university entrance. That meant that if you wanted to be a medical doctor, the worst thing you could do was to study biology at school, as it would reduce your chances of getting into medical school. How ridiculous!

As I've aged, I've developed a complete fascination with biology. I wish I'd spent more time on it when I was younger. This book hit the mark for me nicely.

Audio

I've seen a few comments from people (one or two from the USA) who have some difficulty with Alice's accent and found the book harder because she read it herself. I couldn't disagree with those comments more. I'm so very glad that Alice read the book herself. She has a steady pace, a clear accent, and is obviously used to teaching. Her accent is a pretty straightforward English accent.

The Verdict?

After learning so many things from the book (and it's a long book), I got a hint of just how much more I don't know about this.  At some point, if I have another 11 hours 15 minutes to spare, this is one book that I'll likely listen to again, to try to catch some of what I'm sure I missed.

10 out of 10

 

 

SDU Tools: List use of Deprecated Data Types in a SQL Server Database -> Updated

I've previously posted about the procedure ListUseOfDeprecatedDataTypes in our free SDU Tools for developers and DBAs. I mentioned that I'm often reviewing existing databases and one of the first things I go looking for is the way they've used data types, and that in particular, I'm keen to know if they've used any deprecated data types (i.e. ones that will/might be removed at some point).

The procedure was updated in version 21 to now include a ChangeScript column. (Thanks to Michael Miller for the suggestion).

It still has the same parameters:

@DatabaseName sysname – This is the database to process
@SchemasToList nvarchar(max) – a comma-delimited list of schemas to include (ie: 'Sales,Purchasing') or the word 'ALL'
@TablesToList nvarchar(max) – a comma-delimited list of tables to include (ie: 'Customers,Orders') or the word 'ALL'
@ColumnsToList nvarchar(max) – a comma-delimited list of columns to include (ie: 'CustomerName,StreetAddress') or the word 'ALL'

And still returns one row for each use of each deprecated data type.

The columns returned are SchemaName, TableName, ColumnName, DataType, SuggestedReplacementType, and now also ChangeScript.

You can see the previous version in action here:

In our own internal version, we also flag money and smallmoney types as we don't want them used, but the tool doesn't flag those.

But let's look at an example of using the updated version, in this case against msdb. (I wish msdb wasn't still using types that were deprecated back in 2005, given that it's now 2022):
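A call along these lines does the job (assuming the tools are installed in their usual SDU_Tools schema):

EXEC SDU_Tools.ListUseOfDeprecatedDataTypes
    @DatabaseName = N'msdb',
    @SchemasToList = N'ALL',
    @TablesToList = N'ALL',
    @ColumnsToList = N'ALL';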

Deprecated types in msdb

Note the new ChangeScript column that is output.

To become an SDU Insider and to get our free tools and eBooks, please just visit here:

http://sdutools.sqldownunder.com

SQL: Understanding Change Data Capture for Azure SQL Database – Part 3 – Enabling and Using CDC

This is part 3 of a series on working with change data capture (CDC) in Azure SQL Database. This part discusses how to enable it and how to use it.

To show how this works, I have created an Azure SQL Database called CDCTest. I created it as a DTU-based database with a service level objective (SLO) of S3. I'll discuss more about the licensing implications of CDC in part 4.

Connecting and Setup

I've connected to the database using SQL Server Management Studio (SSMS) and opened a query window to the new database. I've then executed the following to create the objects:
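As a sketch, the key object is the table that CDC will track: dbo.NewEmployees. The exact columns don't matter for this walkthrough (these ones are just illustrative), but the table does need a primary key so that net changes can be supported:

CREATE TABLE dbo.NewEmployees
(
    NewEmployeeID int IDENTITY(1,1)
        CONSTRAINT PK_dbo_NewEmployees PRIMARY KEY,
    EmployeeName nvarchar(100) NOT NULL
);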

If I check the entry in sys.databases by executing this command:
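For example, the is_cdc_enabled column is the one of interest:

SELECT name, is_cdc_enabled
FROM sys.databases
WHERE name = N'CDCTest';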

The output of the column is as shown:

CDC not enabled

Enabling CDC at the DB Level

So let's start by enabling CDC for the database, by executing:
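The system stored procedure for that is sys.sp_cdc_enable_db, run in the target database:

EXEC sys.sp_cdc_enable_db;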

Checking sys.databases the same way again now shows it enabled:

CDC is enabled

That's how we can check if it's enabled at the DB level. But there are other changes that have occurred. First, there's a new schema that's been added:

New CDC schema visible

And it contains a number of objects:

CDC schema objects

Enabling CDC at the Table Level

Next, let's enable it at the table level and see what's changed.
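A sketch of that call, matching the options described below (dbo.NewEmployees as the source table, net changes enabled, and a NULL gating role so that only admins can access the data):

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'NewEmployees',
    @role_name = NULL,
    @supports_net_changes = 1;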

We did several things here. We said that the source table is called NewEmployees and that it lives in the dbo schema. We then said that this capture instance will support net changes. That means that as well as providing each individual change, we can ask CDC for the net effect of a set of changes. We'll see that later. And finally, we told it that, for now, only admins can access the data. Otherwise, we'd have had to provide the name of a role that contains the users who can query this.

We can then see that the table is enabled for CDC by querying sys.tables:
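For example, the is_tracked_by_cdc column shows which tables are enabled:

SELECT SCHEMA_NAME(schema_id) AS SchemaName, name AS TableName, is_tracked_by_cdc
FROM sys.tables;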

sys.tables

But wait a minute. We only created one table, right? What are these others? We can find them if we expand the tables in the database:

List of tables - more than expected

Note all these extra ones are system tables.

One other thing we did, even though we didn't specify it, is to create a capture instance called dbo_NewEmployees. That's the default name but we could have named it by using a parameter. Each table can have two capture instances to allow for handling schema changes and/or other requirements.

We can see the list of capture instances for a table by using this query:
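As a sketch, cdc.change_tables holds a row per capture instance:

SELECT capture_instance, start_lsn, end_lsn,
       supports_net_changes, role_name, index_name
FROM cdc.change_tables;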

capture instance details

This shows the configuration of the capture instance, the start and end points of the log that it relates to, and more, like the name of the index in the table, etc.

Using CDC for the Table

Now let's make use of CDC for this table. I'll start by inserting three rows, then updating one of them.
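A sketch of that, using the illustrative columns from the table above:

INSERT dbo.NewEmployees (EmployeeName)
VALUES (N'Anna Jones'), (N'Brett Wilson'), (N'Chen Li');

UPDATE dbo.NewEmployees
SET EmployeeName = N'Chen Li-Brown'
WHERE NewEmployeeID = 3;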

Now let's check out what happened by calling the CDC functions:
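A sketch of those calls follows. The function names include the capture instance name, and sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn supply the LSN range:

DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_NewEmployees');
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_NewEmployees(@from_lsn, @to_lsn, N'all');

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_NewEmployees(@from_lsn, @to_lsn, N'all');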

When calling these functions, I needed to specify a range of log sequence numbers (LSNs). I called sys.fn_cdc_get_min_lsn to get the minimum value available to the capture instance (note: not to the table), and used sys.fn_cdc_get_max_lsn to get the maximum value that's available from the log right now.

Normally, I'd want to record where I'm up to and get values past that point.

The output of the first SELECT using cdc.fn_cdc_get_all_changes_dbo_NewEmployees was:

Get all changes

Note that it shows every operation, i.e. the three inserts (with __$operation as 2) and the update (with __$operation as 4). The primary key value is also shown, along with the LSNs where it all occurred.

Also note that the name of the capture instance is part of the name of the function that was created when we enabled CDC for the table.

Often I do want every operation, but sometimes I don't care about all the individual operations. I just want the net effect. That's what's shown in the second SELECT using cdc.fn_cdc_get_net_changes_dbo_NewEmployees:

Net effect

This one is interesting. Inserting 3 rows and then updating 1 of them has the same net effect as inserting 3 rows where one of them already contained the updated value. And that's what you get from this function.

And that's the basic enabling and use of CDC in Azure SQL DB. We'll move to licensing implications in the next part.

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database

 

Announcing the Fabric Down Under podcast – Show 1 available now

I've mentioned in a previous post how important I think Microsoft Fabric is going to be.

So, in addition to our existing SQL Down Under, and Cosmos Down Under podcasts, we have a brand new podcast called Fabric Down Under.

In the preview for Microsoft Fabric, one of the key participants from the product group has been Pawel Potasinski. I've known Pawel for many years and he's been the driving force behind many important things. It was great to see him driving community around Microsoft Fabric in preview.

So he was the obvious choice for the first podcast in the series.

I hope you enjoy it. You'll find all the shows and details on subscribing at our site: https://fabricdownunder.com

Welcome Microsoft Fabric – Most significant change in Microsoft BI

Microsoft has had a strong lead in BI and analytics for a long time now. The introduction of the tabular data models in 2012 was a watershed moment.

Today, they have announced a private preview of Microsoft Fabric. It's a bigger deal than the tabular model was, and will set the direction of BI tooling for a long time to come.

Competitors were already struggling to keep up, and in fact hadn't managed to do so. Microsoft Fabric raises the bar much, much higher again.

What is Fabric?

First up, it's a licensing model. Similar to the way you can license Office as a bundle rather than as individual products, it's a product that includes a set of best-of-breed tools.

But it's not just a licensing model.

Microsoft Fabric delivers a single integrated experience for working with analytics, with widespread integration between the tools, which are known as experiences. At release, these are the experiences:

  • Data Integration – will be very familiar to anyone who's worked with Azure Data Factory
  • Synapse Data Engineering – if you love running Spark or Python notebooks and more, this will make you feel happy
  • Synapse Data Warehousing – covers both data warehouses and lakehouses, and provides an interface that's familiar to SQL devs
  • Synapse Data Science – want to create and run machine learning models across the data? This is for you.
  • Synapse Real Time Analytics – need to deal with telemetry, logs, etc. and query them for analytics at volume? This is the tooling you need.
  • Power BI – enhanced to work with the new storage mechanism. As well as the previous import and direct query modes, there is now a Direct Lake mode.

And I'd expect to see more experiences coming.

OneLake

One of the most critical aspects of Microsoft Fabric is that data for all the experiences is stored in OneLake. This is basically an Azure Data Lake style of storage that is used to hold Delta-Parquet files, but again, it's much more than just a storage account.

Having this single copy of the data, in a single format, means you can access the same data from within all the different experiences. I can't stress enough how important this is.

And contrary to what happens with competitors, the file format is a standard open format using Delta-Parquet.

Get Involved!

Importantly, you can now try this too. You'll find free trials as part of the public preview at https://aka.ms/try-fabric. I'd encourage you to get involved.

For a detailed introduction, watch the on-demand session from Microsoft Build: https://build.microsoft.com/en-US/sessions/852ccf38-b07d-4ddc-a9fe-2e57bdaeb613?source=sessions

And make sure to subscribe to our new Fabric Down Under podcast!

Cosmos Down Under podcast 8 with guest Blaize Stewart is now published!

I recently noticed a blog post from Jay Gordon that was calling out interesting content from Blaize Stewart on antipatterns for development with Azure Cosmos DB.

I thought this would be a perfect topic for one of our Cosmos Down Under podcasts and Blaize agreed to take part.

So much content talks about what to do with a product, and even though everyone knows there are things you shouldn't do, very little content talks about these antipatterns. It was great to speak to Blaize about it, and to get his updated thoughts.

You'll find this (and all) shows here: https://podcast.cosmosdownunder.com

ADF: Passing a file attachment via a POST to a REST service using Azure Data Factory

I was recently doing work for a client where I needed to replace some Python code that was being used to upload a file attachment to a REST-based service using a POST command. I was trying to do it using a Web Activity in Azure Data Factory. What was surprising was how little information I could find on how to do that.

While I can't post all the code, I did want to post the core parts here in case it helps others, and for my own future reference. The basic steps involve:

  • Put the file into a storage account
  • Read the file into a variable
  • Construct the body of the POST call, including the attachment
  • Send the POST call using a Web Activity
  • Test the REST-based call

Put the file into a storage account

Ideally, the Web Activity would have a way to just point to a file in an Azure Storage Account and get it to attach that file to calls. It doesn't have that option.

What it does have is an option to include a dataset in the body of the call.

Add dataset UI

I was immediately hopeful and tried that, and while it was interesting, it didn't include the data as an attachment. It just appended it into the body of the call.

So let's load up a file into a variable and use the variable to provide part of the body of the web call.

First, I've created a resource group called rest_post_test using the Azure portal, then created a storage account called greglowfileupload. I created a text file called cities.csv with the following contents:

List of cities

And I uploaded it to a container called uploadfiles.

Cities file in a storage container

Read the file into a variable

I created an Azure pipeline called RestUploadTest, and added the following parameters:

Pipeline parameters

The APIKey was provided by the REST service and is used to log on, instead of using other forms of authentication.

The SiteURL is the address that the POST is being sent to. I've used a test site that I'll talk about in a moment.

The ContainerName, FolderName, and FileName parameters are the location of the file in the storage account. (The file is not in a subfolder.)

All of these values would normally be read from a database, but for simplicity, I've made them parameters to the pipeline.

I then added the following variables:

Pipeline variables

Next I added a linked service StorageFile to connect to the storage account.

Linked service definition

Note that I would normally use a Key Vault for holding key details for the storage account, but I've gone for simplicity in this demo.

To read the file from storage, we can use a Lookup Activity, but it requires a Dataset, so we create the dataset first. I called it StorageTextFile.  I like to make datasets reusable, so I added parameters to it:

Dataset parameters

Then I configured those parameters as dynamic values in the Connection tab:

Dataset connection tab

Dataset connection tab 2

Note that because I used a Delimited Text File as the dataset type, I had to configure a column and row delimiter. I chose values that will not occur in my file. I used the preview option to make sure I could read the data.

In the pipeline, I added a Lookup Activity called Get File Contents, configured the timeout as 20 minutes (not the default of 12 hours!) and set the details as follows:

Get file contents activity

The dataset properties were populated dynamically from the pipeline parameters.

Finally, I added a Set Variable Activity called Set AttachmentFileContents.

Set attachment file contents

The dynamic code for setting the variable was:
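Based on the description that follows, the expression is along these lines (using the Get File Contents activity's output and the auto-generated Prop_0 column):

@activity('Get File Contents').output.firstRow.Prop_0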

I used the output of the Get file contents activity, and got the first row from it. Finally, because I didn't use the first row as a header row, it allocated its own headers. In this case, I have one column called Prop_0.

And then tested the code up to this point. I clicked on the output of the Set AttachmentFileContents Activity and can see the values. (Encoded in JSON in this viewer).

First test ok

All good to here. We have the file loaded into a variable.

Construct the body of the POST call, including the attachment

The next two things I need to configure are the variables that are holding the content type and body for the web request. I put them in variables separately to make it easier to debug.

First, I configured the web request content type variable:

Web request content type

The value was actually:
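It takes this general form (the boundary string itself is arbitrary; the one here is just an illustrative value):

multipart/form-data; boundary=AdfUploadBoundary123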

This is multipart form data, and because the body can contain many values, you need a boundary that's used to separate them.

Then the body values:

Web request body

The value was composed of:
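As a sketch, the body is a concat() of the opening boundary, a Content-Disposition header naming the file, a Content-Type line, a blank line, the file contents, and the closing boundary. Here the boundary matches the illustrative value above, the form field name ("file") is whatever the target service expects, and the file contents are assumed to be in a variable called AttachmentFileContents (matching the Set Variable activity). The line breaks are literal characters inside the quoted strings:

@concat('--AdfUploadBoundary123
Content-Disposition: form-data; name="file"; filename="', pipeline().parameters.FileName, '"
Content-Type: text/csv

', variables('AttachmentFileContents'), '
--AdfUploadBoundary123--')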

Note: the positioning of the carriage returns/line feeds in this is critical. You can't just concat them in by using '\r' or '\n' escape sequences; I've found that doesn't work. The line breaks need to be literal characters within the dynamic values.

Also note: you'll see the boundary value in here, but the first time I used it, I added a double-dash prefix, and at the end, I used a double-dash prefix and suffix.

Now we have the pieces ready for the Web Activity.

Note on Line Endings

I found this works for most web servers, but some web servers are fussy about how new lines are formed. Some are happy to just get a line feed (LF) character, others want a carriage return and line feed (CRLF).

If you only supply a LF and it wants a CRLF, the error returned is normally one that says you've reached the end of the stream unexpectedly (or similar words). In that case, you'll need to adjust the body to have consistent line endings with CRLF.

The replace() function in ADF works fine, but you can't just specify '\n' or '\r', etc. I've found you need to use the base64 encodings to get what's needed. (Suggestion from fellow MVP Simon Sabin)

They are as follows:

  • Instead of '\r' write base64ToString('DQ==')
  • Instead of '\n' write base64ToString('Cg==')
  • Instead of '\r\n' write base64ToString('DQo=')
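For example, one way to normalize a body to CRLF line endings is to collapse any existing CRLFs down to LFs first, and then expand every LF to a CRLF (assuming the body is held in a variable called WebRequestBody):

@replace(replace(variables('WebRequestBody'), base64ToString('DQo='), base64ToString('Cg==')), base64ToString('Cg=='), base64ToString('DQo='))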

Send the POST call using a Web Activity

I then added a Web Activity called Upload the file attachment:

Web activity

Note the Authorization header contained:
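A sketch of that value, built from the APIKey pipeline parameter:

@concat('Bearer ', pipeline().parameters.APIKey)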

That supplies the API key as the bearer token.

Note also: you shouldn't need the Accept header but I've found some services complain if it isn't there. The other reason for having it set to application/json is so that when an error occurs, you can read it without having to decode it as you would with gzip compression, etc.

Uploading Larger Files

If you are uploading larger files, there are two things to consider:

  • I'm not sure what the current maximum size is for an ADF variable, but the method that I've used assumes that the whole file will fit in a variable. It's pretty large though.
  • In the advanced settings for the Web Activity, there is a setting for an HTTP Request Timeout. That defaults to 1 minute. If the upload will take longer than that, you'll need to increase it. It has a 10 minute maximum value.

HTTP request timeout

Test the REST-based call

Now we get to test it. What we need is something to catch the requests while we're testing. The easiest site I've found for this is webhook.site. You can set up a site that listens on a URL for free. It's awesome. When you start it, it gives you a temporary URL to use:

That's the URL that I used earlier as a pipeline parameter. Let's test the POST by debugging the pipeline.

First, we make sure the pipeline ran as expected:

When we check the website, we see the request received, and with the file attachment!

Post received

Lower-level debugging

While this site is great when debugging most things, at first I struggled with it. The problem was that when I had malformed requests, I wished that it would just show me the raw data that it was receiving. I needed to compare my requests to other well-formed requests. I asked the authors and it can't do that. (At least not right now)

What did help, is that I found a GitHub project from Janne Mattila, called Echo. You'll find it here: https://github.com/JanneMattila/Echo. It looks like Janne also has a hosted version of it available here.

I decided to build that project and deploy it to an Azure Website, in case I wanted to modify it. And it was so very helpful. It just shows you the detail of everything it receives, rather than formatting the request:

Echo output

Thanks Janne!

I know this has been a longer blog post than normal but I hope it helps someone work this out.

 

MS Tech Summit 2023 – Discounts running out

We have so many great tech friends in Poland. I was pleased to see they are running the MS Tech Summit, and pleased to be speaking at it.

Looks like a great set of sessions: https://mstechsummit.pl/en/ They tell me there are more than 100 sessions spread over 8 tracks.

Not in Poland? Not a problem. It's running in-person and online, and with video on demand.

The early bird discount is still available but about to end.

I'd love to speak with you at the summit.