SDU Tools: Invert String in T-SQL

I recently shipped the latest update to my SDU Tools collection.

This time I decided to add a tool that’s in the “cute” category. It’s called InvertString and you can see it in action in the image above.

Yes, it takes an input string and returns a string that looks quite like an upside down version of the same string.

As part of the inversion I also reversed the order of the string so when you look at it upside down, it still looks correct. But if what you really want is a mirror image, you could combine it with REVERSE to achieve that outcome:
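For example (a minimal sketch; the SDU_Tools schema name and the exact call shape are assumptions for illustration, not the documented signature):

SELECT SDU_Tools.InvertString(N'SQL Down Under');          -- upside down, still readable when rotated
SELECT REVERSE(SDU_Tools.InvertString(N'SQL Down Under')); -- reverse it again for a mirror image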

It’s based on my implementation of the upside-down text transformation that you can find described here:

https://en.wikipedia.org/wiki/Transformation_of_text#Upside-down_text

It supports the English alphabet (sorry other cultures) and a variety of punctuation characters. It can take varchar or nvarchar input but returns nvarchar output so that we have a suitable set of characters to work with.

You can find out more about our free SDU Tools here:

http://sqldownunder.com/sdu-tools

Enjoy !

SQL: Computed Columns: It’s a matter of persistence

Most SQL Server developers are aware that they can create computed columns. We do that by defining a column AS some expression like this:

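As a minimal sketch (the table and column names here are hypothetical):

CREATE TABLE dbo.OrderItems
(
    OrderItemID int IDENTITY(1,1) PRIMARY KEY,
    Quantity int NOT NULL,
    UnitPrice decimal(18,2) NOT NULL,
    ExtendedPrice AS (Quantity * UnitPrice)   -- recalculated every time it is queried
);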

Each time the value from that column is queried, the calculation is performed so the result can be returned. This makes sense when the value is changing regularly and the value is queried infrequently.

However, according to my completely subjective statistics, most computed columns are queried much more than they are ever changed. So why work the value out each and every time?

In some database products, the way to get around that is to define a regular column, then create a trigger on the table that sets the value whenever the row is inserted or updated.

In SQL Server, we don’t need to do that. Back in SQL Server 2005, we got the option to create PERSISTED computed columns. You change the definition of the column like this:

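Continuing the same hypothetical table, the only difference is the PERSISTED keyword on the computed column:

CREATE TABLE dbo.OrderItems
(
    OrderItemID int IDENTITY(1,1) PRIMARY KEY,
    Quantity int NOT NULL,
    UnitPrice decimal(18,2) NOT NULL,
    ExtendedPrice AS (Quantity * UnitPrice) PERSISTED   -- calculated and stored on INSERT and UPDATE
);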

When a computed column is defined as PERSISTED, SQL Server calculates and saves the value for you whenever the row is inserted or updated.

This means that you pay the calculation performance penalty only when the data changes, and your SELECT operations get to run at full speed, just like any other column.

Persistence is a good thing. It allows the value to be calculated now, not later when you need to query it.

Opinion: Don’t Play Hide and Seek with T-SQL

I spend most of my life at a variety of customer sites. Most of these nowadays tend to be large financial organizations. When I talk to developers in those organizations and ask them how they interact with their DBAs, I can’t tell you how often I hear that they try to avoid ever making schema changes, because the process takes too long and is too painful.

When I talk to the DBAs at the same organizations, and I ask them why they resist making schema changes, I hear this:

If they change the database, they believe someone will scream at them, but they don’t know who because:

  • They have no visibility of the code that touches the database, so
  • They do not know what they are about to break.

So where is this code that the DBAs can’t see or test?

  • Embedded in applications (or generated within the application dynamically) – often in widely-differing technology stacks (.NET, Java, etc.)
  • Embedded in Excel spreadsheets
  • Embedded in Reporting Services Reports
  • Embedded in Power BI Dashboards
  • Embedded in Analysis Services data sources
  • Embedded in SSIS Packages
  • Embedded in Access databases
  • And so on and so on.

What this means is that if the DBAs want to change the database, they have absolutely no way of knowing, at all, what they are about to break, unless they have tested every spreadsheet, have run every option in every report, have run every combination of SSIS packages, have reprocessed every analytic data model, have used every feature in every application, etc, etc.

That’s never going to happen, so what do they do? They push back on any changes.

If I talk to the developers and ask them why they built things this way, I’m invariably told that they had to be agile. Yet what they don’t understand is that what they have built is the complete opposite of agile. They have actually built a house of cards where no-one is game to touch anything. And we know where that all ends.

This is why I’m so much of a fan of having at least one layer of abstraction (yes, that’s procs and views) within the database. Then at least the DBAs have visibility of the code that touches the database.
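As a minimal sketch of what that abstraction layer can look like (the Sales.Orders table and the object names are hypothetical), the idea is that applications call objects like these instead of issuing their own SQL:

CREATE VIEW Sales.CustomerOrders
AS
SELECT o.OrderID, o.CustomerID, o.OrderDate, o.TotalDue
FROM Sales.Orders AS o;
GO

CREATE PROCEDURE Sales.GetCustomerOrders
    @CustomerID int
AS
BEGIN
    SET NOCOUNT ON;

    SELECT OrderID, OrderDate, TotalDue
    FROM Sales.CustomerOrders
    WHERE CustomerID = @CustomerID;
END;
GO

If Sales.Orders later needs to change shape, the view and the procedure can absorb that change while the callers keep working, and the DBAs can see exactly which code touches the table.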

In a slightly better world, the data people would also have unit tests wrapped around those procs and views. Then, when they need to make a change, they could just re-run the tests, and at least know that they haven’t broken every report, spreadsheet, etc. in the organization.
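As one way to do that (tSQLt is simply a common open-source option, not something prescribed here, and the objects are the hypothetical ones from the sketch above), a test might look like this:

EXEC tSQLt.NewTestClass 'OrderTests';
GO

CREATE PROCEDURE OrderTests.[test GetCustomerOrders returns only the requested customer]
AS
BEGIN
    -- Swap the real table for an empty fake so the test controls the data
    EXEC tSQLt.FakeTable 'Sales.Orders';

    INSERT Sales.Orders (OrderID, CustomerID, OrderDate, TotalDue)
    VALUES (1, 42, '20240101', 100.00),
           (2, 99, '20240102', 200.00);

    CREATE TABLE #Actual (OrderID int, OrderDate date, TotalDue decimal(18,2));
    INSERT #Actual EXEC Sales.GetCustomerOrders @CustomerID = 42;

    DECLARE @ActualCount int = (SELECT COUNT(*) FROM #Actual);
    EXEC tSQLt.AssertEquals @Expected = 1, @Actual = @ActualCount;
END;
GO

EXEC tSQLt.Run 'OrderTests';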

I know that many might disagree, but I believe that if you are putting anything that looks like T-SQL logic directly into an application, into a spreadsheet, Reporting Services report, Analysis Services data source, Access database, etc., you’re building a problem faster than you’re building your applications.

SQL: NEWSEQUENTIALID – Nice try but missed the target

GUIDs (uniqueidentifiers in SQL Server) are well understood to cause fragmentation issues when used as clustering keys for SQL Server tables. It’s easy to understand why, but is NEWSEQUENTIALID the answer? I don’t think so. Here’s why.

When a SQL Server table has a clustered index, it needs to maintain the logical sequence of rows in the order of the clustering key. There are a few important notes here:

  • I’ve mentioned key, not column. The key might involve more than one column.
  • I’m not talking about primary keys. While the default for SQL Server is to make primary keys clustered if there isn’t already another clustering key, primary keys are a logical construct unrelated to how the table data is stored.
  • I’m not talking about physical storage. I often hear people say that clustered indexes determine how data is physically stored. That was in old Microsoft courseware, etc. and just isn’t true. It’s about logical ordering. If it really were about physical ordering, there would be no fragmentation, and changing the data would take forever because rows would need to be physically shuffled around.

The problem with using GUIDs as clustering keys is that they are typically generated in a random order, using a function like NEWID() in T-SQL or System.Guid.NewGuid() in .NET (or similar functions in other languages).

Let’s take a look at this by creating a table and populating it:
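A minimal sketch (the table name and the second column are assumptions; the ImportantFactID column matches the discussion that follows):

CREATE TABLE dbo.ImportantFacts
(
    ImportantFactID uniqueidentifier NOT NULL
        CONSTRAINT PK_dbo_ImportantFacts PRIMARY KEY,   -- no other clustered index, so this is the clustering key
    ImportantFactName nvarchar(50) NOT NULL
);
GO

INSERT dbo.ImportantFacts (ImportantFactID, ImportantFactName)
VALUES (NEWID(), N'Fact 1'),
       (NEWID(), N'Fact 2'),
       (NEWID(), N'Fact 3'),
       (NEWID(), N'Fact 4'),
       (NEWID(), N'Fact 5');
GO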

I’ve used NEWID() to populate the ImportantFactID column, and because we’ve declared that column as a primary key, and no other index is clustered, it will also be the clustering key. So the table will be clustered on ImportantFactID.

Now let’s look at the values that have been inserted:
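Again as a sketch:

SELECT ImportantFactID, ImportantFactName
FROM dbo.ImportantFacts;   -- rows come back in clustering-key order, so the fact names appear scrambled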

Clearly the table data is being inserted in quite a random order. And this is what leads to the fragmentation.

So, it would seem sensible that we’d want to insert data in order. Mostly we do want that. There are some very high-end scenarios where the random inserts are better, but for most people, they are a problem.

Because this had become a common problem, SQL Server 2005 introduced NEWSEQUENTIALID(). It produces values that look like GUIDs but always increase in value.

Notice that you can directly create a value with NEWID():
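For example:

SELECT NEWID();   -- returns a new (random) uniqueidentifier value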

But you can’t do that with NEWSEQUENTIALID():
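Trying the same thing fails:

SELECT NEWSEQUENTIALID();
-- Fails: NEWSEQUENTIALID() can only be used in a DEFAULT expression for a
-- uniqueidentifier column in a CREATE TABLE or ALTER TABLE statement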

NEWSEQUENTIALID() can basically only be used as a column default. Fair enough, but what do the values look like?

Let’s recreate the table and find out:
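Continuing the same hypothetical table, the key is now supplied by a default rather than by the INSERT:

DROP TABLE dbo.ImportantFacts;   -- remove the earlier version of the table
GO

CREATE TABLE dbo.ImportantFacts
(
    ImportantFactID uniqueidentifier NOT NULL
        CONSTRAINT DF_dbo_ImportantFacts_ImportantFactID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_dbo_ImportantFacts PRIMARY KEY,
    ImportantFactName nvarchar(50) NOT NULL
);
GO

INSERT dbo.ImportantFacts (ImportantFactName)
VALUES (N'Fact 1'), (N'Fact 2'), (N'Fact 3'), (N'Fact 4'), (N'Fact 5');
GO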

And let’s look at the data that was inserted:
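Using the same sketch:

SELECT ImportantFactID, ImportantFactName
FROM dbo.ImportantFacts;   -- the GUID values now increase, and the fact names come back in insert order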

The first thing that should strike you is that the values are now nothing like random. They are increasing but note that they are also predictable. That mightn’t concern you but I work at many sites where some reliance is placed on people not being able to guess GUID values. Clearly this isn’t the case with NEWSEQUENTIALID().

If you also wonder if you are now more likely to have a collision on generated values, the answer is “yes”.

But is the concept of them being ever increasing the answer to our problem?

I’d argue that it isn’t. You need to consider why people use GUID values in the first place. 99% of the time, it’s so that any layer of code can generate an ID without having to refer to a central ID controller. I can write an application where I generate an order ID, and attach order details to that order ID, and then throw the lot into the database without concern about colliding with something else, and without needing to make a round-trip to the database to first get a new ID.

As soon as I’m using NEWSEQUENTIALID(), at least as soon as I’m using the T-SQL function, I need to go to the database to get an ID, as it’s only available as a column default. And if I’m going to the database to get an ID anyway, what advantage does a pseudo-GUID give me? You might as well just get a bigint.

For me, the T-SQL NEWSEQUENTIALID() function completely misses the point of why people use GUID values in the first place and it seems pointless to me.

Automatically increasing GUID values might have a place, but if they do, it’s only the ones that are generated in upper layers of code, not those generated in the database.

My verdict is that NEWSEQUENTIALID() is in the category of “nice try, seemed a good idea at the time, but completely missed the target”.

Blog Migrated – and major thanks to Adam Machanic

I recently posted that I was moving to this new blog and wanted to thank Adam Machanic and Peter DeBetta.

Well, I need to thank Adam Machanic even more. I was intending to just start the new blog from scratch and leave the old one in place, but Adam was working on code to help migrate from Community Server to WordPress-based engines, so he offered to let me alpha test it.

After a few adjustments along the way, it’s all basically done. I’ve now got all the old content, images, comments, etc. and replaced all the posts on the old site with links to the new one.

What can I say? Adam is my Super Hero this week.

Day 1 for the new blog but thanks to Adam and Peter !

For many years now, I’ve been hosting my blog at sqlblog.com. I can’t thank my old friends Adam Machanic and Peter DeBetta enough for starting SQL Blog and for providing many of us a home to post about all things SQL. But I know they’ve had challenges (and no doubt costs) from doing so, and the Community Server software is getting older and harder to look after now.

So, for me, it’s time for a new blog. I’m sure it will be quite a challenge to ever get back up to the number of views (and the consolidated views that everyone on SQL Blog benefited from), but it’s time to start.

Welcome to day one of the new Bit Bucket !