SDU Tools for March 2018 – going out tonight

I'm really happy with the additions we made to our free SDU Tools for developers and DBAs that are going out tonight:

DatesBetween – table-valued function to return dates between two dates
DateDimensionColumns – table-valued function to return typical date dimension columns for a date (useful for data warehouses)

JulianDayNumberToDate – converts a Julian day number to a date
DateToJulianDayNumber – converts a date to a Julian day number

SystemConfigurationOptionDefaults – view that shows system configuration options and their default values (and many other values)
NonDefaultSystemConfigurationOptions – view that shows system configuration options that are not at their default values

SQLServerVersionForCompatibilityLevel – function that converts a database (or server) compatibility level to a SQL Server version

SystemWords – view that shows reserved words, future reserved words, ODBC reserved words, and system data type names and their display color in SSMS
ReservedWords – view that shows SQL Server reserved words (and their display color in SSMS)
FutureReservedWords – view that shows SQL Server future reserved words (and their display color in SSMS)
ODBCReservedWords – view that shows ODBC reserved words (and their display color in SSMS)
SystemDataTypeNames – view that shows all system data type names (and their display color in SSMS)

CreateLinkedServerToAzureSQLDatabase – as the name says – makes it easy to create a linked server to an Azure SQL DB

ListPrimaryKeyColumns – comma-delimited list of primary key columns for all primary keys
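
To give a rough sense of how these are called, here's an illustrative sketch. It assumes the tools are installed in their usual SDU_Tools schema and that DatesBetween takes a start date and an end date; check the tool documentation for the exact signatures:

-- Illustrative only: assumes the SDU_Tools schema and a (start date, end date)
-- parameter pair for the DatesBetween table-valued function.
SELECT *
FROM SDU_Tools.DatesBetween('20180301', '20180307');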

You'll find more on joining SDU Insiders and getting our free resources here:

http://sdutools.sqldownunder.com

 

SDU Tools: Separate T-SQL Strings By Case

If you've ever used SQL Server Reporting Services, you'll notice that when you drag a database column into a table, it auto-magically converts the name of the column into a more English-readable name.

Image from MSDN

Notice how when the LineTotal column has been dragged into the table, the heading has been set to Line Total with a space. What it is doing is taking a Pascal-cased or camel-cased name and separating the words with spaces, based upon where the capital letters are.

In our free SDU Tools, we have a tool that does that as well.

In the example above, you can see both a Pascal-cased phrase and a camel-cased phrase that have been separated.
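
If you want a rough idea of how such a separation can be done in plain T-SQL, here is a minimal sketch. This is not the SDU Tools implementation, just an illustration of the idea: walk the string and insert a space before each capital letter that isn't the first character.

-- Minimal sketch only: separates a Pascal-cased or camel-cased value by case.
DECLARE @Input nvarchar(100) = N'LineTotal';
DECLARE @Output nvarchar(200) = N'';
DECLARE @Character nchar(1);
DECLARE @Position int = 1;

WHILE @Position <= LEN(@Input)
BEGIN
    SET @Character = SUBSTRING(@Input, @Position, 1);

    -- Insert a space before any capital letter that isn't the first character
    IF @Position > 1 AND UNICODE(@Character) BETWEEN UNICODE(N'A') AND UNICODE(N'Z')
        SET @Output += N' ';

    SET @Output += @Character;
    SET @Position += 1;
END;

SELECT @Output AS SeparatedValue; -- returns 'Line Total'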

You can see it in action here:

For more information on our SDU Tools, and to join our SDU Insiders to get them and our other free resources, please just look here:

http://sdutools.sqldownunder.com

 

Opinion: To find good staff, invest in communications, not buildings

Many of my customers are software houses (ISVs). In almost all of them, I hear people complaining that they can't find enough good staff. I think they are trying to tackle the wrong problem.

Most of the staff members they are trying to find are developers, and there are so many great developers out there, but you have to accept that they might not be in the location that you hope they're in.

I've seen companies struggling to hire the last remaining developers for various technologies from the pool available in their city. This often even leads to crazy situations, like hiring staff they had previously rejected. They shouldn't keep drilling further and further down their local talent pool.

Worse, the more that companies hire poorly qualified staff, the more their experienced staff will spend their time fixing issues caused by people who should never have been hired in the first place. This is a formula for losing your best staff.

Some temporary respite might come from convincing people to move to the same city, but only a very limited group of people can do that or will be willing to.

So many jobs can now be performed from distributed locations, and development jobs are prime examples.

Why hire poorly qualified locals at ever-increasing costs when there are many outstanding people in other locations?

Instead of trying to fight for the last remaining staff in a city, and paying ever-increasing salaries to ever less capable staff, companies should make a serious commitment to enabling remote work, and invest in really high-quality communications infrastructure and probably some software. It's important to get out of the mindset of needing to have all your staff in your building.

Image by Richard Jaimes

It's time to look outside. It's a big world out there, full of amazing people who could be doing great work for you.

SQL: Real triggers, SET NOCOUNT ON, and Counting Correctly

I've done a lot of Microsoft exams over the years, mostly SQL Server ones but plenty of others too. And one thing that I really don't like is when the questions are:

  • Purely academic (ie: would never happen)
  • Memory based (ie: who cares what the DTU limit is for a P3 today?)
  • Clearly not written by someone who actually uses the product

Today, I want to mention an item in the last category.

In many exams, I've seen T-SQL code for triggers, and one of the instant tell-tale signs of whether the question's author really uses triggers, or has just read the documentation and thinks they know about them, is whether or not they've used SET NOCOUNT ON.

If an application updates two rows in a table, SQL Server tells it that two rows were updated.

If you then add a trigger that, for example, updates a row in another table, SQL Server will now tell the application that three rows were updated.

This is enough to break many applications.

The application that sent the update won't want to know about other rows inserted, updated, deleted, etc. by your trigger. That will only confuse or break things.

While there are odd exceptions where you do want the default behavior, the answer to this is to use SET NOCOUNT ON. When we are teaching students to write triggers, we tell them that nearly every DML trigger should start with SET NOCOUNT ON. Check out the documentation here:

https://docs.microsoft.com/en-us/sql/t-sql/statements/set-nocount-transact-sql

SET NOCOUNT ON stops the count of affected rows being returned for an operation. The beauty of using this in a trigger is that it stops the count of rows affected by the trigger from affecting the count that the application sees.
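
As a sketch of the pattern being described (the schema, table, and trigger names here are purely illustrative):

-- Illustrative trigger pattern: SET NOCOUNT ON as the first statement
-- stops the trigger's own row counts from reaching the calling application.
CREATE TRIGGER TR_Sales_Orders_Audit
ON Sales.Orders
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT Audit.OrderChanges (OrderID, ChangedWhen)
    SELECT i.OrderID, SYSDATETIME()
    FROM inserted AS i;
END;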

So, when I see a whole lot of exam questions on triggers and none of them include this, that's really disappointing in so many ways. Learn to use this in your trigger code.

(Note that we try to avoid using triggers in the first place most of the time).

 

DevOps: Dealing with issues in migration-based deployment of databases

In an earlier post, I described the difference between state-based deployments and migration-based deployments. What I want to talk about today are two of the main issues that can arise in a migration-based deployment and what to do about one of them.

When you are using a migration-based deployment technique, you are storing all the scripts that take a database from its current state to the desired final state. While tools like ReadyRoll make this a relatively easy way to perform migrations, it suffers from a few key problems:

Performance

When you are applying a series of steps to make all the required changes to a database, it can take much longer than moving directly from the current state to the final state would. For example, you might add a new index, then later decide to add another column to it. Clearly it would be faster to just create the final index, rather than to create the index and then modify it.

I often see deployments where indexes are applied, removed, and reapplied, and where tables are added and then removed again. This isn't a big problem for small databases, but it can be a major issue for larger ones.
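
As a trivial illustration (the index and table names are hypothetical), the accumulated migration steps might look like this, compared with the single statement a direct move would have used:

-- Hypothetical migration steps as they accumulate over time:
-- Step 17: add an index on OrderDate
CREATE INDEX IX_Orders_OrderDate ON Sales.Orders (OrderDate);
GO
-- Step 23 (weeks later): the index also needs CustomerID, so it gets rebuilt
CREATE INDEX IX_Orders_OrderDate ON Sales.Orders (OrderDate, CustomerID)
    WITH (DROP_EXISTING = ON);
GO

-- A direct (state-based) move would have built the final index just once:
CREATE INDEX IX_Orders_OrderDate ON Sales.Orders (OrderDate, CustomerID);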

While this is a real issue, it's one that doesn't have a simple answer when you are using migration-based deployments. It's not one that I'm going to discuss further today. Let's talk about another issue that you can do something about.

Drift

If you have multiple production servers, plus servers in other environments (like UAT, Development, Test, etc.), then in a perfect world the schemas of the databases in each of these environments would be identical. Often though, that's not what happens.

If an urgent production fix is required, someone might make a change directly on a production server. Worse, they might apply the same change differently on each of the production servers. Then they might forget to apply the same change (or apply it differently) in other environments. No matter what the cause, the schemas have drifted.

Now if you create migration scripts in the development environment, and even test them in a test environment, those same scripts might break when applied to the production servers because the scripts are dependent on the schemas being the same.

To work around this issue, your deployment process needs to start with schema comparisons, using either Visual Studio's schema compare or a tool like Redgate's SQL Compare.

You might not be able to get tools like this into the production environment, but generally, you should be able to script the database(s) in the production environment (and other environments), load the scripts in a test environment, and run the comparisons there.
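
If a full comparison tool isn't available, even a rough metadata comparison catches a lot of drift. Here's a minimal sketch, assuming the scripted schemas have been loaded into two databases called ProdCopy and TestCopy (these names are just for the example, and a real comparison tool checks far more than column metadata):

-- Rough drift check: column metadata present in ProdCopy but not in TestCopy.
-- Run it again with the databases swapped to see differences in the other direction.
SELECT s.name AS SchemaName, t.name AS TableName, c.name AS ColumnName,
       ty.name AS DataTypeName, c.max_length, c.precision, c.scale, c.is_nullable
FROM ProdCopy.sys.columns AS c
INNER JOIN ProdCopy.sys.tables AS t ON t.object_id = c.object_id
INNER JOIN ProdCopy.sys.schemas AS s ON s.schema_id = t.schema_id
INNER JOIN ProdCopy.sys.types AS ty ON ty.user_type_id = c.user_type_id

EXCEPT

SELECT s.name, t.name, c.name,
       ty.name, c.max_length, c.precision, c.scale, c.is_nullable
FROM TestCopy.sys.columns AS c
INNER JOIN TestCopy.sys.tables AS t ON t.object_id = c.object_id
INNER JOIN TestCopy.sys.schemas AS s ON s.schema_id = t.schema_id
INNER JOIN TestCopy.sys.types AS ty ON ty.user_type_id = c.user_type_id;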

If they aren't in sync, you need to fix that first.

Once you know that your schemas are in sync, you can proceed with a migration-based deployment with more confidence.

 

Shortcut: Change the Status Bar in SQL Server Management Studio (SSMS)

The status bar at the bottom of a query window in SQL Server Management Studio contains a wealth of information in its default configuration.

The bottom left shows the connection state:

I'll talk about connection management more in a later post. The bottom right also shows quite a bit:

In this case, it's showing me that I'm connected to a server called SDUPROD; it's running v14 of SQL Server (ie: SQL Server 2017); I'm logged on as GREGP50\Greg; my spid (or session ID) is 68; and I'm connected to the master database.

If I had a query running, the time would be counting up, and if a query had completed, it would show me the number of rows.

While this is useful, it's also configurable. The options are in Tools > Options > Text Editor > Editor Tab and Status Bar:

You can choose whether the displayed time is the elapsed time or the end time. In an earlier post, I mentioned that you can also see the end time in the Properties pane.

You can choose to remove the display of the database, login, row count, and/or server name. You can set the color used for group and/or single server connections and choose where the status bar is located. The only extra option at present is that you can move it to the top.

Some of these become more important when using multi-server queries. I'll talk about them soon.

SDU Tools: Quote a T-SQL String

I'm not a great fan of dynamic SQL (where you create a T-SQL command in code, then execute the command) but there are times when you just need to do it.

When creating dynamic SQL, it can get very painful trying to work out which strings are quoted and how many quotes you need.

To make this a bit easier, in our free SDU Tools, we added a QuoteString function.

You can see what it does in the screenshot above.

It puts an opening and closing quote around a string, and also doubles up any quotes contained within the string.
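
In plain T-SQL, you get the same effect by doubling any embedded quotes and wrapping the result in quotes (QUOTENAME can also do this, but only for input up to 128 characters):

-- Doubling embedded quotes, then wrapping the result in quotes:
DECLARE @Value nvarchar(50) = N'O''Brien';

SELECT N'''' + REPLACE(@Value, N'''', N'''''') + N'''' AS QuotedValue;
-- returns 'O''Brien'

-- QUOTENAME does the same for short strings (input limited to 128 characters):
SELECT QUOTENAME(@Value, N'''') AS QuotedValue;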

You can see it in action here:

For more information on our free SDU Tools and to get a copy of these and our other free resources by becoming an SDU Insider, follow this link:

http://sdutools.sqldownunder.com

 

Opinion: Constant churn breaks community learning for software applications

A current trend that I can't say I love is constant churn within software applications. I have no interest in going back to the days when we only got a new version of SQL Server or Power BI every few years.

It's also not a case of "who moved my cheese?"

In fact, I thrive on change. However, I've now become really concerned about how anyone:

  • Learns to use a complex application
  • Remembers how to use a complex application when they don't use it daily

I first really struck this issue with Azure. If I was teaching a class that used Azure, I could check every single lab on Sunday night, then Monday morning, the students would find it had all changed. That's OK for an experienced person, but not OK for a learner.

I love the rate of change for Power BI. We're endlessly getting wonderful new things. But I have to say that every class that I teach on this is like a new experience. I've got another one this coming Tuesday. I used to look forward to them but now I have a major hesitation every time, as I wonder what parts of the labs will have broken.

This is now an ongoing challenge for all software of this type though. I helped create some labs for VSTS very recently, and when I look at the product now, it barely resembles the one that I built the labs on.

Is it better? Probably yes.

But even though it might have been a few months ago, it feels like just the other week. And yet, not only has the UI changed, entire concepts have been added or removed, and the order in which things are done has changed substantially.

I don't know the answer to this but the current rate of churn is a substantial issue.

I gather the plan with the DevOps guys is to put a set of labs on GitHub, and let people who are doing the labs point out the issues day by day as they strike them. Again, for experienced users that might work. But for newcomers, I really wonder how that will come across.

Will they realize the app must have changed and that it's all different, or will they just think the product is too hard to use? Either way, they'll be very frustrated.

Image by JeShoots

And while initially learning the product is one thing, I'm worried about it longer-term. A product like VSTS lets you set up automation, and you hope you won't need to change it constantly. But if every time you go to make a change, you're struggling to use it like you're a newbie again, that's a problem.

Finally, I'm really concerned about ongoing support.

The vast majority of support of software applications today happens from community resources like blogs, webcasts, etc.

Will they continue to be created at the same pace if the authors know they'll be irrelevant or wrong within a very short time? How will end-users learn to do things when none of the online examples they find still work?

I wish I knew the answer to this.

SQL: Design – Entity Attribute Value Tables (Part 2) – Pros and Cons

In an earlier post, I discussed the design of EAV (Entity Attribute Value) tables, and looked at why they get used. I'd like to spend a few moments now looking at the pros and cons of these designs.

Let's use the same table as last time as an example:

Pros

The main positive that's typically described is that the schema is "flexible". By this, the developers usually mean "I don't have to change the database schema (or worse, have someone else change it) when my needs change".

I don't buy this argument. Databases do a great job of managing metadata. Why would your app do it any better? I'll bet there's another reason (perhaps political) why this argument is being put forward in most cases.

Another argument is that this type of design handles sparse data gracefully. When we designed the WideWorldImporters database for Microsoft, we had a few edible products. The vast majority of the products were not edible. So the question is about where a use-by-date for an edible item should be stored. Should every product have that column?

Sparse columns in SQL Server deal with that quite well, but in SQL Server 2016, we also wanted to showcase some of the new JSON-based functionality, so we ended up putting this info into a JSON-based column. I would have been just as happy with a sparse column for this though. At least that would have allowed us good control over the data type.
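
For comparison, a sparse column version is straightforward. This is a simplified sketch, not the actual WideWorldImporters table definition:

-- Simplified sketch: a sparse column only costs storage for the few rows
-- that actually have a value (the edible products, in this case).
CREATE TABLE Warehouse.StockItems
(
    StockItemID int IDENTITY(1,1) CONSTRAINT PK_Warehouse_StockItems PRIMARY KEY,
    StockItemName nvarchar(100) NOT NULL,
    IsEdible bit NOT NULL CONSTRAINT DF_Warehouse_StockItems_IsEdible DEFAULT (0),
    UseByDate date SPARSE NULL  -- strongly typed, unlike a string in an EAV Value column
);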

Cons

I see many problems with these tables.

The most obvious problem is that a large number of joins is required to get all the related data for an entity, if you ever need to process it within the database. Many of these applications, though, just read and write the entire entity each time and aren't concerned about this. Those applications tend to be at the small end of the market, where performance isn't a major concern anyway.

A second key issue is data typing. In the example above, what data type is the Value column? It's most likely some sort of string. If you're one of the cool kids, it might be a sql_variant instead. In this case, let's assume it's a string of some type. How can we then provide any sort of guarantees on what's stored in the column? The LoyaltyGrade is clearly an integer in this example, but there's nothing to stop us putting Gold in that field instead.
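
A sketch of the shape being discussed (the names are illustrative) makes the problem clear:

-- Illustrative EAV shape: everything ends up as a string.
CREATE TABLE dbo.CustomerAttributes
(
    CustomerID int NOT NULL,
    AttributeName nvarchar(100) NOT NULL,
    AttributeValue nvarchar(max) NULL,
    CONSTRAINT PK_dbo_CustomerAttributes PRIMARY KEY (CustomerID, AttributeName)
);

-- Nothing stops the "integer" attribute from holding text:
INSERT dbo.CustomerAttributes (CustomerID, AttributeName, AttributeValue)
VALUES (12, N'LoyaltyGrade', N'3'),
       (13, N'LoyaltyGrade', N'Gold');  -- happily accepted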

Ah, but "the app does that" I'm told. That might be true but apps have bugs, as do ETL processes that put data in those fields. That's also very small-system thinking as most organizations will have a large number of applications that want to access this same data, and they won't even be built on the same technology stack, so lots of luck forcing all the access through your app or services layer. Ever tried to connect something like Excel to a service?

What I often then eventually see is typed columns within the table:

  • StringValue
  • IntValue
  • DateValue
  • DecimalValue

and so on. Then we have to consider what it means if a value happens to be in more than one of these. (Ah yes, the app won't allow that).

Once again though, what you're doing is building a database within the database, just to avoid adding or dropping columns.

Yet another problem is the complete lack of referential integrity. You have no way (or at least no way that isn't ugly) to ensure that values lie in a particular domain. Is it OK for Brunette to be entered rather than Brown for the hair color? If not, how do you enforce that? (I know, I know, the app does that.)
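
For contrast, with a real column the database enforces the domain declaratively, with no reliance on the app (again, the names are just for illustration):

-- Conventional design: the database enforces the domain.
CREATE TABLE dbo.HairColors
(
    HairColor nvarchar(20) NOT NULL CONSTRAINT PK_dbo_HairColors PRIMARY KEY
);

INSERT dbo.HairColors (HairColor)
VALUES (N'Black'), (N'Blonde'), (N'Brown'), (N'Grey'), (N'Red');

CREATE TABLE dbo.Customers
(
    CustomerID int IDENTITY(1,1) CONSTRAINT PK_dbo_Customers PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    HairColor nvarchar(20) NULL
        CONSTRAINT FK_dbo_Customers_HairColor
        FOREIGN KEY REFERENCES dbo.HairColors (HairColor)
);

-- This now fails, rather than silently storing an out-of-domain value:
-- INSERT dbo.Customers (CustomerName, HairColor) VALUES (N'Terry', N'Brunette');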

These are just a few of the issues, and there are many more.

Before I finish for today, I do want to mention yet another design that I saw recently. Instead of one table to rule them all with an EAV-based non-design, this database contained a table for each data type. Yes, there was a string table, an int table, a date table, and so on.

It should be obvious that such a design is even nastier.

 

DevOps: Declarative is where we want to be

If you have ever tried to write scripts to configure operating systems, you'll know just how hard that is to get correct.

For example, if you need to ensure that your virtual machine has two network adapters, and they need to have a specific set of IP addresses, how do you do that?

Image by Markus Spiske

The traditional approach for PowerShell (and similar tools) was to try to write a step-by-step script to configure the network adapters the way you want. But where do you start? Do you write a script to check for any existing adapters and loop through them to try to remove them? Do you try to add the ones you want, and then remove the others?

You'll quickly realize that you get into very, very complex coding because you might not be able to be sure what your starting point is. Everything has to stay on the path that you prescribed.

And worse, what happens if you run this script more than once?

That's the problem with imperative code. You are telling the operating system the steps required for configuration.

We don't want to be doing this.

With a declarative approach, we tell the system how we want it to end up (ie: the desired state), and let it worry about how to get configured that way. This is what we want to be doing instead of writing imperative code.

PowerShell offers DSC (Desired State Configuration), where you describe, in a configuration document, the way you want the system to end up. A bonus of this approach is that it's idempotent ie: no matter how many times you run it, you end up with the same outcome.

It's important wherever possible to be doing declarative configuration not imperative configuration. In later posts, I'll talk more about how.