SQL: Does an indexed view always get updated for base table updates?

I had a forum question recently that asked whether an indexed view gets updated when the base table is updated, but the columns that are part of the view are unchanged.

The easiest way to check this is to try it. Let's start with this code to create a table and put some data in it:
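
Something along these lines will do the job (the table, columns, and sample rows here are just placeholders of mine):

CREATE TABLE dbo.Sales
(
    SalesID int IDENTITY(1,1) PRIMARY KEY,
    CustomerName varchar(50) NOT NULL,
    SalesAmount decimal(18,2) NOT NULL,
    Comments varchar(200) NULL
);
GO

INSERT dbo.Sales (CustomerName, SalesAmount, Comments)
VALUES ('Ann', 12.50, 'First sale'),
       ('Bob', 14.00, 'Second sale'),
       ('Cath', 9.75, 'Third sale');
GO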

Then we'll create a schema bound view on the table, and a clustered index on the view:
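
For example (again with placeholder names; note that an indexed view with a GROUP BY needs a COUNT_BIG(*) column, and the clustered index on the view must be unique):

CREATE VIEW dbo.CustomerSales
WITH SCHEMABINDING
AS
SELECT CustomerName,
       SUM(SalesAmount) AS TotalSalesAmount,
       COUNT_BIG(*) AS SalesCount
FROM dbo.Sales
GROUP BY CustomerName;
GO

CREATE UNIQUE CLUSTERED INDEX IX_CustomerSales
ON dbo.CustomerSales (CustomerName);
GO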

Finally, we'll try both types of updates:
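
For example:

UPDATE dbo.Sales
SET Comments = 'Call back next week'   -- column that is not part of the indexed view
WHERE SalesID = 1;

UPDATE dbo.Sales
SET SalesAmount = 13.00                -- column that is part of the indexed view
WHERE SalesID = 1;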

The first updates only a column that's not part of the indexed view. The second updates a column that is part of it.

And the SQL Server team was mighty clever. It works exactly as you'd expect:

The first update only touches the base table. The second touches both.


SDU Tools: Script User Defined Server Roles in SQL Server

As part of our free SDU Tools for developers and DBAs, we have many scripting functions. SQL Server allows you to create your own server roles, and generally, you should do this instead of using the fixed server roles, as it lets you allocate just the required permissions. To allow scripting these out, we've added a tool called ScriptUserDefinedServerRoles.

It's a function and doesn't require any parameters.
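
Assuming the tools are installed in the current database with their usual SDU_Tools schema, and that it's a scalar function like the other scripting tools, calling it would be as simple as:

SELECT SDU_Tools.ScriptUserDefinedServerRoles();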

You can use our tools as a set or as a great example of how to write functions like these.

Find out more

You can see it in action in the main image above, and in the video here:

Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:

http://sdutools.sqldownunder.com


Opinion: When did "sqls" become a thing?

Given that SQL is an acronym for Structured Query Language, a reference to "a SQL" would then be a reference to a language; the term SQL on its own just refers to the language.

So something that I find really odd now is the number of people using the word SQL as a synonym for a SQL statement. I keep hearing references like this:

We need to execute several SQLs against the server.

I need to run a few SQLs.

and so on.

I live in Australia where we're well known for mangling parts of the English language and, importantly, abbreviating almost everything. "McDonald's" becomes "Macca's", etc.

We don't do that with other languages. We don't say "I wrote four Spanishs"; we say "I wrote four Spanish sentences".

But I can't be the only one who finds references to "SQLs" pretty jarring.

Would love to hear your thoughts.

T-SQL 101: #47 Determining the length of strings in SQL Server

You might need to work out how long a string is. There are two ways you could measure the length of a string. The first is the number of characters; the second is the number of bytes (i.e. the amount of data).

You can see both here:
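
For example (the variable and column aliases are mine):

DECLARE @Greeting nvarchar(10) = N'你好';

SELECT LEN(@Greeting) AS CharacterCount,        -- returns 2
       DATALENGTH(@Greeting) AS ByteCount;      -- returns 4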

Even though I've provided a Chinese string, LEN still returns 2 as the number of characters in the string.

But because they were each two-byte characters, the function DATALENGTH returns 4 as the total number of bytes.

One of the challenges with the LEN function is that it ignores trailing spaces. So the string 'hello' and the string 'hello    ' are both length 5 according to the LEN function.
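
For example:

SELECT LEN('hello') AS WithoutSpaces,             -- returns 5
       LEN('hello    ') AS WithTrailingSpaces;    -- also returns 5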

If you would prefer a function that returns the actual number of characters, including trailing spaces, I'd suggest you look at the StringLength function in our free SDU Tools for developers and DBAs.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

Snowflake for SQL Server Users – Part 18 – Time travel

One really important concept in data warehouses is the idea of versioning. Imagine that I have customers broken into a set of business categories. And then I change which customers are in which categories. For so many transactional systems, if I run a sales report for last year, the customers would now appear in the current business categories, rather than the ones they were part of last year.

In Kimball-based designs, that's part of the rationale for what are called Slowly-Changing Dimensions (SCDs).

SQL Server Temporal Tables

SQL Server 2016 introduced temporal tables. Each temporal table has two components: one table that holds the current data, and another table that holds the history for the same data i.e. the previous versions of the rows. While the SQL Server implementation is interesting, in practice, I've found it lacking in many ways.

The biggest challenge is the lack of temporal joins. Even though I can view the contents of a table at a particular point in time, and I can view all the versions of rows from the table over a period of time, there are real challenges when you need to work across joins. There's no way to get rows from one table based upon the timeframe of the rows from another table. In the WideWorldImporters sample databases that I created for SQL Server 2016, you'll find a whole bunch of cursor-based code in the procedures that extract data changes. I really didn't want to do that, but couldn't see any other way to achieve it.
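
Single-table queries are straightforward enough; it's the cross-table scenarios that get ugly. A sketch of the single-table case (the table name and dates are just examples) looks like this:

-- Rows as they were at a particular point in time
SELECT *
FROM Sales.Customers FOR SYSTEM_TIME AS OF '2019-01-01'
WHERE CustomerID = 12;

-- All versions of the rows across a period
SELECT *
FROM Sales.Customers FOR SYSTEM_TIME BETWEEN '2019-01-01' AND '2019-06-30'
WHERE CustomerID = 12;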

Snowflake and Time Travel

The standard mechanism for accessing historical data in Snowflake is to use what's called Time Travel. The core things that Time Travel lets you do are:

  • Run queries to see the previous version of data at a given time
  • Create a clone of a table based upon a previous version of its data (you can also do this for schemas and databases)
  • Recover previous versions of objects that have been dropped

Because this historical data is available, there is even an UNDROP command that works for tables, schemas, and databases.
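
For example (the object names are placeholders):

UNDROP TABLE Sales.Orders;
UNDROP SCHEMA Sales;
UNDROP DATABASE SalesDB;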

When you want to use historical versions of the data, in a SELECT statement (or in a CREATE CLONE statement), you can specify AT for a specific time, or BEFORE to exclude the given time. Then when you specify the time, you have three options:

  • Specify a specific time via the TIMESTAMP option
  • Specify a relative time from the current time by using the OFFSET option with a number of seconds
  • Specify a particular statement via the STATEMENT option with a Query ID

The queries look like this:
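
They're along these lines (the table name, timestamp, offset, and query ID below are placeholders):

-- At a specific point in time
SELECT *
FROM Sales.Orders AT(TIMESTAMP => '2019-10-01 09:00:00'::timestamp_tz);

-- At a relative offset from the current time (here, 30 minutes ago)
SELECT *
FROM Sales.Orders AT(OFFSET => -60 * 30);

-- Before a particular statement was executed
SELECT *
FROM Sales.Orders BEFORE(STATEMENT => '<query id>');

-- Cloning a table based upon a previous version of its data
CREATE TABLE Sales.Orders_Restored CLONE Sales.Orders
    AT(OFFSET => -60 * 30);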

Query ID

This last option is a really curious one for SQL Server people. Snowflake keeps track of previous queries that it has executed. Each one is identified by a Query ID.

There are many things you can access via this Query ID, such as the query plan that was involved.

Retention of History

The retention period for Time Travel is based upon the edition of Snowflake that you are using.

  • All editions are able to use a retention period of one day
  • Enterprise Edition allows up to 90 days for permanent objects, but only one day for transient and temporary objects.

Also in Enterprise Edition, you can specify a retention period for individual objects by using the DATA_RETENTION_TIME_IN_DAYS parameter when you create them.

If you really don't want to use Time Travel (and the space it will use), you can set DATA_RETENTION_TIME_IN_DAYS to 0 at the account level.
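
As a sketch (placeholder names again):

-- Per-object retention period, set when creating the object (Enterprise Edition)
CREATE TABLE Sales.Orders
(
    OrderID integer,
    OrderDate date
)
DATA_RETENTION_TIME_IN_DAYS = 30;

-- Effectively turning Time Travel off for the whole account
ALTER ACCOUNT SET DATA_RETENTION_TIME_IN_DAYS = 0;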

For an index to all posts in this series, see the first post here.

SDU Tools: Checking the version of SQL Server (by name) in T-SQL

We're pleased to see that there are now a large number of people using our free SDU Tools for developers and DBAs, and that number grows every day. One of the surprising requests that we've constantly received is for something that simply shows the name of the SQL Server version.

We've seen people doing quite messy things to derive the name, like parsing the return value from @@VERSION and so on.

So we decided to make that easy.

We added a function SQLServerVersion.

It takes no parameters and returns the name as a string.
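
Assuming the tools are installed in the current database with the SDU_Tools schema, it's as simple as this (the output shown in the comment is just an illustration):

SELECT SDU_Tools.SQLServerVersion();   -- e.g. 'SQL Server 2017'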

You can use our tools as a set or as a great example of how to write functions like these.

Find out more

You can see it in action in the main image above, and in the video here:

Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:

http://sdutools.sqldownunder.com


SQL: Finding the current default schema in a SQL Server session

Each SQL Server user has a default schema. This is used in several ways. If I execute code like this:
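
For example, a CREATE TABLE statement that uses a one-part name (the columns are just placeholders):

CREATE TABLE Blah
(
    BlahID int IDENTITY(1,1) PRIMARY KEY,
    BlahName varchar(50)
);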

I haven't specified where that table will be created. People seem to assume it will end up in the dbo schema i.e. the table will be dbo.Blah but that's not the case. It depends upon the user's default schema. If that's Sales, then the table Sales.Blah would be created instead.

Similarly, if I execute code like this:
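
For example, a query that also uses a one-part name:

SELECT * FROM Blah;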

Again, I haven't said which schema the table Blah is in. So, SQL Server will check my default schema first; if that's Sales, it will look for a table called Sales.Blah.

The recommendation is to always use two-part names for all objects that live in a schema.

Default Schema

One of my MVP friends recently asked how you can find the name of the current default schema.

Now you can find a user's default schema by querying the sys.database_principals system view. It has a column default_schema_name that'll work.

However, what if the user connects via group membership?

How can you tell the name of the current default schema?

The answer is to do this:
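
That is, simply:

SELECT SCHEMA_NAME();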

The SCHEMA_NAME() function will return the name of a schema if you pass it the schema_id but if you don't pass anything, it will return the name of the current default schema.


T-SQL 101: #46 Using LEFT, RIGHT, and SUBSTRING to extract substrings in SQL Server

You might need to extract part of a string from a given string. T-SQL has a number of functions to help you do that. Take a look at these examples:
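
They'd have been along these lines (the sample string is mine, chosen to match the results described below):

SELECT LEFT('ABCDEFGHIJKL', 3);           -- returns ABC
SELECT RIGHT('ABCDEFGHIJKL', 2);          -- returns KL
SELECT SUBSTRING('ABCDEFGHIJKL', 5, 4);   -- returns EFGH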

LEFT, RIGHT, and SUBSTRING are all functions that return parts of strings.

In the first example, we asked for 3 characters from the left of the string so SQL Server returned ABC.

The second example shows returning 2 characters from the right of the string, so we get KL.

The third example asks SUBSTRING to return 4 characters starting at position 5, so we get EFGH.

It's worth noting that unlike languages such as C#, SQL Server doesn't complain if you ask for more characters than are present in the string. It just returns as many as are available.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

Learning Mandarin: Not only but also

An interesting pattern in Mandarin relates to how we translate the English "not only A but also B".

This is a good example of the fact that you can't just try to translate words directly between languages.

Here's a Mandarin example:

他不但很帅,而且很聪明 。 (Tā bùdàn hěn shuài, érqiě hěn cōngmíng.)

It's "He is not only handsome but also very smart".

不但 (Bùdàn) would literally translate to "not but", and 而且 (Érqiě) is pretty much like anything from "and" to "moreover" or "in addition".

Instead of 而且 (Érqiě), it's also common to use 也 (Yě) to connect the two parts of the sentence. It normally means "also". Here's an example:

这道菜不但好吃,也好看。(Zhè dào cài bùdàn hǎochī, yě hǎokàn.)

That's "This dish not only tastes good, but also looks good".

The word 还 (Hái) normally means "still" but can also mean "also" and can be used in place of 也 (Yě).

Learning Mandarin

I'll write more soon on the best methods for learning Mandarin. If you want to get a taste for it in the meantime though, my current favorite site is iTalki, and my favorite teacher by far is Amy He. If you decide to try it, click here and it's cheaper for both you and me.


Snowflake for SQL Server Users – Part 17 – Data clustering

In SQL Server, most tables benefit from having a clustering key, i.e. the column or columns that the table is logically sorted by. (Note: much older training material and documentation said "physical order", but that's never been true). To do that, SQL Server stores the data within the pages in a sorted way, and maintains a doubly-linked logical chain of pages. In addition, it maintains pointers to the data by creating an index on it.

By comparison, Snowflake's design tends to encourage you to avoid creating clustering keys on tables. For most tables, the data will automatically be well-distributed across the micropartitions.

You can assess how well that's working by seeing whether there are multiple micro-partitions with overlapping values. You can see an image of that in the documentation here:

https://docs.snowflake.net/manuals/user-guide/tables-clustering-micropartitions.html#clustering-depth-illustrated

The more overlap there is, the greater the clustering depth. You can see that by querying the SYSTEM$CLUSTERING_DEPTH system function. It's also possible to see all the most important clustering details (including the depth) by querying the SYSTEM$CLUSTERING_INFORMATION system function.
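
For example (the table and clustering column are placeholders):

-- Average clustering depth for the table, for the given column(s)
SELECT SYSTEM$CLUSTERING_DEPTH('Sales.Orders', '(OrderDate)');

-- Fuller clustering details (including the depth), returned as JSON
SELECT SYSTEM$CLUSTERING_INFORMATION('Sales.Orders', '(OrderDate)');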

Sometimes you need to cluster

This is all good, but sometimes you need to cluster a table, particularly if you've had a lot of INSERT, UPDATE, and DELETE style operations executed against it. (Note: this is only an issue if those operations have changed the columns that were involved in the auto-clustering and have introduced skew and additional clustering depth).

The aim of the clustering (which, as noted, isn't for all tables) is to co-locate related table data in the same micro-partitions, and to minimize clustering depth.

Generally, you'll only do this for very large tables. And sure signs that you need to do it are where your queries have slowed down markedly, and the clustering depth has increased.

Micropartition pruning is eliminating micropartitions that aren't needed for a particular query. It's somewhat like partition elimination and clustered columnstore segment elimination in SQL Server. And it's really important for query performance.

But there's a cost

While it might seem obvious to then keep all the tables clustered like we often do in SQL Server, the more frequently the data in the table changes, the more work is involved in maintaining the clustering.

However, if you have tables that don't change much and are queried all the time, clustering could produce a great outcome.

A few other notes

It's also worth noting that unlike applying a clustering key in SQL Server (via a primary key or a clustered index), the change isn't applied immediately. It's done in the background.

You also want to avoid having high cardinality columns as clustering keys. As an example, if you have TIMESTAMP columns (similar to SQL Server datetime or datetime2 columns), you'd be better off adding an expression (like a computed column) that truncated that to a date, and then clustering on that.
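
For example (placeholder names, and assuming an expression is used to reduce the cardinality):

-- Cluster on the date portion of the timestamp, rather than the raw timestamp
ALTER TABLE Sales.Orders CLUSTER BY (TO_DATE(OrderTimestamp));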

Reclustering a table in Snowflake is automatic and is performed when it becomes quite skewed.


For an index to all posts in this series, see the first post here.