SDU Tools: Single space words in SQL Server T-SQL

We've got a lot of very useful string-related functions in SDU Tools. This is another one. At times, there's a need to remove excess spaces between words in strings. In our free SDU Tools for developers and DBAs, we added a function to do just that: SingleSpaceWords.

The function just takes one parameter: the string to be processed.

Find out more

You can see it in action in the main image above, and in the video here:

Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:

http://sdutools.sqldownunder.com

 

T-SQL 101: #43 Concatenating strings in SQL Server

Joining together strings is called concatenating strings. In the first example here you can see SELECT hello, plus NULL, plus there. Now the trick with this, though, is that NULL is a bit of a challenge.

Note that the outcome of that first select statement is NULL is just NULL. As soon as you concatenate something with NULL, you would end up with NULL as the overall answer.

So, in SQL Server 2012 they gave us a CONCAT function. And that allows you to put a list of values (it doesn't matter how many) and concatenate them all. When I first saw this function, I wondered why on earth we needed it, as we could already concatenate strings. However, it does two special things.

The first is that it ignores any NULL values. So you can see in the second SELECT statement above, that it ignore the NULL and just concatenates the other two values.

The second excellent feature of the CONCAT function is that it doesn't care what the data type of your input values is, as it will implicitly convert all the values to strings before concatenating them.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

Snowflake for SQL Server Users – Part 14 – Internal storage in Micropartitions

If you want to learn a lot about how Snowflake works internally, it's worth spending some time reading the ACM whitepaper that described the service.

It describes the overall structure of the service, and then describes how data is stored, followed by a description of how the encryption of the data within the service is accomplished. (I'll talk more about the management of encryption keys in a later blog post).

Columnstores

Columnstores will be familiar to SQL Server users ever since 2012, and it's no surprise that the storage within Snowflake is essentially columnstore based.

Columnstores are great for adding data into, and are excellent for reading large amounts of data in bulk. What they aren't great for, is being updated. Clustered columnstore indexes in SQL Server are updatable but only because they have an associated rowstore that's used for the delta store.

Micropartitions

In Snowflake, the columnstores are formed by a series of what are called micropartitions. Each of these is a contiguous storage location that holds up to 16MB of compressed data (50 to 500 MB of uncompressed data), but importantly, the micropartitions are immutable i.e. once they are created, they are never modified.

The metadata for the micropartitions records the range of values for each of the columns that are stored in the micropartition, the number of distinct values, and a few other properties.

Tables get partitioned automatically during insert and load operations, based upon the order of the incoming data.

Interestingly, the file extension for these is FDN which is short for flocon de neige i.e. the French word for Snowflake.

Query performance against the columnstores is highly dependent upon being able to determine which micropartitions need to be read to satisfy a particular query. In Snowflake, the term used to describe this partition elimination is pruning.

 

SQL: Rolling up or archiving transactions past a cutoff date in T-SQL

Another question that I recently saw on Stack Overflow was asking how you SELECT a bunch of transactions back past a cutoff date, add them up, INSERT a roll-up row, and then delete the old ones. The concern was with how you make sure the ones you are deleting are the same rows that you are rolling up (or totalling). Someone else might UPDATE or INSERT a row at the same time.

I see exactly the same issue with people wanting to archive rows off into another table. They SELECT the rows into the other table, then DELETE them. But how do they know they deleted the same rows.

The answer is the OUTPUT clause for the DELETE statement.

Output Clause

When you execute  a DELETE statement (or INSERT, or UPDATE, or MERGE, etc.), you can add an OUTPUT clause that returns details of what was inserted or deleted. The virtual tables are just like you have in triggers.

Example

So let's start with an example transactions table. I've kept it as simple as possible:

Then let's populate it with some sample data and query the data:

That leaves us rows like this:

Roll-up Time

Now I'll calculate an archive cutoff date, as the first day of a month, 2 months back.

I've added a table variable to "catch" the OUTPUT of the DELETE statement. The poster asked for a table lock so I added that, although it's not really needed or a good idea. (Consider removing it)

Then, within the same transaction, I just INSERT the roll-up row back into the same table.

And Archiving?

Archiving data from one table into another table is even easier. You just do the DELETE in one table, and OUTPUT INTO the other table. Then you know that the rows you deleted are the same ones that you are inserting.

I hope that helps someone.

 

SDU Tools: SQL Server SDU Tools Version

More and more users of SDU Tools are adding the deployment of the tools into their standard deployment pipelines, and into their standard operating environments (SOEs).

One common but simple request that we've had is that they need to be able to check which version of SDU Tools is currently installed. So we added a function to do that.

It doesn't take any parameters but just returns the version number as a string.

Find out more

You can see it in action in the main image above, and in the video here:

 

Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:

http://sdutools.sqldownunder.com

 

Power BI: Creating a TopoJSON file of Australian postcodes for use with Shape Map

In a recent consulting project, I needed to plot analytic values over postcodes. I couldn't just do it with the built-in maps control, mostly because the system that we're developing on doesn't have Internet connectivity. I was keen to upload the mapping data directly into the report, and the Shape Map control (while still in preview) seemed to fit what I needed exactly.

However, when you load custom maps into the control, they need to be in TopoJSON format. Now TopoJSON is a specific extension over GeoJSON which has been pretty common for a while. I tried just using a GeoJSON file with the control, to no luck.

What was successful was to download the postcode shapes from here:

https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1270.0.55.003July%202016?OpenDocument

I downloaded the 2011 shape file for Australian postcode areas.

Next I used the  website at https://mapshaper.org/ to do the following:

  • Loaded the shape file
  • Simplified the shape file (I reduced it to about 2% of the original size, while opting to keep any small areas)
  • Export the data as a TopoJSON file
  • (Optional) I then edited the TopoJSON file to rename my postcode column from POA_CODE to Postcode and from POA_NAME to PostcodeName.

What an awesome site mapshaper.org is !

Notes on Shape Map

To use the shape map control, you need to do this:

  • Make sure you have the Shape Map control (it's currently a preview feature that needs to be enabled)
  • Drag the column containing the postcode to the Location field.
  • Once you've done that, on the format page, you can choose to add a custom map. I imported my TopoJSON postcodes file
  • Drag the analytic value to the Color saturation field.
  • Set the background color and border color.

When I first did this, nothing worked for me. Eventually I worked out that my source data had integer postcodes but the ones in the file were just text fields. I added a computed column to my data, made sure I had text postcodes available, and used that column, then all was good.

I hope this helps someone.

 

 

T-SQL 101: #42 Comparing strings by using collations in SQL Server

Another important concept when you're working with strings is the concept of Collations. If I write code like the first line here:

The question is how would I know if a capital A is treated the same as a lower case A. The answer to that is "it depends upon the collation". Collations are sets of rules that determine how strings are compared. SQL Server supports lots of collations. You can see them all by executing:

If you've never done that, it's worth taking a look at the available options.

For most people, the most important part of the collation names, apart from the language, is the CI and AS parts in the example above. CI says "Case insensitive". If it was CS, it would have been "Case sensitive". This determines how capitals and lower letters are dealt with.

The AS is "Accent sensitive" and you can similarly have AI for "Access insensitive". This relates to accents like accents, graves, umlauts, etc. commonly seen in European languages.

Some collation names start with a SQL prefix. These are old Sybase/SQL Server collations that are retained for backwards compatibility. You should avoid them and use the Windows collations i.e. the ones without the SQL prefix.

You can set default collations at the database and column levels. (Note you cannot set them at the table level).

If the two values you are comparing use different collations, or if you want to use a different collation to the database's default collation, you can specify which one you want by using the COLLATE clause. You can see it in the code image above. In that example, you might have a case-sensitive database but you want to that one comparison in a case-insensitive way.

Temporary Tables

One common problem happens when your database has a different collation to the tempdb database, and you are comparing values between your database and some temporary object. When that happens, you'll see an error that says SQL Server can't resolve the collation conflict. In this case, you definitely need to override by using a COLLATE clause.

You could then specify a particular collation, or if you just want to default to what the database uses, you can say COLLATE DATABASE_DEFAULT without needing to know which collation the database uses.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

Learning Mandarin: A little bit – yi dian vs you dian

The word (Diǎn) is particularly useful. It basically means "a dot" like made with a writing brush, and from that, it means "a little bit".

I previously discussed how northerners (and Beijing folk) put "r" sounds on the end of many words. This is another one. So they'd often use 点儿  (Diǎn er) which is pronounced a bit like "dee-arrr".

There are two basic ways that gets used though.

Yīdiǎn

One common use is  一点 (Yīdiǎn).  The first character basically means "one" and this pair of characters is commonly used for "a little".

Yǒudiǎn

Another common use is 有点 (Yǒudiǎn). The first character in this case means "have" and this pair of characters also tends to mean "a bit".

So then, which is used when?

Let's see some examples:

今天我有点忙。(Jīntiān wǒ yǒudiǎn máng.) This means "today I'm a little busy". When 有点 is put in front of adjectives like "busy", it implies more of a negative connotation. It's almost like "today I'm a little too busy".

一点 can't be put in front of adjectives but it can be put after them.

请开快一点。(Qǐng kāi kuài yīdiǎn.) is "please drive a bit faster".

When 一点 is put in front of a noun, it means "a little" as a quantity.

请喝一点水。(Qǐng hè yīdiǎn shuǐ.) is "please drink a little water".

Slightly Tricky Negative Rules

There are also some rules for negatives. The most common "not" words are (Bù) and (Méi). (We'll talk another day about how they differ). But if you are putting 点 in front of either, it should be 有点.

他有点不高兴。(Tā yǒudiǎn bù gāoxìng.) is "he is a little unhappy".

However, if the sentence has 都不 (Dōu bù) or 也不 (Yě bù), then 一点 should be used instead.

她一点都不喜欢这音乐。(Tā yīdiǎn dōu bù xǐhuān zhè yīnyuè.) is "she doesn't like this music at all".

Learning Mandarin

I'll write more soon on the best methods for learning Mandarin. If you want to get a taste for it in the meantime though, my current favorite site is iTalki, and my favorite teacher by far is Amy He. If you decide to try it, click here and it's cheaper for both you and me.

Snowflake for SQL Server Users – Part 13 – Programmable objects

Similar to SQL Server, Snowflake has ways of creating programmable objects. But the way they work, and the way they are created is quite different.

Functions

Functions are the most similar. You can create them in two ways:

  • Javascript
  • T-SQL

I like the idea that you can choose which language to write code in, and that they both end up extending the Snowflake SQL language pretty much the same. Some code is better written in T-SQL and other code is better written in a higher-level language.

Functions are called as expected, in place of an expression.

Both scalar functions and table-valued functions are supported. Scalar functions must return a value.

Stored Procedures

There is the ability to write stored procedures, but curiously, you can only do that in Javascript.

I have to say I've never worked with a SQL database engine before that supports stored procedures but won't let you write stored procedures in SQL. I think this is quite a shortcoming in the product.

Stored procedures are called using the CALL statement (not EXEC as in SQL Server). Another curious aspect is that even though the stored procedures support a return value, the syntax for calling stored procedures via CALL doesn't support retrieving a return value. I have to say, that's quite bizarre.

You can pass values back from stored procedures by using temporary tables. Or if the returned data is small enough, you might be able to stuff it into a variant data type object and return that.

Stored procedures can be nested.

Triggers

There is currently no concept of a trigger in Snowflake at this time. That means neither DML (INSERT/UPDATE/DELETE) triggers and DDL (CREATE/ALTER/DROP/LOGON) triggers.

As triggers are often a necessary evil in some applications, this again is a significant shortcoming of the product at this time.

 

 

SQL: Just like tables, IN clauses have no default order

In Stack Overflow, I saw a poster with a query like this:

And he was frustrated that he couldn't reliably get an ordered output like this:

The problem is that he was assuming that the IN clause had a specified order that was then obeyed by the SELECT statement.

It doesn't.

If you want a SELECT statement to return rows in a particular order, you need to use an ORDER BY clause, and, in this case, you need something to order the rows on in the first place.

I think the easiest way to do that is with a table variable that has an IDENTITY column like this:

I think that's about the simplest way to do it, particularly if there are a lot of values that could be in that IN clause.

Hope that helps someone.