Snowflake for SQL Server users – Part 2 – Cloud First Design

In recent years, I've done a lot of work in software houses (Microsoft calls them ISVs or Independent Software Vendors). Many of these software houses have worked out that they won't be able to just keep selling their on-premises applications because their customers are asking for cloud-based solutions.

And more importantly, the customers want the software houses to manage the applications rather than managing them themselves. So, many of these software houses have started trying to turn their on-premises applications into Software as a Service (SaaS) applications.

And the outcome? It's almost always really, really poor.

Cloud First

The real problem is that those applications haven't been written with a cloud-first mentality. They really are often the "square peg in a round hole". What often happens is that the cloud-based versions of the applications have a long list of limitations compared to the on-premises versions, because recreating features that were never designed for the cloud is often nearly impossible.

I'm a big fan of Azure SQL Database, and they've done a great job on it (way better than almost any other application), but it's still quite a distance short of where we already were with SQL Server on-premises. I wish the marketing for the product would focus on what is there (i.e. a truly amazing product), but the discussion always seems to be around what's missing, and how it compares to the on-premises product. In fact, I'm sure the entire reason that the Managed Instance versions of Azure SQL Database appeared was to address some of those shortcomings.

If Azure SQL Database was the first version of SQL Server you'd ever seen, you'd come away saying how amazing it is. But if you've come from the on-premises product, chances are you're already using something that isn't there.

Nice to have a blank slate

Even if you were brand new to Azure SQL Database though, you'd find aspects of the product's design that are based on thinking from decades ago, for the systems that were available back then. It's very hard to make major changes when you're bound by the need to keep backwards compatibility.

One key advantage that the team building Snowflake had was a clean slate: they could design a product that targets cloud-provider services under the covers, instead of on-premises physical devices and operating systems.


For an index to all posts in this series, see the first post here.

Snowflake for SQL Server users – Part 1 – Why Snowflake?

A few months back, I noticed that many of our clients had started to mention Snowflake.

In recent years, I've been in lots of planning and architectural meetings where there was already a presumption that AWS was being used rather than Azure. I put that down to a great selling job by the AWS people who got corporate IT folk locked into large enterprise agreements early. And so no matter what the technical question is, the answer will be something that runs on AWS.

I still think that Azure offers a far stronger cloud story than AWS when I look at the whole end-to-end cloud-based story, particularly with platform services. I just see a stronger, more secure, and better envisaged story in Azure.

Cloud Transformation

A big part of this issue is that many organizations I go into say they're making a "cloud transformation", yet what they're actually doing is just moving a whole lot of virtual machines into a cloud hosting center.

That's just re-hosting; it's not making a cloud transformation.

For cloud-hosting providers, it's also a no-win game. When all you're doing is re-hosting VMs, you're mostly competing on price, and that's a race to the bottom that no-one really wins.

Data Warehouses

One bright note on the horizon though is around data warehouses. Almost all these corporates seem to get the idea that they want a cloud data warehouse as a service, not just a bunch of re-hosted virtual machines.

For customers who've already made the decision to use AWS (before we even come into the meeting), Snowflake is often chosen quickly when they look around for cloud data warehouses.

A bonus for me is that it's now available on both AWS and Azure.

I've been spending a lot of time lately digging into Snowflake, and in this series of posts, I'll look at Snowflake from an existing SQL Server user's perspective.

Posts in the series

Part 1 – Why Snowflake? 
Part 2 – Cloud First Design
Part 3 – Core Architecture
Part 4 – T-Shirt Sizing
Part 5 – Editions and Security
Part 6 – Role Based Security
Part 7 – Authentication
Part 8 – Case-Sensitivity 
Part 9 – Stages
Part 10 – File formats
Part 11 – GET, PUT, SnowSQL 
Part 12 – Parallelism when loading files 
Part 13 – Programmable objects
Part 14 – Internal storage and micropartitions
Part 15 – Table types 
Part 16 – Primary and Foreign Keys
Part 17 – Clustering Keys for Tables
Part 18 – Time Travel
Part 19 – Fail Safe
Part 20 – Encryption

And more coming…


Opinion: Do your bulk data manipulation in T-SQL, not in row operations in SSIS

I really love SQL Server Integration Services (SSIS). In fact, I wish Microsoft was giving it far more love than it is. So many of my clients use it, but unfortunately, not everyone uses it well, or for the right things.

One really common mistake is to perform lots of row-by-row logic within the SSIS packages. Let's take loading a fact table in a data warehouse as an example:

To load a fact table in a data warehouse that's based on a star schema, you need to find the dimension (usually surrogate) key for each dimension that's associated with the fact.

The Wrong Way


Now clearly you could use a Lookup task in SSIS to go and find the first dimension key, based upon the dimension business ID. If it's present, you add it to your data flow. If it's not present, perhaps you infer a new row, and add the new row's dimension key to the data flow.

Then you go and do it all again for the next dimension, and so on, and so on.

There are situations where this might make sense, but most of the time, it doesn't.

A Better Way

As an alternative, you could use SSIS to just orchestrate some T-SQL to do the same job:

  • Stage the fact data in a table that has the required columns, plus a nullable column for every dimension key
  • Perform a single update on that staging table, with left joins to look up all the dimension keys and record them when present (as sketched below)
  • Perform a single query for each dimension where you need to infer new keys and update the staging table
  • Push the staged data into the target fact table.
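
A minimal sketch of that single update, and of inferring new dimension members, might look like this. The table and column names (Stage_Sales, DimCustomer, CustomerCode, and so on) are just made up for illustration; substitute your own:

-- One set-based update to record the dimension keys that already exist
UPDATE s
SET s.CustomerKey = c.CustomerKey,
    s.ProductKey  = p.ProductKey
FROM dbo.Stage_Sales AS s
LEFT JOIN dbo.DimCustomer AS c ON c.CustomerCode = s.CustomerCode
LEFT JOIN dbo.DimProduct  AS p ON p.ProductCode  = s.ProductCode;

-- Then, per dimension: infer any members that weren't found,
-- and pick up their newly-assigned keys
INSERT dbo.DimCustomer (CustomerCode, IsInferredMember)
SELECT DISTINCT s.CustomerCode, 1
FROM dbo.Stage_Sales AS s
WHERE s.CustomerKey IS NULL;

UPDATE s
SET s.CustomerKey = c.CustomerKey
FROM dbo.Stage_Sales AS s
INNER JOIN dbo.DimCustomer AS c ON c.CustomerCode = s.CustomerCode
WHERE s.CustomerKey IS NULL;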

The difference in performance is often very, very large.

I know that doing the T-SQL work requires a different skill set to what's required for the SSIS work, but whenever you're dealing with a large amount of data manipulation in SSIS, or now in ADF (Azure Data Factory), consider whether you should just be orchestrating T-SQL instead.


Power BI: Creating an IsWeekday function in Power Query M Language

I spend a lot of time doing consulting that involves Power BI. One of the core skills for working with Power BI is getting yourself comfortable with the M language that's used in Power Query.

I was asked the other day how to create a function that determines whether a date is a weekday or a weekend day. That's actually quite easy. The Date.DayOfWeek function can be adapted to do that.

In Power Query, go to the Edit Queries option and choose to add a new query. (You could also start by duplicating an existing query). Then on the View menu, choose the Advanced Editor and replace any code that's there with this:
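
Something like the following will do the job. (The parameter name dateToCheck is just a placeholder; use whatever name you prefer.)

// Returns true for Monday to Friday, false for Saturday and Sunday
let
    IsWeekday = (dateToCheck as date) as logical =>
        Date.DayOfWeek(dateToCheck, Day.Monday) < 5
in
    IsWeekday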

Save it and call the query IsWeekday.

Date.DayOfWeek returns the day number in the week, but you need to tell it which day your week starts on. In this case, because I wanted to have Saturday and Sunday as weekend days, I started the week on Monday. That means that Saturday is then day 5 and Sunday is day 6 because the first day is day 0. So then all I need to do is check for a value less than 5.

If you need to change the weekend to say Friday and Saturday, you'd just need to replace the Day.Monday constant in the script with Day.Sunday.

Testing the function

Testing the function is also easy. Once you save the code, you'll see a template for calling the function:

Select a date, click Invoke and see the result:


Book Review: Pro Power BI Architecture

One of my Kiwi buddies who specializes in Power BI is Reza Rad. I was pleased to see that he now has a set of eBooks on Power BI, but was especially happy to see one called Pro Power BI Architecture.

There are lots of books around that discuss how to use Power BI, but there's been a real lack of books on architecting solutions using Power BI. So if you want to learn to develop dashboards or reports, this isn't the book for you. Reza has other books for that.

I enjoyed reading the book and I liked the degree of coverage it gave to the key topics.

If you are looking for ways to integrate Power BI into your solutions architecture, this book is a pretty good starting point.

What I was really hoping for though, was more info on administration. Mind you, the book doesn't claim to provide that. I keep getting asked about materials around administration issues. Perhaps that's another edition for Reza to consider. But the architects who need high level overviews of all the key topics should be pretty happy.

Bottom line?

I enjoyed reading this book, and it's a pretty easy read. Great for architects considering Power BI.

Greg's rating: 7 out of 10

Bring your reports and analytics to life. Learn SQL Server spatial now.

I love working with SQL Server spatial. Don't just use boring tables and charts. Spatial data can really help to bring your reports and analytics to life.  Don't assume spatial is just about maps. (It's not). Don't assume you don't have spatial data. (I'm sure you have addresses, locations, etc.)

We've been working hard lately on getting many of our popular in-person courses converted to being available online and on-demand. We're really proud to now have SQL Server Spatial for Developers and DBAs available.

We wanted to give you an experience that's as close as possible to a classroom course, but right in your own location, and at a time that suits you.

It's a detailed course. It's not just a set of videos. There are hands-on labs that you can do (we supply the data), quizzes that you can complete, and once you're even more awesome, you can get a certificate of completion.

It's $195 USD but, best of all, if you use the coupon GOSPATIAL at the checkout, you can enrol for $95 USD (until the end of January). Just click on the enrol button (it says $195), and then click Add Coupon on the next page.

Check it out here: https://training.sqldownunder.com/p/sqlserver-spatial-data

And check out all our courses here: https://training.sqldownunder.com

More courses are on the way very soon.

Business Intelligence: Success is about small starts leading to bigger things

I spend a lot of time on client sites and, time and again, one of the mistakes that I see people making is trying to start with large projects. I think one of my all-time favorite quotes about IT is:

Any successful large IT system used to be a successful small IT system.

The next time you're thinking about creating a project that's going to have a big bang outcome, please remember this. The history of the industry is that it really is likely to be a big bang, just not the big success you'd hoped for.

A staggeringly high percentage of large IT projects fail.

This is even more important in business intelligence projects. The industry is littered with companies that have spent a fortune on BI projects, and almost no-one in those companies can, or does, use the outcomes. It's a sad but true statistic.

Asking users what they want, gathering a large amount of detail, then building it, testing it, and delivering it to them sounds good, but it almost never works. Unfortunately, it's how many larger organizations think projects need to be managed. They assume that creating an IT project is like building a bridge.

It's not.

The first problem is that the users don't know what they want. Don't blame them later because you think they didn't give you the right instructions. That's your fault for assuming they can describe exactly what they want. In many cases, until they see some of the outcomes, they won't start to understand what the system will do, and until then, they won't realize what they really need.

Second, no matter how clearly you document their requirements, it won't be what they need. One of my Kiwi buddies, Dave Dustin, was having a bad day once, and I remember he mentioned that he was going to just spend the day delivering exactly what people asked for. That was wonderful and beautifully insightful, because we all know that it would lead to a disaster. It's little surprise. They might have said "sales" but they really meant "profit", and so on.

Finally, the larger the project, the longer it will take to deliver, and by then, even if you'd done it perfectly, the users' needs will have changed, so it still won't be what they want.

When you're starting a BI project, I'd implore you to find a project that has these characteristics:

  • Small enough to be done in a couple of weeks at most
  • Based on mocked up data in front of the target users
  • Large enough to make a significant difference to someone's life or work
  • Targeted at someone who's important enough to be a champion for the rest of what you want to achieve
  • (Ideally) targeted at someone "who pays the bills"

Start by doing a great job, and then build on that once you have support.

SQL: And where did the SSRS Private Assemblies folder move to in VS2017?

Recently, I posted about the templates folder for SQL Server Reporting Services designer moving in Visual Studio 2017. You'll find that blog post here.

I've now found that the Private Assemblies folder has moved too. I do wish these locations would be fixed, shared, and known, rather than us needing to play hide and seek whenever updates are made to the SSRS designer tooling.

The Private Assemblies folder was used to hold a copy of any .NET assemblies that are referenced in your SSRS projects. By using .NET assemblies, you can share logic that's used in reports. It's worth noting that wherever we can, we try to avoid using .NET assemblies for this, and wish that SSRS had a way of dealing with common code inclusions, but that's a topic for another day.

Generally we only use code in reports for things like formatting functions, because we think that reports should render the data, not really calculate the data. (That should be done by a procedure or function on the server).

One of the reasons that we avoid using .NET assemblies for shared logic in reports is that when you do that, you buy yourself deployment issues. Where will the assembly (DLL) live? You basically have two choices:

  • Install it in the GAC (global assembly cache) – we don't like this one as it pollutes the GAC with application-specific logic.
  • Install it in both the Report Server's bin directory, and also in the private assemblies area for the SSRS designer. It has to go in two places as the designer needs to be able to find it during design time, and the report server needs to be able to find it at run time.

What has recently moved though, is where the private assemblies folder is located. Previously it was here:

\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\PrivateAssemblies\ProjectItems\ReportProject

Now it's either here:

\Program Files (x86)\Microsoft Visual Studio\2017\SQL\Common7\IDE\CommonExtensions\Microsoft\SSRS\

Or here:

\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\CommonExtensions\Microsoft\SSRS\

Like last time, I presume the difference is because of how the Visual Studio shell has been installed, ie: either by SSDT (the SQL one), or by Visual Studio (the Enterprise one).


SQL: Where did my Reporting Services templates disappear to in VS2017?

I'm always telling students that one of the first things you need to learn to do when working with SQL Server Reporting Services (SSRS) is to create report templates.

Now I really wish that SSRS had a real templating system, but what we have at present is just a folder that contains the items that we can create. For example, in an SSRS project, if I right-click Reports, and click Add, then New Item:

I see a list of templates to choose from:

Now what I really want to see is something like MyCompany_A4_Portrait ie: I should already have company templates set up for paper size and orientation, etc. Usually I'd have a template that already has a logo, a place for parameters to be displayed, a report title, etc.

Templates are just report files (ie: RDL files) but placed in a special folder on your system.

If you're starting with a blank page to create a report, you're doing it wrong.

But where do these templates come from? In the Visual Studio 2015 version of the tools, they were here:

C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\PrivateAssemblies\ProjectItems\ReportProject

I found after installing the VS2017 version of the tools though, that they weren't there any more. Instead, so far I've found them in different locations on different machines. One was here:

C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\CommonExtensions\Microsoft\SSRS\ProjectItems\ReportProject

The other was here:

C:\Program Files (x86)\Microsoft Visual Studio\2017\SQL\Common7\IDE\CommonExtensions\Microsoft\SSRS\ProjectItems\ReportProject

At one of those locations, you'll find these:

Note the difference is that it either says SQL or it says Enterprise in the path. I presume this depends upon which version of the VS shell you have installed, either the version installed by SQL Server Data Tools, or the full version installed by Visual Studio.

Note: once you've added the file to the folder, you'll need to restart SSDT to be able to see it appear as a template. That's because they now seem to cache the list of templates on startup.

I wish this was less confusing, and I wish there was a standard shared folder that templates could live in ie: shared amongst a group of developers in a standard way.


Opinion: Which SQL Server columns should be nullable – no place for Magic values

In a recent blog post, I wrote about the ANSI_NULLS option for tables, and after that I had several more questions related to NULL values.

Note that I mentioned NULL values, not values of NULL, and not values equal to NULL. That's because "being NULL" is a state that a value is in, not itself a value. That's why our queries say IS NULL and not = NULL.
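
As a quick illustration (the table and column names here are just made up), with the default ANSI_NULLS behavior a comparison to NULL never evaluates to true, so only the first of these queries finds the rows with missing phone numbers:

-- Finds the customers whose phone number is missing
SELECT CustomerID FROM dbo.Customers WHERE PhoneNumber IS NULL;

-- Returns no rows at all: a comparison with NULL is never true
SELECT CustomerID FROM dbo.Customers WHERE PhoneNumber = NULL;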

Now because of that, many developers see values that are NULL as a pain in the neck, as they have to be dealt with separately, unlike other values. This can lead to many poor design decisions, but I want to start today with the decision to use magic values.

Some developers will just decide that their databases will never have values that are NULL.

The problem with designs like this is that when they have a column that should be NULL (because no sensible value could go there), they tend to put a magic value there instead.


Magic values are values that are stored like any other value but are intended to mean something else.

The problem with magic values is that as soon as you use a real value to represent something that means something different, every piece of software that deals with the database, and anyone who queries the database directly, has to understand the substitution trick. These go wrong all the time. Inevitably, someone doesn't follow the convention, and the values have a way of leaking into the real world.

Here's a simple example:

Now we don't know where exactly this went wrong. It could be as simple as the UI not knowing how to display a non-existent value, or the value 1/01/0001 being stored instead of NULL. Either way, it shows nonsense. What it no doubt should say is "never".

This is very common with dates. I work on systems all the time that, instead of storing NULL for missing dates, use 1/1/3000 or 1/1/9000, etc. Invariably these values leak out into the real world.
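
As a sketch of the difference (the table and column names are just invented for illustration), compare a nullable date column with a magic date:

-- A nullable column states the intent directly: NULL means "not cancelled yet"
CREATE TABLE dbo.Subscription
(
    SubscriptionID int IDENTITY(1,1) PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    CancelledDate date NULL
);

-- Every query can then express that intent clearly
SELECT CustomerName
FROM dbo.Subscription
WHERE CancelledDate IS NULL;

-- With a magic value like 9999-12-31 instead, every application and every
-- ad-hoc query has to know about the trick:
-- WHERE CancelledDate = '99991231';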

There is a wonderful story of a country where OCR machines had been introduced to read tax returns instead of humans and, by design, when they couldn't read the values, they inserted all nines. Pity the people writing the next layer of code didn't get the memo. The huge values made their way right into the country's budget calculations before being identified as wrong.

This is surprisingly common. Here's another example:

I'm guessing that they decided to store 9999 for wind speed when they didn't have one, and -999 for precipitation when the value was missing. Again, pity the UI designer didn't know that, or didn't test for that.

Magic values are just one common problem. Next week, we'll discuss another.