BI: DataWeek starting soon – don't miss it

I was really excited today to see details of the upcoming #dataweek. Anything with a name like that already has my interest.

In this case though, there's a #CreateData event that's happening as part of #dataweek. And the focus of that looks to be on Azure Synapse Analytics.

Azure Synapse Analytics

I know that many of my readers won't have spent much time with Azure Synapse Analytics. The product has grown out of Azure SQL Data Warehouse (note: not Azure SQL Database), but it's now much more than just an upgraded Azure SQL DW.

It's a tool that combines data warehousing with Big Data analytics. So many of the early Big Data offerings struck me as just Big Hype. I wrote about that back in 2013. And early last year, I wrote about my concern about how "modern" seems to be treated as a synonym for "better", when it certainly isn't. Big Data, in particular Hadoop, was an example of that for me.

My real concern was that most of these tools were only addressing a small part of what was needed.

Why Azure Synapse Analytics?

One key advantage of Azure Synapse Analytics is that it tackles both the data warehousing and Big Data needs, in a single product. That's far more useful. And it's trying to do it without a lot of moving data around, as that introduces latency.

I spent the last two weeks on a DevOps project focussing on building and deploying Azure Synapse Analytics projects. I was impressed by how mature the tooling now was, compared to when I'd last looked at it.

If you're not across where this product is at, I'd encourage you to consider registering for #dataweek.

ADF: Time zone support in Data Factory – a Small Change but so Important

I work with a lot of technology on a daily basis, much of it amazing. But I still get excited when relatively small enhancements are made, and they make my life or development much better.

Time zone support in schedule triggers in Azure Data Factory is one of those changes.

Schedule Triggers

In Data Factory, the most common type of trigger that we use is a schedule trigger. It's used to run a pipeline at a specific time or series of times. But one of the most painful aspects of these triggers was that they had no time zone support.

That meant that for a city like Melbourne or Sydney, where daylight saving time applies, I couldn't set a pipeline to just run at, say, 2AM every day Melbourne time. Every six months, I'd have to go in and change the trigger times to achieve that.

I'd also have to endlessly convert between our local times and UTC, as UTC was the only time zone that could be entered.

If you only have a few triggers, that might be OK, but when you have a lot of them, well that's painful.

Design was Ready

When I looked at the JSON representation of the trigger, there was a time zone in there:
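Something like this trimmed sketch (the trigger name and times here are illustrative, not from my actual trigger):

{
    "name": "DailyLoadTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2020-07-01T02:00:00",
                "timeZone": "UTC"
            }
        }
    }
}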

So I knew that whoever designed it meant for it to work. But ADF wouldn't support any value other than UTC.

Great News!

In the last few days, we got the great news that time zones are now supported in these triggers.

You can see the option now in the UI:

I can't tell you how pleased I am to see that there.

Thank you, ADF team!


Power BI: 5 Minutes to "Wow" and for enterprises, what's next?

Power BI is amazing. And it's starting to appear all over the place. Many enterprises don't know what to make of it though. Some are scared that it'll be the "next Access" where stores of uncontrolled data end up all over the organization. Power BI's mantra of "5 minutes to Wow" is spot on. It's easy to be impressed. But enterprises are often struggling with "what comes next after that 5 minutes?"

We've been implementing many enterprise-level projects that use Power BI, and I'd love to spend time showing you what we do.

Amit Bansal and the DPS team in India are hosting a virtual summit later this year. There are three days of data-related conference presentations, delivered 24 x 7. If you can't find something of real interest to you, and in your time zone, you aren't looking. And the price? At present, it's $89 USD. Yes, you read that right. And it includes a year of access to the recordings of all the normal conference sessions.

As part of their Data Platform Virtual Summit event, there are also pre-cons and post-cons. Again, they are amazing value.

I'm presenting a pre-con over two half days. At the current discount, it's $129 USD and if you use the discount code GREG@DPS, it's even cheaper. I'd love to see many of you attend. Let's talk about Power BI in the enterprise.

Now, this isn't a session about how to make great visuals, etc. I'll leave that to someone else. But if you want to really get into what makes a good underlying data model, how to integrate properly with security, how to structure projects, how we use Azure Analysis Services and/or Power BI Premium and more, this should be the right session for you.

You can book here.

Fix: Unexpected error in Analysis Services Power Query designer in Visual Studio 2019

I was working in the Power Query editor in an Analysis Services project, hosted in Visual Studio 2019. When I tried to use "Add column by example", I received the error shown below:

Unexpected Error

Could not load file or assembly 'Microsoft.DataIntegration.TransformDataByExample, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified.

The problem is that the assembly had not been deployed with the Analysis Services Designer.

Fortunately, the same assembly is used by Power BI Desktop. Because I had that installed on the same machine, I was able to copy the assembly:

Microsoft.DataIntegration.TransformDataByExample.dll

from the folder:

C:\Program Files\Microsoft Power BI Desktop\bin

to the same folder as the devenv.exe program from VS 2019:

C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\Common7\IDE\

If that path doesn't exist on your machine, it might be here instead if you're using the Community edition:

C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\

After that, it was back working again. I hope that helps someone else.

BI: (FIX) Failed to save modifications to the server. Error returned – The key didn't match any rows in the table

I was working with Azure Analysis Services the other day and was having an issue with the processing of a tabular data model. When I first tried to process, I received an error that told me pretty clearly that the AS server couldn't connect to the SQL database. Fair enough. I wasn't sure what had nuked those credentials, but I reset them.

Next, I started to get this error: Failed to save modifications to the server. Error returned: 'The key didn't match any rows in the table'.

This turned out to be a whole other issue. I've seen this error before, and generally it relates to the deployed data model's schema no longer matching the schema of the table or view that it's loading from.

So I checked thoroughly for that, to no avail. The schema matched perfectly.

Eventually I realised that although I'd fixed the credentials, I'd used the credentials that ADF was using to connect to the database, not the credentials that AS was meant to use.

And that was it.

AS could connect to the database, but it couldn't see the views that it was meant to be loading from. The user that AS was connecting with had no permission to see them.

I would have hoped this would lead to a message telling me that it couldn't find the source table or view, but it doesn't. It leads to: "The key didn't match any rows in the table".

Far from obvious to me, and I hope it helps someone else.

Faster Power BI and Analysis Services Development with Automatic Data Subsets

If you have worked with larger tables in either Power BI or Analysis Services, you'll know that developing against them is painful. Any time the data needs to be refreshed (which happens often), you spend a lot of time sitting around waiting for it to happen.

Now our buddy Chris Webb proposed a way of getting sampled data during development. That's pretty good. And fellow MVP Marc Lelijveld also uses a similar method.

In both cases, they insert a TOP N clause to limit the number of rows. But that's never been quite what I want.

What I Really Want

First, I don't really want a TOP N, as I generally want a specific range of dates. While I might want all data when the model is in production (or I might have a starting date for that), for development I usually want just a specific subset of the data. I often want, say, just the last two months.

Second, when I'm working with source control systems, I don't want to be changing the BIM or PBIX files in any way at all, as they move between development and deployment. I don't even want to use different parameters.

Ideally, development tools like PBI Desktop, the Analysis Services tabular designer in Visual Studio, etc. would automagically include an extra limiting predicate while I'm developing.

My Workaround

In the meantime, I've come up with what I think is a pretty good workaround. I'm making the views that I connect to determine what to return, based on either HOST_NAME() or APP_NAME() from the SQL Server connection. Let me show you.

First I'll create two of the schemas that we commonly use:
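Something like this minimal sketch (the original code isn't shown here, so the schema names Analytic and Reference are illustrative):

CREATE SCHEMA Analytic AUTHORIZATION dbo;
GO
CREATE SCHEMA Reference AUTHORIZATION dbo;
GO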

(Another thing I really wish for: CREATE SCHEMA IF NOT EXISTS or CREATE OR ALTER SCHEMA).

Next I'll create a table to hold a list of the hosts that I'm using for development:
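A sketch of what that table can look like (names illustrative again):

CREATE TABLE Reference.DevelopmentHosts
(
    DevelopmentHostID int IDENTITY(1,1)
        CONSTRAINT PK_Reference_DevelopmentHosts PRIMARY KEY,
    HostName sysname NOT NULL
);
GO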

Note: I often get questions about the data type sysname. It's the data type used for the names of system objects, and it's currently implemented as nvarchar(128).

Then I'll create and populate a table that I'm pretending is part of my data model:
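For example, with an illustrative stand-in table:

CREATE TABLE dbo.Transactions
(
    TransactionID int IDENTITY(1,1)
        CONSTRAINT PK_dbo_Transactions PRIMARY KEY,
    TransactionDate date NOT NULL,
    Amount decimal(18,2) NOT NULL
);
GO

-- one row per day, working back from today
INSERT dbo.Transactions (TransactionDate, Amount)
SELECT DATEADD(day, 0 - n.Number, CAST(SYSDATETIME() AS date)),
       CAST(100.00 + n.Number AS decimal(18,2))
FROM (SELECT TOP (365) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS Number
      FROM sys.all_objects) AS n;
GO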

I've added a transaction for every day in the last year.

Next, I'll create the type of analytic view that we often use:
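Along these lines (again a sketch, not the original code):

CREATE OR ALTER VIEW Analytic.Transactions
AS
SELECT t.TransactionID, t.TransactionDate, t.Amount
FROM dbo.Transactions AS t
WHERE t.TransactionDate >= '19900101'; -- hard cutoff for loading data
GO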

There's a hard cutoff date for loading data (perhaps the start of sensible data) and if I query this, I see all 365 rows.

And the Secret Sauce

The trick is to change the view so that it makes decisions based on working out if I'm in development or not:
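Here's a sketch of that change, using the same illustrative names:

CREATE OR ALTER VIEW Analytic.Transactions
AS
SELECT t.TransactionID, t.TransactionDate, t.Amount
FROM dbo.Transactions AS t
WHERE t.TransactionDate >=
    CASE WHEN EXISTS (SELECT 1
                      FROM Reference.DevelopmentHosts AS dh
                      WHERE dh.HostName = HOST_NAME())
         THEN DATEADD(month, -2, CAST(SYSDATETIME() AS date)) -- development subset
         ELSE '19900101'                                      -- full history
    END;
GO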

If the query isn't coming from a development host, it'll return all data since the same hard-coded start date (i.e. start of 1990). But if I'm on a development host, it'll just return the last two months of data.

I'll add my client to the list of development hosts:
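With the illustrative names above, that's as simple as:

INSERT Reference.DevelopmentHosts (HostName)
VALUES (HOST_NAME());
GO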

And then query the view again:
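For example:

SELECT t.TransactionID, t.TransactionDate, t.Amount
FROM Analytic.Transactions AS t
ORDER BY t.TransactionDate;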

And now I see only 62 rows, without changing the code at all.

Back in Power BI or Analysis Services Tabular Designer, if I'm on my client, I see the subset, but on the server, I see all the data without changing the BIM or PBIX file at all.

What if Host Name Won't Work for You?

There might be situations where making the choice based upon the host name just won't work for you.

In that case, I'd suggest checking the APP_NAME() function instead of the HOST_NAME() function, and having a list of apps that get development subsets instead of full data sets.
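A sketch of that variation, with the same caveat that the names are illustrative:

CREATE TABLE Reference.DevelopmentApps
(
    DevelopmentAppID int IDENTITY(1,1)
        CONSTRAINT PK_Reference_DevelopmentApps PRIMARY KEY,
    ApplicationName sysname NOT NULL
);
GO

-- and in the view, the EXISTS check compares to APP_NAME() instead:
-- WHERE da.ApplicationName = APP_NAME()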


ADF: Changing or removing your Git connection from an Azure Data Factory

I've been working a lot with ADF (Azure Data Factory) again lately. If you've started to work with ADF and aren't using Git in conjunction with it, you need to change that now.

ADF objects are also represented as JSON objects and lend themselves nicely to being stored in Git.

Another key advantage is the Save button. If you don't have Git connected, when you're working with ADF, there is no Save button: if you're part way through making changes, you can either Publish them or Discard All. When you have Git in place, you also get the option to Save.

NOTE: If I'm working at a site that doesn't have Git connected and I need to temporarily stop working on something, what I can do is grab the JSON code for the pipeline (or whatever I'm working on), save it away manually, and then discard my changes. At least I can put it back again later. But that's basically what the Git option does for you anyway, so just use it.

Changing or Removing the Git Connection

Today was the first time I'd needed to change the Git connection across to another repository. And it took me a while to work out how. You'll notice that in the main editing window, there's no option to change it:

Now I think it should be there. But if not, it should be on the Properties page for the ADF instance. It's not there either:

It really should have been in one of those two locations, but it's in neither.

So where is it?

In one of the least obvious bits of UI ever created, it's tucked up in the top right corner of the Dashboard. That's the last place I would have looked, because that page is usually useless to me. It's great if you've just started, as it has lots of links to tutorials, etc., but otherwise it's not so interesting. Yet that's precisely where the option is:

I hope that helps someone.


BI: (Workaround) Changing partitioned tables in SSDT tabular designer

I was working with a client the other day and we had what we thought was a simple situation:

  • A table in an SSAS tabular model project
  • The table had two partitions
  • We needed to remove 6 columns from the table

So, we'd already removed the 6 columns from the SQL Server view that the table was being loaded from. We just needed to update the tabular model.

Now for an unpartitioned table, that's easy. You open the table properties, click Design, wait for a moment, click below the query, and then, on the Query menu, click Refresh. When you then click Import, you can save the changes. We do that all the time.

However, the partitioned story was different.

  • If we tried to change the partitions, it would not let us save the changes because the partition differed from the table.
  • If we tried to change the table, it would not let us save the changes because the table differed from the partitions.

<SIGH>

There really doesn't seem to be any great way to do it using the standard GUI.

Official Method

The "official" way is to:

  • Remove all but one partition
  • Make the changes
  • Put the partitions back

That doesn't sound like much fun, particularly if you had a bunch of partitions.

Workaround

The workaround is to right-click the .bim file, then choose View Code, find the columns, and carefully remove them from the JSON. When you save it, all is then happy.
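For a model at compatibility level 1200 or higher, each column appears in the table's columns array as a small JSON object like this (the column name here is illustrative):

{
  "name": "ColumnToRemove",
  "dataType": "string",
  "sourceColumn": "ColumnToRemove"
}

Remove the whole object for each unwanted column, taking care to keep the commas between the remaining entries valid.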

Wish it wasn't so, but I hope that helps someone.


Opinion: SQL Server Reporting Services – The reports of my death…

SQL Server Reporting Services (SSRS) is a wonderful product that's still really important, even though the product group at Microsoft hasn't been showing it much love lately.

I was at a client site the other week, and while we were using Power BI (PBI) for their dashboards and general visualizations, we were looking to use SSRS for part of the solution.

What fascinated me is that when they talked to the local Microsoft field staff, they kept being told that SSRS was old technology, and kept being asked why they'd look to use such old technology.

Worse, when I looked at what was being suggested instead, it was Power BI Premium, combined with Azure Automation launching Azure Functions, and/or Power Automate.

Really?

Way too much Kool-aid drinking has been going on there.


Let's make something clear:

I love Power BI, but there are still many, many scenarios where SSRS is the best solution available today.

Power BI Premium isn't for everyone, and in a number of areas, it's still quite a distance short of where SSRS already is. For a start, quite a number of things that are commonly used in SSRS just aren't available in PBI Premium's implementation of Paginated Reports.

Some other things will no doubt come, but until they do, it's also not the same development and maintenance story. (Good examples are the current lack of shared data sources, report viewer controls, etc.)

Smaller Companies

At present, the pricing of Power BI Premium is simply too high for most smaller companies. We've been saying that since it first appeared. That might change, and alter the balance, but we can only consider the situation today.

And even if it does change, I also don't see any solution at all except SSRS for all the people currently running the Express or Standard editions of SQL Server.

Having paginated reports available in Power BI Pro would help a bit with this but it's still not the whole story.

Licenses

With SSRS, you have a single license for the report server. With Power BI, unless you're using Premium and/or Embedded, you need a Pro license for everyone using it.

For some companies, this is a major difference. Many companies have far more report recipients than people who want to browse reports in a dashboard.

And even though the client was a tier-1 financial, only around 40 people needed access. There's a major difference in price between 40 Power BI Pro licenses plus an SSRS license on the one hand, and a Power BI Premium license on the other.

Subscriptions

Data-driven subscriptions in SSRS are powerful. Do you have a whole lot of users who just need a pixel-perfect PDF of a report, or an Excel spreadsheet sent to them periodically (perhaps every Monday morning)? SSRS is perfect for that.

Do you need to send a pixel-perfect report to a whole lot of people, with different parameters for each? SSRS and data-driven subscriptions are perfect for that.

Summary

Power BI is wonderful and it's going in great directions. I'm really disappointed that the product group hasn't given SSRS much love in recent times, but there are many scenarios where it's still the best solution.

Don't discount it yet.

And if you find yourself proposing a convoluted set of services, just to start to achieve what SSRS already does, please think again. You owe it to your customers.


Power BI: (Workaround) Times disappear when datetime values are displayed in PBI tables

I'll start this post by mentioning that there's a general consensus that you should avoid columns in tabular data models that have both dates and times. That's largely because they don't compress well. However, sometimes you really do need to have both a date and a time in a single column.

For example, if you want to sort a column by both a date and a time, there's no option to sort one column by two other columns. And if you create a composite column to get around that, you're really back to the same problem as storing a datetime anyway.

Displaying Values in a Table

But what really surprised me recently was that when I added a datetime column from my Live Query data model (the source was Azure Analysis Services) into a table in Power BI, the time values disappeared.

Only the date was shown.

I was quite puzzled about what was going on.

Why This Happens

Power BI has a limited range of data types. If you have a date, it's stored in a date data type. But if you have a datetime, it's also stored in a date data type.

So when you're designing a report, Power BI has to decide on a default format for showing a date data type element. And no major surprise, it just shows the date.

Fixing the Issue

To make the time values appear, you need to fix the issue back in the tabular data model by adding a custom display format for the datetime column.
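In the model's JSON (the .bim file), that's the column's formatString property. Here's an illustrative sketch (the column name is hypothetical, not from the original model):

{
  "name": "TransactionDateTime",
  "dataType": "dateTime",
  "sourceColumn": "TransactionDateTime",
  "formatString": "yyyy-MM-dd HH:mm:ss"
}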

I can't say that I love this, as a display format decision should really be made at the client application. I'm OK with the server supplying a default display format, but the client should be able to change it.

Unfortunately in Power BI today, you have no option to change the format of elements from Live Query data sources.

I hope that helps someone.