Faster Power BI and Analysis Services Development with Automatic Data Subsets

If you have worked with larger tables in either Power BI or Analysis Services, you'll know that developing against them is painful. Any time the data needs to be refreshed (which happens often), you spend a lot of time sitting around waiting for it to happen.

Now our buddy Chris Webb proposed a way of getting sampled data during development. That's pretty good. And fellow MVP Marc Lelijveld also uses a similar method.

In both cases, they are using a TOP N inserted to limit the number of rows. But it's never been what I want.

What I Really Want

First, I don't really want a TOP N, as I generally want a specific range of dates. For example, while I might want all data when the model is in production, or I might have a starting date for that, I usually want just a specific subset of the data for development. I often want say, just the last two months.

Second, when I'm working with source control systems, I don't want to be changing the BIM or PBIX files in any way at all, as they move between development and deployment. I don't even want to use different parameters.

Ideally, I wish the development tools like PBI Desktop, Analysis Services Tabular Designer in Visual Studio, etc. automagically included an extra limiting predicate while I'm developing.

My Workaround

In the meantime, I've come up with what I think is a pretty good workaround. I'm making the views that I connect to, determine what to return, based on either the HOST_NAME() or APP_NAME() from the SQL Server connection. Let me show you.

First I'll create two of the schemas that we commonly use:

(Another thing I really wish for: CREATE SCHEMA IF NOT EXISTS or CREATE OR ALTER SCHEMA).

Next I'll create a table to hold a list of the hosts that I'm using for development:

Note: I often get questions about the data type sysname. It's the data type for system objects, and is currently mapped to nvarchar(128).

Then I'll create and populate a table that I'm pretending is part of my data model:

I've added a transaction for every day in the last year.

Next, I'll create the type of analytic view that we often use:

There's a hard cutoff date for loading data (perhaps the start of sensible data) and if I query this, I see all 365 rows.

And the Secret Sauce

The trick is to change the view so that it makes decisions based on working out if I'm in development or not:

If the query isn't coming from a development host, it'll return all data since the same hard-coded start date (i.e. start of 1990). But if I'm on a development host, it'll just return the last two months of data.

I'll add my client to the list of development hosts:

And then query the view again:

And now I see only 62 rows, without changing the code at all.

Back in Power BI or Analysis Services Tabular Designer, if I'm on my client, I see the subset, but on the server, I see all the data without changing the BIM or PBIX file at all.

What if Host Won't Work for You

There might be situations where making the choice based upon the host name just won't work for you.

In that case, I'd suggest checking the APP_NAME() function instead of the HOST_NAME() function, and having a list of apps that get development subsets instead of full data sets.

 

 

6 thoughts on “Faster Power BI and Analysis Services Development with Automatic Data Subsets”

  1. That is very clever Greg.

    I've wasted several two-hour periods waiting for a dev Power BI data source update. This would avoid that, leaving the full dataset loads to the refresh on the service.

    Thank you!

  2. This is good stuff. But I’ve accomplished the same with simple views on SQL, parameters in PBI (service and desktop), and dataflows.

    It is a well written piece… much was learned… 🙂

    1. Hi Bryan, the reason that I didn't go down that path is I want to neatly integrate with GitHub, DevOps, etc. and I want to be able to deploy the files unchanged, and without needing to change anything after deployment. I can't see any way that I can do that with parameters. What would make it better is if the tools themselves offered an option for adding say another query predicate while in design mode.

  3. Hi Greg,

    This looks very useful. I wish I could apply something similar to a dataset I'm combining from SharePoint folders that is > 2GB.

    Any ideas for that scenario?

    1. Hi Paul, no idea on that one sorry. The upside for T-SQL is that there are connection properties (like APP_NAME, HOST_NAME), etc. that we have access to.

Leave a Reply

Your email address will not be published. Required fields are marked *