Faster Power BI and Analysis Services Development with Automatic Data Subsets

If you have worked with larger tables in either Power BI or Analysis Services, you’ll know that developing against them is painful. Any time the data needs to be refreshed (which happens often), you spend a lot of time sitting around waiting for it to happen.
Now our buddy Chris Webb proposed a way of getting sampled data during development. That’s pretty good. And fellow MVP Marc Lelijveld also uses a similar method.
In both cases, they are using a TOP N inserted to limit the number of rows. But it’s never been what I want.
What I Really Want
First, I don’t really want a TOP N, as I generally want a specific range of dates. For example, while I might want all data when the model is in production, or I might have a starting date for that, I usually want just a specific subset of the data for development. I often want say, just the last two months.
Second, when I’m working with source control systems, I don’t want to be changing the BIM or PBIX files in any way at all, as they move between development and deployment. I don’t even want to use different parameters.
Ideally, I wish the development tools like PBI Desktop, Analysis Services Tabular Designer in Visual Studio, etc. automagically included an extra limiting predicate while I’m developing.
My Workaround
In the meantime, I’ve come up with what I think is a pretty good workaround. I’m making the views that I connect to, determine what to return, based on either the HOST_NAME() or APP_NAME() from the SQL Server connection. Let me show you.
First I’ll create two of the schemas that we commonly use:
IF NOT EXISTS (SELECT 1 FROM sys.schemas
WHERE [name] = N'DataModel')
BEGIN
DECLARE @SQL nvarchar(max) = N'
CREATE SCHEMA DataModel AUTHORIZATION dbo;';
EXEC (@SQL);
END;
GO
IF NOT EXISTS (SELECT 1 FROM sys.schemas
WHERE [name] = N'Analytics')
BEGIN
DECLARE @SQL nvarchar(max) = N'
CREATE SCHEMA Analytics AUTHORIZATION dbo;';
EXEC (@SQL);
END;
GO
(Another thing I really wish for: CREATE SCHEMA IF NOT EXISTS or CREATE OR ALTER SCHEMA).
Next I’ll create a table to hold a list of the hosts that I’m using for development:
DROP TABLE IF EXISTS DataModel.DevelopmentHosts;
GO
CREATE TABLE DataModel.DevelopmentHosts
(
DevelopmentHostID int IDENTITY(1,1)
CONSTRAINT PK_DataModel_DevelopmentHosts
PRIMARY KEY,
HostName sysname NOT NULL
);
GO
CREATE INDEX IX_DataModel_DevelopmentHosts_HostName
ON DataModel.DevelopmentHosts (HostName);
GO
Note: I often get questions about the data type sysname. It’s the data type for system objects, and is currently mapped to nvarchar(128).
Then I’ll create and populate a table that I’m pretending is part of my data model:
DROP TABLE IF EXISTS DataModel.Transactions;
GO
CREATE TABLE DataModel.Transactions
(
TransactionID bigint IDENTITY(1,1)
CONSTRAINT PK_DataModel_Transactions
PRIMARY KEY,
TransactionDate date NOT NULL,
TransactionAmount decimal(18,2) NOT NULL,
IsFinalised bit NOT NULL
);
GO
SET NOCOUNT ON;
DECLARE @DateCounter date = DATEADD(year, -1, SYSDATETIME());
WHILE @DateCounter < SYSDATETIME()
BEGIN
INSERT DataModel.Transactions
(
TransactionDate, TransactionAmount, IsFinalised
)
VALUES (@DateCounter, RAND(12) * 100.0, 0);
SET @DateCounter = DATEADD(day, 1, @DateCounter);
END;
GO
I’ve added a transaction for every day in the last year.
Next, I’ll create the type of analytic view that we often use:
CREATE OR ALTER VIEW Analytics.[Transaction]
AS
SELECT t.TransactionDate AS [Transaction Date],
t.TransactionAmount AS [Transaction Amount],
t.IsFinalised AS [Is Finalised]
FROM DataModel.Transactions AS t
-- normal start date for this table
WHERE t.TransactionDate >= '19900101';
GO
There’s a hard cutoff date for loading data (perhaps the start of sensible data) and if I query this, I see all 365 rows.
SELECT * FROM Analytics.[Transaction];
GO
And the Secret Sauce
The trick is to change the view so that it makes decisions based on working out if I’m in development or not:
CREATE OR ALTER VIEW Analytics.[Transaction]
AS
SELECT t.TransactionDate AS [Transaction Date],
t.TransactionAmount AS [Transaction Amount],
t.IsFinalised AS [Is Finalised]
FROM DataModel.Transactions AS t
WHERE t.TransactionDate >=
CASE WHEN EXISTS (SELECT 1 FROM DataModel.DevelopmentHosts
WHERE HostName = HOST_NAME())
-- development subset for this table
THEN DATEADD(month, -2, SYSDATETIME())
-- normal date range for this table
ELSE '19900101' -- normal date for this table
END;
GO
If the query isn’t coming from a development host, it’ll return all data since the same hard-coded start date (i.e. start of 1990). But if I’m on a development host, it’ll just return the last two months of data.
I’ll add my client to the list of development hosts:
INSERT DataModel.DevelopmentHosts
(
HostName
)
VALUES (HOST_NAME());
GO
And then query the view again:
SELECT * FROM Analytics.[Transaction];
GO
And now I see only 62 rows, without changing the code at all.
Back in Power BI or Analysis Services Tabular Designer, if I’m on my client, I see the subset, but on the server, I see all the data without changing the BIM or PBIX file at all.
What if Host Won’t Work for You
There might be situations where making the choice based upon the host name just won’t work for you.
In that case, I’d suggest checking the APP_NAME() function instead of the HOST_NAME() function, and having a list of apps that get development subsets instead of full data sets.
2020-02-28