SQL: Understanding Change Data Capture for Azure SQL Database – Part 5 – Accessing CDC from another database

Computer saying change

This is the final part of a series on using Change Data Capture with Azure SQL Database, and looks at accessing the data from change data capture from another database.

Scenario

I often use Azure SQL Database as a data warehouse. It's a perfectly fine database for that. But the challenge with Azure SQL Database is always about how to access data in other databases.

Many people don't seem to realize that you can use External Tables (and External Data Sources) to read data in one Azure SQL Database, in another Azure SQL Database.

In an earlier post, I talked about how you can access the list of logins in the master database from within an Azure SQL Database. I'd seen far too many people who were told that wasn't possible.

In that post, I showed how to set up a database scoped credential, an external data source, and an external table, so I won't repeat that detail here. Please refer to that post for specifics.

With Change Data Capture (CDC), what I'm often needing to do is this:

Two databases with arrow linking them

I have a source database (probably an OLTP database) and have enabled CDC on some tables. But I want to drag the data changes across to a staging area in the data warehouse database.

I wish the product team had made this easy but, at present, it isn't.

The Challenge

I can create an external table in the data warehouse database, that points to a table or a view in the source database. What I want to access though, is a table-valued function like the one I described in part 3:

cdc.fn_cdc_get_net_changes_dbo_NewEmployees

But it requires parameters, most importantly a @FromLSN and a @ToLSN. But I can't pass parameters to an external table. (I wish I could).

So how do I create a way to read from these functions, when I can't pass parameters, and I don't want all the data since the beginning of time?

There are two ways to tackle this.

Approach 1

In the past, here's what I've been doing instead:

Step 1: Local table

I create a local table, let's call it DataLoad.SourceConfigurations, and it has at least two columns for FromLSN and ToLSN. (Note these will be binary columns).

Step 2: External Table in the Source Database

I then create an external table in the source database, that points to that table that I just created in the data warehouse database. Let's call it DW.SourceConfigurations.

Step 3: View in the Source Database

I next create a view in the source database that reads from the DW.SourceConfigurations external table, to get the FromLSN and ToLSN values and uses them to call the CDC table-valued function. Let's call the view cdc.get_required_changes. I'll leave it in the cdc schema along with the standard functions. (Obviously I could have put it elsewhere).

The beauty of this, is that I can now control what's returned by this cdc.get_required_changes view, by updating the values in the DataLoad.SourceConfigurations table in the data warehouse database.

Step 4: External Table in the Data Warehouse Database

The final setup step is then to create an external table, let's call it DataLoad.RequiredChanges, in the data warehouse database, that points to the cdc.get_required_changes view in the source database.

End Result

Once we've done this, we have a workable solution. I can process changes from the source database in the data warehouse database almost as though they were local CDC changes.

I just put the LSN values that I want into DataLoad.SourceConfigurations, and then read rows from DataLoad.RequiredChanges.

Approach 2

A more recent, and simpler approach is to wrap the call to the CDC functions in a stored procedure.

We can now call a remote stored procedure by using

sp_execute_remote

This option appeared well after the other options were available. We'll still need a local table to store where we are up to, but we can just pass the values directly to the stored procedure, which can make the SELECT statement against the CDC function.

I hope this helps someone get started with this trickier aspect of using CDC in Azure SQL Database.

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database

Leave a Reply

Your email address will not be published. Required fields are marked *