SQL: Understanding Change Data Capture for Azure SQL Database - Part 3 - Enabling and Using CDC

This is part 3 of a series on working with change data capture (CDC) in Azure SQL Database. This part discusses how to enable it and how to use it.
To show how this works, I have created an Azure SQL Database called CDCTest. I created it as a DTU-based database with a service level objective (SLO) of S3. I’ll discuss more about the licensing implications of CDC in part 4.
Connecting and Setup
I’ve connected to the database using SQL Server Management Studio (SSMS) and opened a query window to the new database. I’ve then executed the following to create the objects:
CREATE TABLE dbo.NewEmployees
(
NewEmployeeID int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
FullName nvarchar(100)
);
GO
If I check the entry in sys.databases by executing this command:
SELECT is_cdc_enabled FROM sys.databases WHERE name = N'CDCTest';
The output of the column is as shown:
Enabling CDC at the DB Level
So let’s start by enabling CDC for the database, by executing:
EXEC sys.sp_cdc_enable_db;
Checking sys.databases the same way again now shows it enabled:
That’s how we can check if it’s enabled at the DB level. But there are other changes that have occurred. First, there’s a new schema that’s been added:
And it contains a number of objects:
SELECT o.[name], type_desc
FROM sys.objects AS o
WHERE SCHEMA_NAME(o.schema_id) = N'cdc'
ORDER BY o.[name];
Enabling CDC at the Table Level
Next, let’s enable it at the table level and see what’s changed.
EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo',
@source_name = 'NewEmployees',
@supports_net_changes = 1,
@role_name = NULL;
We did several things here. We said that the source table is called NewEmployees and that it lives in the dbo schema. We then said that this capture instance will support net changes. That means that as well as providing each individual change, we can ask CDC for the net affect of a set of changes. We’ll see that later. And finally, we told it that for now, only admins can access the data. Otherwise, we’d have had to provide a name for the role that contains the users who can query this.
We can then see that the table is enabled for CDC by querying sys.tables:
SELECT name, is_tracked_by_cdc FROM sys.tables;
But wait a minute. We only created one table right? What are these others? We can find them if we expand the tables in the database:
Note all these extra ones are system tables.
One other thing we did, even though we didn’t specify it, is to create a capture instance called dbo_NewEmployees. That’s the default name but we could have named it by using a parameter. Each table can have two capture instances to allow for handling schema changes and/or other requirements.
We can see the list of capture instances for a table by using this query:
SELECT *
FROM cdc.change_tables
WHERE source_object_id = OBJECT_ID('dbo.NewEmployees');
This shows the configuration of the capture instance, the start and end points of the log that it relates to, and more, like the name of the index in the table, etc.
Using CDC for the Table
Now let’s make use of CDC for this table. I’ll start by inserting three rows, then updating one of them.
INSERT dbo.NewEmployees
(
FullName
)
VALUES('Fred Jones'),('Steve Ballmer'),('Bill Gates');
UPDATE dbo.NewEmployees
SET FullName = 'Greg Low'
WHERE NewEmployeeID = 1;
GO
Now let’s check out what happened by calling the CDC functions:
DECLARE @From_LSN binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_NewEmployees');
DECLARE @To_LSN binary(10) = sys.fn_cdc_get_max_lsn();
SELECT c.*
FROM cdc.fn_cdc_get_all_changes_dbo_NewEmployees
(@From_LSN, @To_LSN, 'all') AS c;
SELECT c.*
FROM cdc.fn_cdc_get_net_changes_dbo_NewEmployees
(@From_LSN, @To_LSN, 'all') AS c;
GO
When calling these functions, I needed to specify a range of log sequence numbers (LSNs). I called sys.fn_cdc_get_min_lsn to get the minimum value available to the capture instance (note: not to the table), and used sys.fn_cdc_get_max_len to get the maximum value that’s available from the log right now.
Normally, I’d want to record where I’m up to and get values past that point.
The output of the first SELECT using cdc.fn_cdc_get_all_changes_dbo_NewEmployees was:
Note that it shows every operation i.e. the three inserts (with __$operation as 2) and an update (with __$operation as 4). The primary key value is also shown, and the LSNs where it all occurred.
Also note that the name of the capture instance is part of the name of the function that was created when we enabled CDC for the table.
Often I do want every operation, but sometimes I don’t care about all the individual operations. I just want the net effect. That’s what’s shown in the second SELECT using cdc.fn_cdc_get_net_changes_dbo_NewEmployees:
This one is interesting. Inserting 3 rows, then updating 1 of those rows, is the same as inserting 3 rows but one of them with the updated value. And that’s what you get from this function.
And that’s the basic enabling and use of CDC in Azure SQL DB. We’ll move to licensing implications in the next part.
- Why use Change Data Capture for Azure SQL Database?
- How Change Data Capture works in Azure SQL Database
- Enabling and using Change Data Capture in Azure SQL Database
- Change Data Capture and Azure SQL Database Service Level Objectives
- Accessing Change Data Capture Data from Another Azure SQL Database
2023-06-02