Snowflake for SQL Server Users - Part 18 - Time travel

One really important concept in data warehouses is the idea of versioning. Imagine that I have customers broken into a set of business categories. And then I change which customers are in which categories. For so many transactional systems, if I run a sales report for last year, the customers would now appear in the current business categories, rather than the ones they were part of last year.

In Kimbal-based designs, that’s part of the rationale for what are called Slowly-Changing Dimensions (SCDs).

SQL Server Temporal Tables

SQL Server 2016 introduced temporal tables. Each temporal table has two components: one table that holds the current data, and another table that holds the history for the same data i.e. the previous versions of the rows. While the SQL Server implementation is interesting, in practice, I’ve found it lacking in many ways.

The biggest challenge is the lack of temporal joins. Even though I can view the contents of a table at a particular point in time, and I can view all the versions of rows from the table over a period of time, there are real challenges when you need to work across joins. There’s no way to get rows from one table, based upon the timeframe of the rows from another table. When I created the WideWorldImporters sample databases for SQL Server 2016, you’ll find a whole bunch of cursor-based code in the procedures that extract data changes. I really didn’t want to do that, but couldn’t see any other way to achieve it.

Snowflake and Time Travel

The standard mechanism for accessing historical data in Snowflake is to use what’s called Time Travel. The core things that Time Travel lets you do are:

Run queries to see the previous version of data at a given time
Create a clone of a table based upon a previous version of its data (you can also do this for schemas and databases)
Recover previous versions of objects that have been dropped

Because this historical data is available, there is even an UNDROP command that works for tables, schemas, and databases.

When you want to use historical versions of the data, in a SELECT statement (or in a CREATE CLONE statement), you can specify AT for a specific time, or BEFORE to exclude the given time. Then when you specify the time, you have three options:

Specify a specific time via the TIMESTAMP option
Specify a relative time from the current time by using the OFFSET option with a number of seconds
Specify a particular statement via a Query ID

The queries look like this:

select * 
from customers 
at(timestamp => 'Tue, 03 December 2019 06:20:00 +1000'::timestamp);

select * 
from customers 
at(offset => -60 * 20);

select * 
from customers 
before(statement => '4f5d0db9-105e-45ec-3434-b8f5b37c5726');

Query ID

This last option is a really curious one for SQL Server people. Snowflake keeps track of previous queries that it has executed. Each one is identified by a Query ID.

There are many things you can access via this Query ID like the query plan involved.

Retention of History

The retention period for Time Travel is based upon the edition of Snowflake that you are using.

All editions are able to use a retention period of one day
Enterprise Edition allows up to 90 days for permanent objects, but only one day for transient and temporary objects.

Also in Enterprise Edition, you can specify a specific period for specific objects by using the DATA_RETENTION_TIME_IN_DAYS parameter when you are creating the objects.

If you really don’t want to use Time Travel (and the space it will use), you can set DATA_RETENTION_TIME_IN_DAYS to 0 at the account level.

For an index to all posts in this series, see the first post here.

2019-12-06