Book Review – Make Your Data Speak – Alex Kolokolov

Over the last year, I've come to know Alex Kolokolov better through involvement with his data visualization challenges. I was really pleased to see he'd written his first book, Make Your Data Speak (Creating Actionable Data through Excel For Non-Technical Professionals).

Things I Liked

I really liked the conversational style of the book. It's all structured around an approach of "Let's see how this happens by example". The tone was really refreshing and should hold people's interest well.

I also really loved the approach of starting with a (believable) mess and cleaning it up. The level of the book would probably work best for the "not really comfortable with pivot tables yet" audience, but I could see it being useful for people who've worked with these types of problems before, and who have probably made all the mistakes that Alex talks about in the book. Alex did say it's for non-technical professionals, and I think that's spot on.

Having QR codes for linking to sample code was a simple but nice addition.

I found myself chuckling a bit when reading the colours and theming parts. When I see demonstrations of material like this, people often demonstrate truly awful colours, and I think "no-one would do that". Alex has chosen examples that are nasty but believable. I've seen worse in the field.

Similarly, the section on choosing visualizations was detailed and well-argued.

The book finishes with a section on improving data-driven culture in the organisation. That's a good way to end.

Things I'd Like to See Improved

Not much. It's great!

It's important to note that the book focusses on Excel. I did keep thinking about how I'd do it in Power BI instead, but there is a very, very big audience for Alex's take on how to tell data stories using Excel.

I was concerned about how the English would read, coming from a non-native speaker. Some of it was a little odd, but Alex has done an outstanding job. In fact, it's so much better than many books I've recently read from native English speakers. I really did not notice errors, apart from a few things that sounded grammatically odd to me. As an example, the chapter "Dashboard Assembling" really should be "Dashboard Assembly" or, perhaps better, "Assembling Dashboards". Sentences like "Assembly according to the layout is faster and easier…" are strictly correct but sounded a little odd.

I'd have to say though that nothing really grated on me. And that's quite an achievement. I hope one day I can do the same in Mandarin, but I fear I'll be far short of the level of what Alex has achieved in English writing.

The Verdict?

If you need to learn to tell a story with data using Excel, this would be a worthwhile addition to your library.

7 out of 10

 

SQL: Understanding Change Data Capture for Azure SQL Database – Part 2 – How does it work?

In part 1 of this series, I discussed the positioning of Change Data Capture. In part 2, I want to cover how it works.

Log Reading

There are many ways that you can output details of changes that occur in data within SQL Server. Many of those methods require actions to occur at the time the data change is made. This can be problematic.

The first problem with this is the performance impact on the application that's making the change. If I update a row in a table, and part of the process writes details of that change to some type of audit or tracking log, I've now increased the work that needs to happen in the context of the application that's making the change. Generally, this means I've slowed the application down by at least doubling the work that needs to be performed. That might not be well-received.

The second potential problem is even nastier. What if the change tracking part of the work fails even though the original update works? If I've done the change tracking work in the context of the original update (particularly if it's done as part of an atomic process), by adding tracking I might have broken the original application. That certainly wouldn't be well-received.
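Both problems described above are exactly what you get with trigger-based auditing. As a minimal sketch (the table and audit log names here are hypothetical), something like this runs inside the application's own transaction:

```sql
-- Hypothetical audit table and trigger, sketching synchronous change tracking.
CREATE TABLE dbo.CustomerAudit
(
    CustomerAuditID bigint IDENTITY(1,1) PRIMARY KEY,
    CustomerID int NOT NULL,
    ChangedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

CREATE TRIGGER dbo.TR_Customers_Audit
ON dbo.Customers
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- This extra write happens in the application's transaction,
    -- roughly doubling the work for each update.
    INSERT dbo.CustomerAudit (CustomerID)
    SELECT CustomerID FROM inserted;
END;
```

If the INSERT into the audit table fails, the whole transaction, including the original update, rolls back with it.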

So what to do?

The best answer seems to be to work with the transaction log that's already built into SQL Server. By default, it holds details of the changes that have been occurring to the data. It can be read asynchronously, so delays in reading it mostly won't affect the original data changes at all (there are only rare exceptions to this). If the reading of the logs fails, the problem can be corrected and the reading restarted, all again without affecting the main updates that are occurring.

And that's what Change Data Capture does. It uses the same log reader agent that has been part of SQL Server for a very long time. Previously though, it was used for Transactional Replication. In fact if you use both Transactional Replication and Change Data Capture on the same SQL Server system, they share the same instance of the log reader. The SQL Server Agent is used to make them both work.

SQL Server Agent – Isn't that missing?

When we're working with Azure SQL Database, things are a bit different. Currently, we don't have any concept of Transactional Replication. That could change but right now, it's not there. So sharing the log reader isn't an issue.

But I also mentioned that with SQL Server, it's the SQL Server Agent that kicks off the log reading agent. And with Azure SQL Database, we don't have SQL Server Agent either!

The Azure SQL Database team have instead provided a scheduler that runs the log reader (called the capture process), and also runs the required clean-up tasks. (SQL Server had another agent job to perform the clean-up.) This is all automated and requires no maintenance from the user.

Change Data Capture (CDC) Data Flow

The data flow with CDC is basically like the following:

CDC Data Flow

  1. The original app sends a change (insert, update, delete) to a table in the Azure SQL Database.
  2. The change is recorded in the transaction log.
  3. Some time later (usually not long though), the change is read by the capture process and stored in a change table.
  4. The Data Warehouse (DW) or target system or ETL system makes a call to a set of CDC functions to retrieve the changes.

Everything in the dotted box above is part of, and contained within, the Azure SQL Database.
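As a sketch of step 4, assuming CDC were already enabled for a hypothetical dbo.Customers table (which would give a capture instance named dbo_Customers), the call from the target system might look like this:

```sql
-- Retrieve all changes recorded for the dbo_Customers capture instance,
-- between the lowest and highest available log sequence numbers (LSNs).
-- A real ETL process would remember the last LSN it processed instead.
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Customers');
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Customers(@from_lsn, @to_lsn, N'all');
```

The function name is generated per capture instance when CDC is enabled for the table, which is why it includes dbo_Customers.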

Upcoming

In the next post, I'll show you the code that's required, and show you the changes that occur to the Azure SQL Database when you enable CDC.

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database

 

Opinion: Spinning Rust? Not so much

I've worked in this industry a long time now.

From 1981 to 1986, I worked as an engineer for Hewlett-Packard. Yep, I'm "have actually met both Bill Hewlett and Dave Packard" old. They'd retired when I met them and John Young was in charge, but they came to Palo Alto to a shareholders' meeting and I was there at the time.

Most of my work was on HP3000 commercial mini-computers. The level of engineering on those systems was unlike anything I'd ever seen either before or since. They were really quite superb.

And one thing that I worked on a lot, was hard drives. HP were constantly at the bleeding edge on really reliable drives for commercial systems.

So what's that got to do with spinning rust?

I keep hearing people talk about "spinning rust" when they're referring to spinning hard drives, often to separate them from SSDs. And it always sounds so very strange to me.

I worked on "spinning rust" and know full well what it was like. Originally they weren't hermetically sealed drives like today, so it was obvious to anyone working on them what was in them. Many were also removable so that made it even more obvious. But later after Winchester style drives became common, they all started to be sealed up, where users couldn't see inside them.

Somewhere around 1984 or 1985, I'm guessing, the coating on the platters changed. The old oxide coatings (i.e. rust, as most people know it) gave way to magnetic coatings that resembled mirrors. And typically the coating was applied over a non-magnetic material, often aluminium (on older drives) or even ceramics. But most drives you'd come across today commonly use glass as the substrate. Toshiba pioneered that back in 1990.

So the disks in your spinning hard drives are super-shiny mirrors.

It'd make far more sense to refer to "spinning mirrors" than "spinning rust".

 

SQL: Understanding Change Data Capture for Azure SQL Database – Part 1 – Why?

I often need to capture the changes from one database into another. The most common reason is that I'm wanting to bring changes from a transactional system across into a data warehouse that's part of a BI setup.

So which technology is best to use for this?

That's not a trivial question to answer but here are some thoughts:

Replication?

Unfortunately, this one's not fully available for Azure SQL DB as yet. Azure SQL DB can be a subscriber in Transactional Replication, and we often use it that way. If we have an on-premises SQL Server, one of our favourite ways to get data into the cloud is by using Transactional Replication. (If you need to get your head around Replication with SQL Server, just head to our course here.)

There are many advantages to replication, including the lack of impact on the source system, however Azure SQL DB can't currently be a publisher, so it doesn't help here.

And other forms of replication aren't really useful here, or aren't available options. So if the source DB is an Azure SQL DB, we need to find something else.

Azure SQL Data Sync

Azure SQL Data Sync is an odd technology. It basically grew out of Merge Replication ideas. It's not built on Merge Replication, but it's very similar in concept. It was in a preview state for so long, and the team had so long since stopped posting information about it, that most of us never thought it would ever reach GA.

You create a setup similar to this:

The sync metadata lives in a DB in Azure, and a copy of the DB that you want to sync is created as an Azure SQL DB. The Azure Data Sync engine then synchronizes the data between the HUB and the other DBs. If any of the DBs are on-premises, then an on-premises agent does the work.

Azure Data Sync (like Merge Replication) is trigger-based. Triggers are used to capture the changes ready for synchronization.

I wasn't a fan of Merge, and I can't say I'm a great fan of Azure SQL Data Sync. While it's conceptually simple, you would not want to use it for anything except very low volume applications.

Change Tracking

Change Tracking is another technology that's come directly from SQL Server land. When it's enabled, a set of change tracking tables are created. As data is changed in the tables of interest, changes are recorded in the change tracking tables.

One positive aspect of Change Tracking is that it isn't based on triggers and it outperforms trigger-based solutions. There are two downsides:

  • The changes are written synchronously, and in the context of the transaction that writes the change to the tracked table. This can impact the performance of the changes to the tracked table, i.e. usually two writes are happening for each one that would otherwise have happened.
  • You don't get to see all the changes, and you don't see them in the order that they happened. Change Tracking lets you know which rows have changed, based upon the table's primary key. (You can also ask for a summary of which columns were changed.) This can be a challenge for dealing with referential integrity, and other issues.
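As a minimal sketch of how Change Tracking is used (the dbo.Customers table here is hypothetical), you enable it at the database level, then per table, and later ask which rows have changed since a stored sync version:

```sql
-- Enable Change Tracking for the current database, keeping 2 days of data.
ALTER DATABASE CURRENT SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

-- Enable it for a specific table, optionally tracking which columns changed.
ALTER TABLE dbo.Customers ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON);

-- Later, from the target system: which rows changed since the last sync?
DECLARE @last_sync_version bigint = 0;  -- would be stored from the previous run

SELECT CT.CustomerID, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_VERSION
FROM CHANGETABLE(CHANGES dbo.Customers, @last_sync_version) AS CT;
```

Note that CHANGETABLE returns the primary key and change metadata, not the changed data itself; you join back to the source table to fetch current values.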

Queues (and Service Broker)

Another interesting option is to write to a queue. With an on-premises SQL Server, we can use Service Broker. If you haven't seen Service Broker, it's a transacted queue that lives inside the database. (To learn about this, look here).

With SQL CLR code or with External Activation for Service Broker, we could write to other types of queue like RabbitMQ.

At the current time, Azure SQL Database doesn't support writing to external queues. However, I do expect to see this change, as so many people have voted to have this capability added.

Change Data Capture

Change Data Capture (CDC) is another technology direct from SQL Server land. CDC is based on reading changes from a database's transaction log.

When you use it with SQL Server, it shares the same transaction log reader that Transactional Replication (TR) does. If you enable either CDC or TR, a log reader is started. If you have both enabled, they use a single log reader.

A key upside of using a log reader is that it doesn't slow down the initial updates to the target table. The changes are read asynchronously, separately.
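As a rough sketch (a later post in this series covers the real code in detail), enabling CDC for a hypothetical dbo.Customers table is a database-level step followed by a per-table step:

```sql
-- Enable CDC for the database first.
EXEC sys.sp_cdc_enable_db;

-- Then enable it for each table of interest.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'Customers',
    @role_name = NULL;  -- NULL means no gating role is required to read changes
```

Once enabled, the capture process reads the transaction log asynchronously and populates a change table for each tracked table.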

Until recently, though, you could not use CDC with Azure SQL Database. The log reader agent ran from within SQL Server Agent, and with Azure SQL Database, you didn't have a SQL Server Agent.

The product team have recently done the work to make CDC work with Azure SQL Database. It is an interesting option for extracting changes from a database, so this is the first blog post in a series of posts about using CDC with Azure SQL Database. Links to other posts will be added here as they are available:

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database

 

 

Cosmos Down Under podcast 6 with guest Vincent-Philippe Lauzon is now published!

I published a new Cosmos Down Under podcast today.

In the podcast, I've discussed the new integration between Azure Cosmos DB and Azure Data Explorer with the program manager for the feature: Vincent-Philippe Lauzon.

It's a great new capability and worth half an hour of your time to find out about it.

I hope you enjoy it. You'll find it, along with all Cosmos Down Under podcasts at: https://podcast.cosmosdownunder.com

SDU Tools v23 is now available for download

Just a heads-up that v23 of SDU Tools went out to SDU Insiders yesterday.

You can find details about SDU Tools here: https://sdutools.sqldownunder.com

In v23, we've added or enhanced the following:

  • ExcelSerialToDateTime – there's so much corporate data stored in Excel, and if you need to import it to SQL Server, you quickly find that dates and times are a challenge. I have a friend who says that the only data types in Excel are numbers and strings, and that everything else is an illusion. I think that's pretty accurate. This function takes one of the funky Excel decimal numbers (called a Serial) that represents a date and time, and converts it to a SQL Server datetime value.
  • DateTimeToExcelSerial – no prizes for guessing what this one does. It's the opposite of the above.
  • We've done a lot of updates to internal data within the tools, like up-to-date SQL Server build numbers, etc.
  • DateDimensionColumns now has StartOfWeek and EndOfWeek
  • DateDimensionPeriodColumns now has IsSameWeek, DayOffset, MonthOffset, and WeekOffset
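To give a feel for what the Excel conversion functions are doing, here's a sketch of the underlying arithmetic (this is not the actual SDU Tools implementation):

```sql
-- Excel's serial 1 is 1900-01-01, and SQL Server's datetime day 0 is also
-- 1900-01-01, but Excel wrongly treats 1900 as a leap year, so for dates
-- after 1900-02-28 the two calendars end up 2 days apart overall.
-- The fractional part of the serial carries the time of day.
DECLARE @ExcelSerial float = 44561.75;  -- an example serial with a time portion
SELECT CAST(@ExcelSerial - 2 AS datetime) AS ConvertedDateTime;

-- And the reverse direction:
DECLARE @Value datetime = '20211231 18:00';
SELECT CAST(@Value AS float) + 2 AS ExcelSerial;
```

The SDU Tools functions also deal with the edge cases this sketch ignores, which is exactly why they're worth using instead.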

You can see a short video on the new Excel conversion functions here: https://youtu.be/hxmGtKjkcwk

I hope you find the updates useful.

Course Review: Habit-Building Bootcamp

A while back, I reviewed a book called Atomic Habits by James Clear. I mentioned at the time that I nearly didn't get past the first chapter, as I thought it was going to be another pretty cheesy self-help book. But I ended up loving the book. James really made me think about habits in a way that I hadn't before.

And so I was interested when Luke and Phil from Mandarin Blueprint released a new online course called the Habit-Building Bootcamp.

I wasn't at all surprised that they acknowledge how many of their ideas came from reading Atomic Habits. But they've taken it further in two ways:

  • They have extended the ideas in James' book.
  • They have applied them directly to learning Mandarin.

This course is designed as a companion to their Mandarin Blueprint course, but what I like about it is that it's a good general-purpose course on habit building, even for those with no interest in learning Mandarin.

Phil also added two extra lessons on breaking bad habits. Doing that is often much harder than forming new habits in the first place.

If you are interested in changing your habits, you might well find this low-cost course useful.

SDU Podcast #83 with guest Davide Mauri released

I recorded the first of a series of podcasts for the SQL Server 2022 timeframe recently. Once again, it was great to chat with another old friend, Davide Mauri. Davide is a Principal Product Manager for Azure SQL Database.

Note that the topic I chose for today was the new sp_invoke_external_rest_endpoint system stored procedure. It's currently only available in Azure SQL Database, but I expect it will get ported to other flavours of SQL Server over time. Davide has been deeply involved with the development of this procedure.

Developers have wanted a better way to interact with external services from within Azure SQL Database and this procedure really helps with that. I feel it lets the database now integrate much more closely into the Azure ecosystem.
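As a minimal sketch of calling it (the endpoint URL here is hypothetical), the procedure returns the service's response as JSON in an OUTPUT parameter:

```sql
-- Call an external REST endpoint directly from T-SQL.
DECLARE @response nvarchar(max);

EXEC sp_invoke_external_rest_endpoint
    @url = N'https://example.azurewebsites.net/api/GetStatus',
    @method = N'GET',
    @response = @response OUTPUT;

-- The response JSON includes the headers and body from the service.
SELECT @response;
```

There are further parameters for payloads, headers, and credentials; the podcast goes into how Davide's team designed those.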

You'll find the podcast with Davide (and all our other podcasts) here: https://podcast.sqldownunder.com

Data Science summit 2022 – Warsaw (and Hybrid) – SQL Server 2022 T-SQL

I'm always excited when I can get involved in conferences with our Polish friends.

Coming up very soon is the Data Science Summit 2022: https://dssconf.pl/en/

For this summit, I'll be presenting a quick (around 40 minute) session highlighting what's changed in T-SQL for SQL Server 2022. I'm always so glad to see T-SQL enhancements in SQL Server, and SQL Server 2022 has more than we've seen in other recent versions. There are a number of very important enhancements that will take a little while to get our heads around, particularly on the best ways to use them.

I've also seen the list of people presenting and the range of topics for the conference, and it really looks quite fascinating. There is content in Polish but the majority is in English so it's completely accessible for us English speakers.

I'd really love to see as many of you as possible attending, to support the Polish data community.

Cosmos Down Under podcast 5 – Charles Feddersen on the Azure Cosmos DB for PostgreSQL API

I've just finished publishing another show for the Cosmos Down Under Podcast. This time the guest is Charles Feddersen.

Charles (an ex-Melbourne local) is working for Microsoft in the Azure Cosmos DB team. His current focus is the PostgreSQL API that Azure Cosmos DB supports.

In this show, Charles discusses the state of the PostgreSQL market, how it's implemented in the Azure Cosmos DB for PostgreSQL API, how that compares to Microsoft's Flexible Server offering for PostgreSQL, distributed tables, scaling databases, and so much more.

I hope you enjoy it. You'll find it, along with all Cosmos Down Under podcasts, at: https://podcast.cosmosdownunder.com