Book Review: Technology Operating Models for Cloud and Edge

I was recently sent a copy of a new book by Ahilan Ponnusamy and Andreas Spanner, called Technology Operating Models for Cloud and Edge: Create your purpose-built distributed operating model for public, hybrid, multicloud, and edge.

It was interesting to read this book.

I spend more of my time on the Azure side of the fence than on the AWS / Red Hat side of things, so I have some pretty different opinions to the authors on many of the topics. I have worked with multiple clouds, though, so what they were describing was familiar anyway.

I particularly liked their discussion near the beginning of the book that covers what often goes wrong in cloud moves. They talked about issues like these:

  • Unclear direction-setting attempts like 'cloud first' that leave teams unsure of their future or what to do.
  • Moving to the cloud being mistaken for innovation.
  • Unrealistic goals to move large numbers of applications to the public cloud in an established enterprise without significant change management.
  • High unanticipated bill shock.
  • Lift-and-shift shortcuts that don't leverage cloud on-demand scale-out/in features and, as a result, don't lead to the desired business demand-aligned pricing.

I regularly write articles on making real cloud transformations, not just migrations. Too many people talk transformation but end up doing migrations, and then enter the well of disappointment. And for their users, things are even worse.

So I was pleased to see their discussion warning against focusing on technology instead of transitioning people, processes, and culture to enable cloud ROI. And I had a chuckle when I read their recommendation not to start with the logos that we want on our CVs. I can't tell you how often I've seen teams making that mistake.

Robust, Antifragile, Application Lifecycles, and Data

I'm not 100% convinced by the "Robust is out. Antifragile is in." discussion. I get why they're arguing that, but I see it differently. Similarly, I view application lifecycles today as very different to the:

  • innovate
  • operate
  • retire

options discussed.

Given my work focus, it's not entirely surprising that I differ a great deal from them in regard to how data should be handled today.

Edge Services and Hybrid Cloud

The authors must be seeing far more edge-based services than I commonly see in the market. They had an interesting discussion on that. Quite a large part of the book is dedicated to tackling edge services.

As for hybrid cloud arrangements, I'm quite pleased to see the lead Microsoft has taken in their Arc-enabled offerings.

Vendor Lock-In

Finally, I understand their arguments around avoiding vendor lock-in. This is a topical one for me. I see people worried by this, and ending up with lousy outcomes.

It reminds me of application code that's written to avoid being locked into specific databases. Invariably, you end up with code that's not great with any database. And eventually, code gets added that's specific to one database. So then you have the worst of all possible worlds: poor code that's database agnostic, yet lock-in anyway because eventually, work just had to be done. And the kicker: I've never seen anyone who does this ever change databases anyway.

The same applies to multi-cloud thinking. What the authors don't really discuss is that it's easy to end up with outcomes that work equally poorly on multiple platforms.

And it's a similar discussion in relation to committing to only use open source software. I understand the sentiment, and I've used many of the services that they mention in the book. I just haven't loved those services the way I do many of the options that aren't open source.

In today's world, your company can get run over by competitors who are quicker to market by using the best aspects offered by a single platform.

Summary

Regardless, I'm glad they've written on these topics. Even if you don't agree, it gives you a list of topics to consider.

7 out of 10

Contributing to Microsoft Learn Community Content

Microsoft Learn has a community content pilot running right now. They are looking to see how content from community members might provide different value to the content provided by their team members.

So, I decided that I should submit an article to see how it goes, and to find out how it works.

At present, it's all hosted in a GitHub repository and, no surprise, it's all written in Markdown.

To contribute, you fork the repository, choose a template, and create your content from the template, following the rules contained in it. You also need to add your content to the table of contents. Finally, you raise a pull request and a build agent checks that your content follows the rules. If the team are then happy, you get author access and can merge your content in. Eventually, your new article should appear on Microsoft Learn.

Here's the first article that I contributed:

https://learn.microsoft.com/en-us/community/content/remembering-the-users-in-cloud-transformation

As another example, here's another one from fellow MVP Luke Murray:

https://learn.microsoft.com/en-us/community/content/securing-backups-with-azure

I'd love to hear your thoughts on whether this style of content is useful, and if so, how it should be structured.

If you are interested in contributing, here's a link to get you started:

https://github.com/MicrosoftDocs/community-content/blob/main/CONTRIBUTING.md

You can also find more information in the Microsoft Learn Contributor Guide. It covers style, voice, etc.

https://learn.microsoft.com/en-us/contribute/content/

 

 

SQL: Understanding Change Data Capture for Azure SQL Database – Part 4 – Azure SQL Database Service Level Objectives

This is part 4 of a series on working with change data capture (CDC) in Azure SQL Database. This part discusses the service level objectives (SLOs) that are required to use it.

When I was reading the documentation for CDC in Azure SQL Database, I kept coming across a mention that it required at least an S3 SLO if you were using a DTU-based database.

I really hoped that wasn't the case.

I was left saying "please say it ain't so!".

Testing

I thought it was time to find out what actually happens when you try it.

I started by creating a database with the lowest SLO (Basic). I know a Basic database doesn't hold enough data to be very meaningful, but I wanted to see what would happen.

I didn't get far.

I created the table from the scripts in part 3 of this series, and then tried the first step of enabling CDC.
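
For reference, that first step is just the database-level call below (the same one covered in part 3). On this Basic database, it failed straight away:

    -- Run in the new Basic-SLO database
    -- On Basic, S0, S1, and S2 this call is rejected
    EXEC sys.sp_cdc_enable_db;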

To say that's disappointing is an understatement. But it makes it clear that Basic, S0, S1, and S2 aren't going to let you enable it.

What about Scaling?

Then I wondered what would happen if I enabled it on an S3, and then tried to scale down.

So I scaled the DB to S3, enabled CDC as in part 3, and all worked as expected.

I then scaled the DB back to S2 using the Azure Portal. And alas, that failed too:

I suppose I'm not surprised, but I had hoped it might at least have automatically disabled CDC if it really had to.
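
If you'd rather script the scale-down attempt than use the portal, the T-SQL equivalent looks like this (a sketch, using the CDCTest database name from part 3). Either way, the scale-down is blocked while CDC is enabled on the database:

    -- Attempt to scale the CDC-enabled database from S3 back down to S2
    ALTER DATABASE CDCTest
    MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S2');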

It's a Mistake

I think these licensing restrictions for CDC in Azure SQL DB are a mistake.

Back in SQL Server 2016 SP1, I was one of the main instigators of the push to get the product team to make sure that the programming surface of the database worked across all the editions (Enterprise, Standard, Developer, Express) wherever possible.

This restriction on CDC goes directly against that ethos. It harks back again to large single database thinking, not cloudy thinking.

I understand that it's probably needed for performance in production scenarios, but what I'm always trying to tell the team is that production scenarios aren't the only scenarios.

Developers also need to write code. We should let them write and test functionality. It can be clear to them that it won't work the same as on a production instance, but that's no different to many other aspects of the system. A Basic SLO database won't run a complex query over a lot of data either, but we don't expect it to.

But it's important to let them write their code and make sure it basically functions, no matter what level of SLO they're working with. If you require them all to have an expensive database, just to test code or concepts, you greatly limit how many developers will use the features. 

Having those types of restrictions also reduces the cloudiness of the solution, in that you can't scale up/down across a range of SLOs. We often take S7 or S8 databases and temporarily "park" them at S2 while not in use. This type of restriction kills those options as well, and makes the overall solution less cloudy.

And One More Part to This Series

That's service level objectives for the use of CDC in Azure SQL DB. We'll move to accessing CDC data from another Azure SQL DB next.

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database

 

Service Principal vs Service Principle -> not the same thing!

I can't tell you how many times lately I've seen people writing about service principles when they really mean service principals. These are entirely different concepts!

I did a certification exam the other day, and it was asking about service principles. I was left wondering how many people had reviewed that before it was released.

But what's worse is when people bake the wrong one into code libraries that others need to use. I was reading some content from Microsoft Learn (Quickstart: Manage data with Azure Cosmos DB Spark 3 OLTP Connector for API for NoSQL) and I saw this code example:

Note the authType. I just thought it was a typo and was going to make a pull request to fix it. But then I went looking at the library that it was using. That's the Azure Cosmos DB OLTP Spark 3 connector, and what amazed me is that the typo was right throughout the library.

I've passed this back to the Cosmos DB team, but it left me wondering what you should do when you see this sort of error in a library.

Do you just use the typo in your code and just forever propagate the mistake?

What should library authors do to avoid breaking existing code? Perhaps allow the correctly spelled option as well as the wrong one?

Service Principal

A service principal is a security identity used by applications, services, or automated processes to access specific resources or perform tasks within an Azure Active Directory (Azure AD) tenant. It's most commonly a type of service entity that can be assigned permissions to access resources in a secure manner. These are very common in Azure but they appear in many places across the IT industry.

Service Principle

A service principle really isn't something we talk about often, but if we did, it would perhaps be a rule that we try to follow when providing service. Richard Koch wrote a famous book called The 80/20 Principle. That's often called the Pareto Principle. Either way, it's got nothing to do with an identity that performs actions in IT systems.

 

 

SQL: Understanding Change Data Capture for Azure SQL Database – Part 3 – Enabling and Using CDC

This is part 3 of a series on working with change data capture (CDC) in Azure SQL Database. This part discusses how to enable it and how to use it.

To show how this works, I have created an Azure SQL Database called CDCTest. I created it as a DTU-based database with a service level objective (SLO) of S3. I'll discuss more about the licensing implications of CDC in part 4.

Connecting and Setup

I've connected to the database using SQL Server Management Studio (SSMS) and opened a query window to the new database. I've then executed the following to create the objects:
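
As a rough guide to what that involves, here's a minimal sketch of the kind of table used in this post. The dbo.NewEmployees name matches what's used below; the column definitions are illustrative assumptions only:

    -- Minimal sketch only: the column definitions are illustrative
    CREATE TABLE dbo.NewEmployees
    (
        NewEmployeeID int IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_dbo_NewEmployees PRIMARY KEY,
        EmployeeName nvarchar(100) NOT NULL,
        CurrentSalary decimal(18,2) NOT NULL
    );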

If I check the entry in sys.databases by executing this command:
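
That check is just a query of the is_cdc_enabled column (shown here as a sketch, with the CDCTest database name from above):

    -- Is CDC enabled at the database level?
    SELECT name, is_cdc_enabled
    FROM sys.databases
    WHERE name = N'CDCTest';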

The output of the column is as shown:

CDC not enabled

Enabling CDC at the DB Level

So let's start by enabling CDC for the database, by executing:
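
Enabling CDC at the database level is a single system stored procedure call (sys.sp_cdc_enable_db), run in the user database:

    -- Enable CDC for the current database
    EXEC sys.sp_cdc_enable_db;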

Checking sys.databases the same way again now shows it enabled:

CDC is enabled

That's how we can check if it's enabled at the DB level. But there are other changes that have occurred. First, there's a new schema that's been added:

New CDC schema visible

And it contains a number of objects:

CDC schema objects
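
If you'd rather list those from a query than browse them in SSMS, something like this shows the objects in the new cdc schema:

    -- Objects created in the cdc schema when CDC was enabled
    SELECT o.name, o.type_desc
    FROM sys.objects AS o
    WHERE o.schema_id = SCHEMA_ID(N'cdc')
    ORDER BY o.type_desc, o.name;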

Enabling CDC at the Table Level

Next, let's enable it at the table level and see what's changed.
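
The call is along these lines (a sketch reconstructed from the description in the next paragraph; sys.sp_cdc_enable_table is the documented procedure for this):

    -- Enable CDC for dbo.NewEmployees, with support for net changes
    -- @role_name = NULL means only admins can query the captured data
    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name = N'NewEmployees',
        @role_name = NULL,
        @supports_net_changes = 1;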

We did several things here. We said that the source table is called NewEmployees and that it lives in the dbo schema. We then said that this capture instance will support net changes. That means that as well as providing each individual change, we can ask CDC for the net effect of a set of changes. We'll see that later. And finally, we told it that, for now, only admins can access the data. Otherwise, we'd have had to provide the name of the role that contains the users who can query this.

We can then see that the table is enabled for CDC by querying sys.tables:
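
The relevant column in sys.tables is is_tracked_by_cdc:

    -- Which tables are enabled for CDC?
    SELECT name, is_tracked_by_cdc
    FROM sys.tables
    ORDER BY name;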

sys.tables

But wait a minute. We only created one table, right? What are these others? We can find them if we expand the tables in the database:

List of tables - more than expected

Note all these extra ones are system tables.

One other thing we did, even though we didn't specify it, is to create a capture instance called dbo_NewEmployees. That's the default name but we could have named it by using a parameter. Each table can have two capture instances to allow for handling schema changes and/or other requirements.

We can see the list of capture instances for a table by using this query:
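
One way to do that (a sketch; the sys.sp_cdc_help_change_data_capture procedure returns similar detail) is to query the cdc.change_tables metadata:

    -- Capture instances, their LSN range, net change support, and the index used
    SELECT capture_instance, start_lsn, end_lsn,
           supports_net_changes, role_name, index_name
    FROM cdc.change_tables;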

capture instance details

This shows the configuration of the capture instance, the start and end points of the log that it relates to, and more, such as the name of the index used on the table.

Using CDC for the Table

Now let's make use of CDC for this table. I'll start by inserting three rows, then updating one of them.
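
Something along these lines (a sketch; the column names match the illustrative table earlier, and the values are made up):

    -- Insert three rows, then update one of them
    INSERT INTO dbo.NewEmployees (EmployeeName, CurrentSalary)
    VALUES (N'Ann Smith', 82000),
           (N'Raj Patel', 91000),
           (N'Mei Chen', 87000);

    UPDATE dbo.NewEmployees
    SET CurrentSalary = 95000
    WHERE EmployeeName = N'Raj Patel';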

Now let's check out what happened by calling the CDC functions:
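
The shape of those calls is shown below (a sketch; dbo_NewEmployees is the default capture instance name created earlier):

    -- Work out the LSN range that's available for the capture instance
    DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_NewEmployees');
    DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();

    -- Every individual change in the range
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_NewEmployees(@from_lsn, @to_lsn, N'all');

    -- Just the net effect of the changes in the range
    SELECT *
    FROM cdc.fn_cdc_get_net_changes_dbo_NewEmployees(@from_lsn, @to_lsn, N'all');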

When calling these functions, I needed to specify a range of log sequence numbers (LSNs). I called sys.fn_cdc_get_min_lsn to get the minimum value available to the capture instance (note: not to the table), and used sys.fn_cdc_get_max_lsn to get the maximum value that's available from the log right now.

Normally, I'd want to record where I'm up to and get values past that point.

The output of the first SELECT using cdc.fn_cdc_get_all_changes_dbo_NewEmployees was:

Get all changes

Note that it shows every operation, i.e. the three inserts (with __$operation as 2) and an update (with __$operation as 4). The primary key value is also shown, along with the LSNs where it all occurred.

Also note that the name of the capture instance is part of the name of the function that was created when we enabled CDC for the table.

Often I do want every operation, but sometimes I don't care about all the individual operations. I just want the net effect. That's what's shown in the second SELECT using cdc.fn_cdc_get_net_changes_dbo_NewEmployees:

Net effect

This one is interesting. Inserting 3 rows and then updating 1 of those rows is the same as inserting 3 rows where one of them already has the updated value. And that's what you get from this function.

And that's the basic enabling and use of CDC in Azure SQL DB. We'll move to licensing implications in the next part.

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database

 

Welcome Microsoft Fabric – Most significant change in Microsoft BI

Microsoft has had a strong lead in BI and analytics for a long time now. The introduction of the tabular data models in 2012 was a watershed moment.

Today, they have announced a public preview of Microsoft Fabric. It's a bigger deal than the tabular model was, and it will set the direction of BI tooling for a long time to come.

The opposition was already struggling to keep up, and in fact, hadn't managed to do so. Microsoft Fabric will up the ante much, much higher than where we already were.

What is Fabric?

First up, it's a licensing model. Similar to the way you can license Office as a bundle rather than as individual products, it's a product that includes a set of best-of-breed tools.

But it's not just a licensing model.

Microsoft Fabric delivers a single integrated experience for working with analytics, with widespread integration between the tools, which are known as experiences. At release, these are the experiences:

Data Integration – will be very familiar to anyone who's worked with Azure Data Factory

Synapse Data Engineering – if you love running Spark or Python notebooks and more, this will make you feel happy

Synapse Data Warehousing – this involves both data warehouses and lakehouses, and provides an interface that's familiar to SQL devs.

Synapse Data Science – want to create and run machine learning models across the data? This is for you.

Synapse Real Time Analytics – need to deal with telemetry, logs, etc. and query them for analytics at volume? This is the tooling you need.

Power BI – is enhanced to work with the new storage mechanism. As well as the previous import and direct query modes, there is now a Direct Lake mode.

And I'd expect to see more experiences coming.

OneLake

One of the most critical aspects of Microsoft Fabric is that data for all the experiences is stored in OneLake. This is basically an Azure Data Lake type of storage that is used to hold Delta-Parquet files, but again it's much more than just a storage account.

Having this single copy of the data in a single format means you can access the same data from within all the different experiences. I can't stress enough how important this is.

And contrary to what happens with competitors, the file format is a standard open format using Delta-Parquet.

Get Involved!

Importantly, you can now try this too. You'll find free trials as part of the public preview at https://aka.ms/try-fabric. I'd encourage you to get involved.

For a detailed introduction, watch the on-demand session from Microsoft Build: https://build.microsoft.com/en-US/sessions/852ccf38-b07d-4ddc-a9fe-2e57bdaeb613?source=sessions

And make sure to subscribe to our new Fabric Down Under podcast!

Cosmos Down Under podcast 8 with guest Blaize Stewart is now published!

I recently noticed a blog post from Jay Gordon that was calling out interesting content from Blaize Stewart on antipatterns for development with Azure Cosmos DB.

I thought this would be a perfect topic for one of our Cosmos Down Under podcasts and Blaize agreed to take part.

So much content talks about what to do with a product, and even though everyone knows there are things you shouldn't do, very little content talks about these antipatterns. It was great to speak to Blaize about it, and to get his updated thoughts.

You'll find this (and all) shows here: https://podcast.cosmosdownunder.com

SQL: Understanding Change Data Capture for Azure SQL Database – Part 2 – How does it work?

In part 1 of this series, I discussed the positioning of Change Data Capture. In part 2, I want to cover how it works.

Log Reading

There are many ways that you can output details of changes that occur in data within SQL Server. Many of those methods require actions to occur at the time the data change is made. This can be problematic.

The first problem with this is the performance impact on the application that's making the change. If I update a row in a table and there is part of the process that writes details of that change to some type of audit or tracking log, I've now increased the work that needs to happen in the context of the application that's making the change. Generally, what this means is that I've slowed the application down by at least doubling the work that needs to be performed. That might not be well-received.

The second potential problem is even nastier. What if the change tracking part of the work fails even though the original update works? If I've done the change tracking work in the context of the original update (particularly if it's done as part of an atomic process), by adding tracking I might have broken the original application. That certainly wouldn't be well-received.

So what to do?

The best answer seems to be to work with the transaction log that's already built into SQL Server. By default, it holds details of the changes that have been occurring to the data. It can be read asynchronously, so delays in reading it mostly won't affect the original data changes at all (there are only rare exceptions to this). If the reading of the log fails, the problem can be corrected and the reading restarted, all again without affecting the main updates that are occurring.

And that's what Change Data Capture does. It uses the same log reader agent that has been part of SQL Server for a very long time. Previously though, it was used for Transactional Replication. In fact if you use both Transactional Replication and Change Data Capture on the same SQL Server system, they share the same instance of the log reader. The SQL Server Agent is used to make them both work.

SQL Server Agent – Isn't that missing?

When we're working with Azure SQL Database, things are a bit different. Currently, we don't have any concept of Transactional Replication. That could change but right now, it's not there. So sharing the log reader isn't an issue.

But I also mentioned that with SQL Server, it's the SQL Server Agent that kicks off the log reading agent. And with Azure SQL Database, we don't have SQL Server Agent either!

The Azure SQL Database team have instead provided a scheduler that runs the log reader (called the capture), and also runs the required clean-up tasks, something SQL Server handles with a separate agent job. This is all automated and requires no maintenance from the user.

Change Data Capture (CDC) Data Flow

The data flow with CDC is basically like the following:

CDC Data Flow

  1. The original app sends a change (insert, update, delete) to a table in the Azure SQL Database.
  2. The change is recorded in the transaction log.
  3. Some time later (usually not long though), the change is read by the capture process and stored in a change table.
  4. The Data Warehouse (DW) or target system or ETL system makes a call to a set of CDC functions to retrieve the changes.

Everything in the dotted box above is part of, and contained within, the Azure SQL Database.

Upcoming

In the next part, I'll show you the code that's required, and the changes that occur to the Azure SQL Database when you enable CDC.

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database

 

SQL: Understanding Change Data Capture for Azure SQL Database – Part 1 – Why?

I often need to capture the changes from one database into another. The most common reason is that I'm wanting to bring changes from a transactional system across into a data warehouse that's part of a BI setup.

So which technology is best to use for this?

That's not a trivial question to answer but here are some thoughts:

Replication?

Unfortunately, this one's not really an option as yet when the source is an Azure SQL DB. Azure SQL DB can be a subscriber in Transactional Replication, and we often use it that way. If we have an on-premises SQL Server, one of our favourite ways to get data into the cloud is by using Transactional Replication. (If you need to get your head around Replication with SQL Server, just head to our course here).

There are many advantages to replication, including the lack of impact on the source system; however, Azure SQL DB can't currently be a publisher, so it doesn't help here.

And other forms of replication aren't really useful here, or an available option. So if the source DB is an Azure SQL DB, we need to find something else.

Azure SQL Data Sync

Azure SQL Data Sync is an odd technology. It basically grew out of Merge Replication based ideas. It's not built on Merge Replication, but it's very similar in concept. It was in a preview state for so long, and the team had long since stopped posting information about it, that most of us never thought it would ever reach GA.

You create a setup similar to this:

The sync metadata lives in a DB in Azure, and a copy of the DB that you want to sync is created as an Azure SQL DB. The Azure Data Sync engine then synchronizes the data between that hub DB and the other (member) DBs. If any of the DBs are on-premises, then an on-premises agent does the work.

Azure Data Sync (like Merge Replication) is trigger-based. Triggers are used to capture the changes ready for synchronization.

I wasn't a fan of Merge, and I can't say I'm a great fan of Azure SQL Data Sync. While it's conceptually simple, you would not want to use it for anything except very low volume applications.

Change Tracking

Change Tracking is another technology that's come directly from SQL Server land. When it's enabled, a set of change tracking tables are created. As data is changed in the tables of interest, changes are recorded in the change tracking tables.

One positive aspect of Change Tracking is that it isn't based on triggers and it outperforms trigger-based solutions. There are two downsides:

  • The changes are written synchronously, in the context of the transaction that writes the change to the tracked table. This can impact the performance of the changes to the tracked table, i.e. usually two writes are happening for each one that would have happened.
  • You don't get to see all the changes, or the order in which they happened. Change Tracking lets you know which rows have changed, based upon the table's primary key. (You can also ask for a summary of which columns were changed.) This can be a challenge for dealing with referential integrity, and other issues.
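
To make the Change Tracking option concrete, here's a minimal sketch of enabling it and then reading the changed rows. The table and column names are illustrative, and the version number would normally be one you saved (from CHANGE_TRACKING_CURRENT_VERSION) after the previous sync:

    -- Minimal sketch only: table and column names are illustrative
    ALTER DATABASE CURRENT
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

    ALTER TABLE dbo.Customers
    ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

    -- Later: which rows changed since the version recorded at the last sync?
    DECLARE @last_sync_version bigint = 0;   -- illustrative; normally saved from a prior sync
    SELECT ct.CustomerID, ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_VERSION
    FROM CHANGETABLE(CHANGES dbo.Customers, @last_sync_version) AS ct;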

Queues (and Service Broker)

Another interesting option is to write to a queue. With an on-premises SQL Server, we can use Service Broker. If you haven't seen Service Broker, it's a transacted queue that lives inside the database. (To learn about this, look here).

With SQL CLR code or with External Activation for Service Broker, we could write to other types of queue like RabbitMQ.

Azure SQL Database doesn't currently support writing to external queues. However, I do expect to see this change, as so many people have voted to have this capability added.

Change Data Capture

Change Data Capture (CDC) is another technology direct from SQL Server land. CDC is based on reading changes from a database's transaction log.

When you use it with SQL Server, it shares the same transaction log reader that Transactional Replication (TR) does. If you enable either CDC or TR, a log reader is started. If you have both enabled, they use a single log reader.

A key upside of using a log reader is that it doesn't slow down the initial updates to the target table. The changes are read asynchronously, separately.

Until recently, though, you could not use CDC with Azure SQL Database. The log reader agent ran from within SQL Server Agent, and with Azure SQL Database, you didn't have a SQL Server Agent.

The product team have recently done the work to make CDC work with Azure SQL Database.  It is an interesting option for extracting changes from a database, so this is the first blog post in a series of posts about using CDC with Azure SQL Database. Links to other posts will be added here as they are available:

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database

 

 

Cosmos Down Under podcast 6 with guest Vincent-Philippe Lauzon is now published!

I published a new Cosmos Down Under podcast today.

In the podcast, I've discussed the new integration between Azure Cosmos DB and Azure Data Explorer with the program manager for the feature: Vincent-Philippe Lauzon.

It's a great new capability and worth half an hour of your time to find out about it.

I hope you enjoy it. You'll find it, along with all Cosmos Down Under podcasts at: https://podcast.cosmosdownunder.com