ADF: Fix – The required Blob is missing – Azure Data Factory – Get Metadata

I was doing work for a client that uses Azure Data Factory (ADF), and noticed an unexpected error (code 2011) from a Get Metadata activity.

The pipeline was using a Get Metadata activity to read a list of files in a folder in Azure Storage, so that it could use a ForEach activity to process each of the files. Eventually it became clear that the error occurred when the folder was empty.

Storage Folders

Originally with Azure Storage, folders were only a virtual concept and didn't really exist. You couldn't have an empty folder.

So what we used to do back then was to place a file that we called Placeholder.txt into any folder as soon as we created it. That way the folder would stay in place. It's a similar deal to folders in Git, i.e. you can't have a folder without a file.

It also meant that after using a Get Metadata activity to find the list of files in a folder, we'd have to use a Filter activity to exclude any files, like our placeholder, that weren't going to be processed.

Hierarchical Namespaces

But that all changed with the introduction of hierarchical namespaces in Azure Storage. Once you enabled this for an account, folders were first class citizens and could exist whether or not they contained files.

This was a good thing, as it also made access to files in a folder much faster.

It was part of the support that was needed to use Azure Data Lake Storage (ADLS) Gen 2.

So what was wrong?

The issue in this case was the perception that the placeholder file no longer needed to be there.

Everything else worked, but, by design, if you try to get the metadata for an empty folder, even in a hierarchical namespace, error 2011 is thrown.

That seemed really odd to me.

The fix is in

I talked to members of the product group about this, and they gave me the details of what was going wrong.

You can connect to Azure Storage using the Azure Blob Storage connector even if you've enabled hierarchical namespaces, but you'll still get an error if you try to list the items in an empty folder. Can't say I love that, but it is what it is.

The trick is that you must use the Azure Data Lake Storage (ADLS) Gen 2 connector instead of the Azure Blob Storage connector. If you do that, you get back an empty childItems array, just as expected, and no error is thrown.
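
For illustration, this is roughly the shape of the Get Metadata activity output for an empty folder when the ADLS Gen 2 connector is used and the Child items field is selected (a sketch only; the real output also includes other properties, such as the effective integration runtime):

{
    "childItems": []
}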

Modifying existing code

Now, changing the type of connector is a bit painful, as you also have to change the type of the dataset, and that could mean unpicking it from wherever it's used and then hoping you set it all back up again correctly.

Instead, this is what I've done:

  • Add a new linked service for ADLS Gen 2
  • Modify the JSON for the existing dataset to change the location property type from AzureBlobStorageLocation to AzureBlobFSLocation, and change the referenceName to the new linked service (see the sketch below)
  • Delete the previous storage linked service.

And that should all just work without a bunch of reconfiguring.
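
Here's a minimal sketch of the relevant part of the dataset JSON after the change. The dataset, linked service, container, and folder names are made up for illustration, the dataset type (Binary here) stays whatever it already was, and note that the blob location's container property becomes fileSystem for the ADLS Gen 2 location:

{
    "name": "FilesToProcess",
    "properties": {
        "linkedServiceName": {
            "referenceName": "ADLS2LinkedService",
            "type": "LinkedServiceReference"
        },
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "staging",
                "folderPath": "incoming"
            }
        }
    }
}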

I hope that helps someone.


SQL: Understanding Change Data Capture for Azure SQL Database – Part 5 – Accessing CDC from another database

This is the final part of a series on using Change Data Capture with Azure SQL Database, and looks at accessing change data capture data from another database.

Scenario

I often use Azure SQL Database as a data warehouse. It's a perfectly fine database for that. But the challenge with Azure SQL Database is always about how to access data in other databases.

Many people don't seem to realize that you can use External Tables (and External Data Sources) to read data in one Azure SQL Database from another Azure SQL Database.

In an earlier post, I talked about how you can access the list of logins in the master database from within an Azure SQL Database. I'd seen far too many people who were told that wasn't possible.

In that post, I showed how to set up a database scoped credential, an external data source, and an external table, so I won't repeat that detail here. Please refer to that post for specifics.

With Change Data Capture (CDC), what I often need to do is this:

[Diagram: two databases with an arrow linking them]

I have a source database (probably an OLTP database) and have enabled CDC on some tables. But I want to drag the data changes across to a staging area in the data warehouse database.

I wish the product team had made this easy but, at present, it isn't.

The Challenge

I can create an external table in the data warehouse database that points to a table or a view in the source database. What I want to access, though, is a table-valued function like the one I described in part 3:

cdc.fn_cdc_get_net_changes_dbo_NewEmployees

It requires parameters though, most importantly a @FromLSN and a @ToLSN, and I can't pass parameters to an external table. (I wish I could.)
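
For reference, this is the kind of call I'd like to be able to make remotely. Run locally in the source database, it looks something like this sketch (the capture instance name matches the function above, and this particular LSN range just covers everything currently captured):

-- In the source database: read the net changes for an LSN range
DECLARE @FromLSN binary(10) = sys.fn_cdc_get_min_lsn('dbo_NewEmployees');
DECLARE @ToLSN binary(10) = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_NewEmployees(@FromLSN, @ToLSN, N'all');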

So how do I create a way to read from these functions, when I can't pass parameters, and I don't want all the data since the beginning of time?

There are two ways to tackle this.

Approach 1

In the past, here's what I've been doing instead:

Step 1: Local table

I create a local table, let's call it DataLoad.SourceConfigurations, with at least two columns: FromLSN and ToLSN. (Note that these will be binary columns.)
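
As a minimal sketch (the identity column and constraint name are just my additions for illustration; LSNs are binary(10) values):

-- In the data warehouse database (assumes the DataLoad schema already exists)
CREATE TABLE DataLoad.SourceConfigurations
(
    SourceConfigurationID int IDENTITY(1,1)
        CONSTRAINT PK_DataLoad_SourceConfigurations PRIMARY KEY,
    FromLSN binary(10) NOT NULL,
    ToLSN binary(10) NOT NULL
);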

Step 2: External Table in the Source Database

I then create an external table in the source database that points to the table I just created in the data warehouse database. Let's call it DW.SourceConfigurations.
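
A sketch of that external table follows. It assumes a database scoped credential and an external data source (here called DataWarehouseDatabase, a made-up name) have already been created in the source database, as covered in the earlier post:

-- In the source database (assumes the DW schema and the
-- DataWarehouseDatabase external data source already exist)
CREATE EXTERNAL TABLE DW.SourceConfigurations
(
    SourceConfigurationID int,
    FromLSN binary(10),
    ToLSN binary(10)
)
WITH
(
    DATA_SOURCE = DataWarehouseDatabase,
    SCHEMA_NAME = N'DataLoad',
    OBJECT_NAME = N'SourceConfigurations'
);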

Step 3: View in the Source Database

I next create a view in the source database that reads from the DW.SourceConfigurations external table to get the FromLSN and ToLSN values, and uses them to call the CDC table-valued function. Let's call the view cdc.get_required_changes. I'll leave it in the cdc schema along with the standard functions. (Obviously I could have put it elsewhere.)
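
Here's a sketch of that view, assuming a single row in DW.SourceConfigurations and using the net changes function described above:

-- In the source database
CREATE VIEW cdc.get_required_changes
AS
SELECT nc.*
FROM DW.SourceConfigurations AS sc
CROSS APPLY cdc.fn_cdc_get_net_changes_dbo_NewEmployees
                (sc.FromLSN, sc.ToLSN, N'all') AS nc;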

The beauty of this is that I can now control what's returned by the cdc.get_required_changes view by updating the values in the DataLoad.SourceConfigurations table in the data warehouse database.

Step 4: External Table in the Data Warehouse Database

The final setup step is then to create an external table, let's call it DataLoad.RequiredChanges, in the data warehouse database, pointing to the cdc.get_required_changes view in the source database.
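
And a sketch of that external table is below. The column list has to match what the view returns, so the CDC metadata columns come first; EmployeeID and EmployeeName are just placeholders for whatever columns the capture instance actually exposes, and SourceDatabase is a made-up external data source name:

-- In the data warehouse database (assumes the SourceDatabase
-- external data source already exists)
CREATE EXTERNAL TABLE DataLoad.RequiredChanges
(
    [__$start_lsn] binary(10),
    [__$operation] int,
    [__$update_mask] varbinary(128),
    EmployeeID int,
    EmployeeName nvarchar(100)
)
WITH
(
    DATA_SOURCE = SourceDatabase,
    SCHEMA_NAME = N'cdc',
    OBJECT_NAME = N'get_required_changes'
);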

End Result

Once we've done this, we have a workable solution. I can process changes from the source database in the data warehouse database almost as though they were local CDC changes.

I just put the LSN values that I want into DataLoad.SourceConfigurations, and then read rows from DataLoad.RequiredChanges.
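
In practice, a processing run looks something like this sketch (the LSN values are placeholders; they'd really come from wherever you track the last LSN processed and the current maximum LSN in the source database):

-- In the data warehouse database
DECLARE @FromLSN binary(10) = 0x0000004A000001900003;  -- placeholder values
DECLARE @ToLSN binary(10) = 0x0000004B000002200001;

UPDATE DataLoad.SourceConfigurations
SET FromLSN = @FromLSN,
    ToLSN = @ToLSN;

SELECT *
FROM DataLoad.RequiredChanges;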

Approach 2

A more recent and simpler approach is to wrap the call to the CDC functions in a stored procedure.

We can now call a remote stored procedure by using

sp_execute_remote

This option appeared well after the other options were available. We'll still need a local table to store where we're up to, but we can just pass the values directly to the stored procedure, which can then run the SELECT statement against the CDC function.
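
As a sketch, assuming a wrapper procedure in the source database (I've called it DataLoad.GetRequiredChanges here, but the name is made up) and an external data source named SourceDatabase, it looks something like this:

-- In the source database: a wrapper procedure around the CDC function
-- (assumes a suitable schema exists there too)
CREATE PROCEDURE DataLoad.GetRequiredChanges
    @FromLSN binary(10),
    @ToLSN binary(10)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM cdc.fn_cdc_get_net_changes_dbo_NewEmployees(@FromLSN, @ToLSN, N'all');
END;

-- In the data warehouse database: call it remotely
DECLARE @FromLSN binary(10) = 0x0000004A000001900003;  -- placeholder values
DECLARE @ToLSN binary(10) = 0x0000004B000002200001;

EXEC sp_execute_remote
    @data_source_name = N'SourceDatabase',
    @stmt = N'EXEC DataLoad.GetRequiredChanges @FromLSN = @FromLSN, @ToLSN = @ToLSN',
    @params = N'@FromLSN binary(10), @ToLSN binary(10)',
    @FromLSN = @FromLSN,
    @ToLSN = @ToLSN;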

I hope this helps someone get started with this trickier aspect of using CDC in Azure SQL Database.

  1. Why use Change Data Capture for Azure SQL Database?
  2. How Change Data Capture works in Azure SQL Database
  3. Enabling and using Change Data Capture in Azure SQL Database
  4. Change Data Capture and Azure SQL Database Service Level Objectives
  5. Accessing Change Data Capture Data from Another Azure SQL Database

Power BI Implementation Models for Enterprises Part 2: Cloud Native Clients

I've been writing a series on Power BI Implementation Models for Enterprises for https://tekkigurus.com.

Part 2, which covers what I consider Cloud Native Clients, is now published:

https://www.tekkigurus.com/power-bi-implementation-models-for-enterprises-part-2-cloud-native-clients/

Enjoy!

Fabric Down Under show 6 with guest Paul Turley now available!

Once again, I had the great pleasure to record a Fabric Down Under podcast with a fellow long-term Microsoft Data Platform MVP. This time it was someone I have known for a long time: Paul Turley.

Paul is a director at 3Cloud and a Microsoft MVP. Paul has an amazing level of experience with business intelligence projects and has also worked with Microsoft Fabric since it was just a twinkle in Microsoft's eye.

In this show, I discuss Paul's experiences so far with starting to move customer projects across to Microsoft Fabric. Given that the product reached general availability a few months ago, this is very timely information.

You'll find this show, along with the previous shows, at: https://podcast.fabricdownunder.com

SQL Server Locking, Blocking, and Deadlocks for Developers and DBAs Course Released

The latest online on-demand course that I've been working on is now out the door. SQL Server Locking, Blocking, and Deadlocks for Developers and DBAs is available at our site: SQL Server Locking, Blocking and Deadlocks for Developers and DBAs (sqldownunder.com)

It's designed to let you learn to find and fix locking issues, and to avoid them in the first place.

To celebrate the release, use coupon code LOCKINGRELEASE until January 20th to get 25% off the price.

Why this course?

I often get called to customer sites where I'm told they have blocking issues. But what they call blocking issues isn't always accurate. They might mean locks that are held for a long time, or they might mean deadlocks. These are really common issues for customers.

  • Have you been working with SQL Server or Azure SQL Database and having issues with blocking and deadlocks?
  • Would you like to really understand how SQL Server manages transactions and locks?
  • Not sure what isolation levels are about, and which ones you should be using?
  • Don't know your app locks from your version stores?
  • Do you know the basics but think it's time to extend your knowledge?
  • Do you want to learn from an expert?

If any of these apply to you, this course is for you! And as well as detailed instruction, the course also offers optional practical exercises and quizzes to reinforce your learning.

SQL Server Locking, Blocking and Deadlocks for Developers and DBAs (sqldownunder.com)

Fabric Down Under show 5 with guest Reid Havens now available!

I had the great pleasure to record a Fabric Down Under podcast with Microsoft MVP Reid Havens the other day.

Reid is the founder of Havens Consulting Inc. and a Microsoft MVP. He's a seasoned professional with a wealth of experience in technology, organizational management, and business analytics, and he holds a Master's Degree in Organizational Development along with a background in consulting for Fortune 10, 50, and 500 companies.

In addition to his corporate experience, Reid is a highly sought-after instructor, teaching Business Intelligence, reporting, and data visualization, and that's what I wanted to talk to him about in the show.

You'll find this show, along with the previous shows, at: https://podcast.fabricdownunder.com

Cosmos Down Under show 9 with guest Hugo Barona discussing migration is released

I also had the great pleasure of recording another new Cosmos Down Under podcast last night. It's now edited and released.

Show 9 features Microsoft MVP Hugo Barona discussing his experiences in migrating clients to Azure Cosmos DB, and the lessons he's learned along the way.

I hope you enjoy it.

You'll find it here, along with previous episodes: https://cosmosdownunder.com

SQL Down Under show 84 with guest Rob Sewell discussing SQL Server command line utilities is now published

I know it's been a while, but there's a new SQL Down Under show published!

While I've been busy with https://cosmosdownunder.com and https://fabricdownunder.com, I haven't forgotten the SQL Down Under shows. It's time for some more SQL Server love.

I had the great pleasure yesterday of recording a podcast with one of my UK-based friends, Rob Sewell. Rob's one of the rare breed who are both a Data Platform MVP and an Azure MVP.

In the show, we discuss command line utilities for SQL Server, including the latest go-sqlcmd offering. We also include a shout out to the upcoming SQL Bits conference.

I hope you enjoy it. You'll find this show (and previous shows) here: https://podcast.sqldownunder.com/

Data Science Summit (Poland) 2023 – Early Bird Discounts now

One of the conferences that I enjoy speaking at each year is the Data Science Summit that comes out of Warsaw.

Once again, the speaker lineup looks excellent, and there are early bird discounts available now. I'm always amazed at how low cost these Eastern European conferences are.

https://dssconf.pl/user.html?lang=en#!/login

Fabric Down Under show 4 with guest Will Thompson discussing Data Activator in Microsoft Fabric

I had the great pleasure of spending time today talking with Will Thompson about Data Activator in Microsoft Fabric, which has just entered public preview.

Will is a Program Manager at Microsoft, where he's working as part of the Microsoft Fabric team. During the show, Will discusses the two typical use cases for Data Activator: one that's about taking action on standard analytics, and one that's about working with streaming data.

You'll find the show here: https://fabricdownunder.com

I hope you find it useful.