Book Review: Azure Data Fundamentals

I recently received an early release review copy of Azure Data Fundamentals: A Guide to DP-900 Certification and Beyond by Michael John Peña from my friends at O’Reilly.

Background

Over the years, I’ve taken a large number of Microsoft certification exams. I’ve also helped to write many of the exams. There are people who’ve taken more than I have, but not many. This book is designed to help you pass the DP-900 exam. I took that exam long ago and thought it was a reasonable exam.

One thing that did surprise me with this exam was the breadth of topics that were covered. It’s not as hard as the AI-900 exam, but it’s not all that easy either. There are many things you need to understand.

Importantly though, if you’re expecting it to be like the older SQL Server exams, and focussed on that sort of content, you’d also be quite surprised. This is more about data in general, and in particular, about data in Azure.

Author

Michael John Peña is a Data and AI director at Playtime Solutions. Michael is a dual Microsoft MVP (Data & AI), an MCT, and a technical mentor.

He describes himself as being forever a student. I like that.

Content

The structure of the book is good.

It starts with a discussion on how data is represented in systems, and how data types can be used. Michael provides quite good coverage of structured vs unstructured data, and how semi-structured data is also common.

Chapter 2 moves on to the different ways of storing data. One thing I’d add to the discussion on CSVs is that even though they always seem easy to create and use, different implementations of them can be surprisingly inconsistent. Another point that I’d like to see called out in the discussion of Parquet files is the limited range of data types they support. In the discussion on ACID properties, there is good content on Atomicity and Consistency, but I’d like to see more on Isolation and Durability, which seemed to be missing here. However, there is more on these topics in chapter 3.
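To make the CSV point concrete, here is a minimal sketch of my own (it is not from the book). It uses Python's standard csv module to write the same record the way two hypothetical producers might: one with commas, one with semicolons. The sample record and the helper names write_row and read_row are assumptions for illustration only.

```python
import csv
import io

# One logical record: a name containing a comma and a note containing quotes.
record = ["Smith, Jane", 'She said "hi"', "3.5"]

def write_row(row, **fmt):
    """Serialize one row to CSV text using the given formatting options."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()

def read_row(text, **fmt):
    """Parse the first row of CSV text using the given formatting options."""
    return next(csv.reader(io.StringIO(text), **fmt))

# Producer A: comma-delimited (the csv module's default dialect).
text_a = write_row(record)
# Producer B: semicolon-delimited, as some locale-specific exports are.
text_b = write_row(record, delimiter=";")

# A consumer that assumes commas reads A correctly but mangles B.
parsed_a = read_row(text_a)
parsed_b_wrong = read_row(text_b)
parsed_b_right = read_row(text_b, delimiter=";")

print(parsed_a == record)        # same fields come back
print(parsed_b_wrong == record)  # fields were split in the wrong places
print(parsed_b_right == record)  # correct once the delimiter is known
```

Both files are "CSV", yet a consumer has to know which conventions the producer used before it can read them reliably.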

Chapter 3 provides a useful discussion on the different types of workloads that are used with data.

I liked the fact that Chapter 4 then spends some time discussing the different roles people can fill when working with data. If you work in the industry, you might know the difference between a Data Engineer, a DBA, and a Data Analyst, but this could be new territory for the typical audience for this book. Michael also discusses the most important types of tools that are used by people in each role, and how this changes with a move to cloud-based systems.

In chapter 5, Michael then heads into the most common type of organized data storage for businesses: the relational database. He discusses what makes them relational and how they are queried and used. When discussing primary keys, I would like to have seen a discussion on natural vs surrogate keys.
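For readers who haven't met the distinction, here is a rough sketch of what I mean by natural vs surrogate keys. It is my own illustration, not the book's. I'm assuming SQLite through Python's built-in sqlite3 module purely so that it runs anywhere (the book targets SQL Server and Azure SQL), and the customers table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- surrogate key: generated, no business meaning
        email       TEXT NOT NULL UNIQUE,  -- natural key candidate: a real-world identifier
        full_name   TEXT NOT NULL
    )
""")

# The database assigns the surrogate key; the caller supplies only business data.
cur = conn.execute(
    "INSERT INTO customers (email, full_name) VALUES (?, ?)",
    ("jane@example.com", "Jane Smith"),
)
customer_id = cur.lastrowid

# The natural value can change without disturbing the key that
# other tables would reference.
conn.execute(
    "UPDATE customers SET email = ? WHERE customer_id = ?",
    ("jane.smith@example.com", customer_id),
)
row = conn.execute("SELECT customer_id, email FROM customers").fetchone()
print(row)
```

The trade-off is that a surrogate key stays stable when business data changes, while a natural key such as the email address still needs its own uniqueness constraint.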

NOTE: I had some issues with the code examples in this chapter.

  • When you are writing queries for SQL Server, you should always use 2-part names for objects like tables. It would be great if the examples in this book were changed to do this.
  • The example index that’s provided isn’t a great index for the email address query shown.
  • The stored procedure doesn’t implement transactions correctly. It would be good to have seen XACT_ABORT in the procedure.

Chapter 6 gets into how this relates to SQL services offered by Azure. The only comment I’d add is that in the discussion on Azure SQL Database, it would be good to have some coverage about vCore vs DTU based provisioning.

Chapter 7 looks at Azure Storage. While I rarely use Azure File Storage and Table Storage, almost every project that I’m involved with makes some use of Azure Blob Storage. The discussion on storage tiers should be updated though, as it covers only the hot, cool, and archive tiers. Nowadays, there’s also a cold tier that needs to be considered. It might also have been good to add some discussion on SMB v3. It was good to see the discussion on considering Cosmos DB instead of Table storage in Azure Storage. That’s now the recommended option for key-value pairs.

I was pleased to see a chapter (8) on Azure Cosmos DB as it’s one of my favorite products, yet widely misunderstood. Also, as Azure regions get rolled out, it’s one of the first services that’s always available. That’s because much of Azure itself is based on it, so it’s a Ring 0 service. I was a bit puzzled to see Michael mention the five consistency levels in Cosmos DB, but then only really describe three of them. In the diagram that shows the different APIs, the new PostgreSQL API is missing.

The next section of the book in chapter 9 covers analytic systems, particularly with Microsoft Fabric but also Synapse, Databricks, and more. With all the hype around data lakes, I was glad to see the warnings about not creating a data swamp. And while zones are called out, it would be good to have specific content that covers medallion architectures, given the required knowledge for this exam.

This is followed by a chapter (10) on real-time analytics, which is one of my current passions. In this chapter, I noticed that AMQP and MQTT were called out, but I’d say HTTP should have been in the same section. A surprising number of systems send events via HTTP in an unconnected way.

NOTE: I think there’s a significant omission with chapter 10. Given Microsoft’s current focus, I was surprised that most of the chapter wasn’t discussing Microsoft Fabric Real-Time Intelligence. You know that’s what they really want you using now. In Microsoft land, the answer to almost every question now is to use Microsoft Fabric (for better or worse).

It’s pointless to get all the data in place without using it, so the next chapter (11) looks at data visualization with Power BI. It certainly is our go-to visualization tool today. I suspect there really was a need for a separate chapter discussing semantic models, either in Azure Analysis Services (older) or Fabric Data Models (newer). This chapter could then have included a discussion on Direct Lake, etc.

The book finishes out with some discussion on governance, and a few comments on how to stay current with data-related topics. It would be good to see the code sample in this chapter have a dynamic date calculation rather than a hard-coded value.
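As a rough illustration of the dynamic date point, here is a sketch of my own. I'm not reproducing the book's sample here, so the function name retention_cutoff, the 90-day window, and the hard-coded date are all hypothetical.

```python
from datetime import date, timedelta

def retention_cutoff(days, today=None):
    """Return the date `days` days before `today` (defaults to the current date)."""
    if today is None:
        today = date.today()
    return today - timedelta(days=days)

# Hard-coded: correct on the day it was written, then silently stale.
hard_coded_cutoff = date(2024, 1, 1)

# Dynamic: always relative to the day the code runs.
cutoff = retention_cutoff(90)
print(cutoff)
```

Passing today as an optional parameter also makes the calculation easy to test with a fixed date.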

Chapter key point summaries and pointers

Another thing I quite liked in this book was the end-of-chapter summaries of the key points you need to understand when you take the exam. These are great for letting you focus your studies, particularly on areas that you don’t already feel comfortable with.

I also really liked the discussions on common misconceptions. There is a silly assumption in the industry that newer ways of doing things are always better than traditional ways of doing things. I liked the way that Michael called that out as just not true.

Another feature of the book that I really liked is the inclusion of hints on what to look for in exam questions. These are shown in the context of the main discussion.

Summary

Overall, this is a very good book. Writing a book like this is a tough call, because the requirements for the exam can be updated every three months. Michael has done a good job. The main things that I’d like to see him improve are the code samples, along with the addition of introductory content around Microsoft Fabric in the relevant areas. (And yes, I know there are separate Fabric certifications.)

8 out of 10

2025-11-20