Book Review: Data Cleaning with Power BI

I was excited to see Gus Frazer’s new book on Data Cleaning with Power BI. Our friends at Packt sent a copy for me to take a look at.

Gus has a background with both Power BI and Tableau, and it’s always interesting to see a mix of perspectives in any book. In this book, he shows a variety of data cleaning/cleansing tasks, then how to do many of them with M in Power Query. And you can tell Gus has a teaching background, because chapters have review questions. The book could well support a course.

Content I found interesting

In later chapters, he gets into fuzzy matching, fill down, and using R and Python scripts, along with how to use ML to clean data. Gus has added quite good coverage of how to create custom functions in M, and importantly, how to optimize queries in M.

The one comment that I think was missing is that I try to avoid doing this work in M if I can. I’m a fan of pushing this work upstream wherever possible. Given the topic of the book, I know that’s a tall order, but it needs to be mentioned. Power Query and M aren’t the strongest contenders when you’re dealing with large amounts of data that need to be manipulated. He did have a section on dealing with Big Data, and that’s where I’d most collide with using Power Query and M. The bigger the data, the less likely I’d be wanting to use these tools to do the work.

The book provides some info on data modeling, and why things like bidirectional cross filtering are such an issue. It was good to see that, given how much of my life I spend removing it from models built by clients.

I liked the coverage of calculation groups.

There was a section on preparing data for paginated reports, and then one on cleaning data using Power Automate. This is another area where I’d differ. The problem with Power Automate is the licensing model. I see far too many solutions built using it that break when the author is no longer with the company. For anything that the organization depends upon, I’d be far more likely to use Logic Apps running under a service account than Power Automate.

I really liked seeing a section on how OpenAI makes data cleaning easier.

Things I’d like to see

This is a good book, but there are some things I’d like to see.

Screenshot images

The book is pretty good quality. What I did find hard was that many screenshots were simply too small. I know it’s hard to squash Power BI UI into a portrait-oriented book, but it needs further thought. Here’s an example of what I’m talking about:

Tiny writing in the advanced editor

That’s probably OK in a PDF version of an eBook where you can zoom in, but I struggle to imagine how that would appear in a printed book. I know it’s not easy, and it’s an issue I’ve run into myself when writing.

Calculated columns

While it’s important that calculated columns are shown, I really try to avoid them in these models, and wherever possible, I like to see that work “pushed left”, i.e., back to the source.

Data types

Some of the biggest issues with data transformation and cleansing relate to data types. Power Query and M are fairly poor at working with various data types. In fact, it’s a criticism I have of Power BI as a whole. For example, whoever decided that “decimal” should really mean “float” simply made a poor decision in the design of the product: we end up with columns of numbers that don’t always add up. These types of things, and other aspects of data types, really need much stronger coverage in the book.
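
To show the kind of thing I mean by numbers that don’t add up, here’s a minimal sketch. It’s in Python rather than M, it isn’t from the book, and the values are made up for illustration. It just compares summing the same amounts as binary floating point and as exact base-10 decimals.

```python
from decimal import Decimal

# Ten hypothetical line items of 0.10 each, stored as binary floating point.
float_amounts = [0.1] * 10
float_total = sum(float_amounts)

# The same ten values stored as exact base-10 decimals.
decimal_amounts = [Decimal("0.10")] * 10
decimal_total = sum(decimal_amounts)

# The float total is very close to 1.0 but not equal to it,
# because 0.1 has no exact binary representation.
print(float_total == 1.0)                 # False
print(decimal_total == Decimal("1.00"))   # True
```

The difference is tiny per row, but it’s exactly what makes a column total disagree with the figure someone expects to see.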

Summary

A good book, and well-written.

7/10

2024-06-10