Book Review: Data Cleaning with Power BI

I was excited to see Gus Frazer's new book on Data Cleaning with Power BI. Our friends at Packt sent a copy for me to take a look at.

Gus has a background with both Power BI and Tableau, and it's always interesting to see a mix of perspectives in any book. In this book, he shows a variety of data cleaning/cleansing tasks, then how to do many of them with M in Power Query. And you can tell Gus has a teaching background, because chapters have review questions. The book could well support a course.

Content I found interesting

In later chapters, he gets into fuzzy matching, fill down, and using R and Python scripts, along with how to use ML to clean data. Gus has added quite good coverage of how to create custom functions in M, and importantly, how to optimize queries in M.

The one comment I thought was missing is that I try to avoid doing this work in M if I can. I'm a fan of pushing this work upstream wherever possible. Given the topic of the book, I know that's a tall order, but it needs to be mentioned. Power Query and M aren't the strongest contenders when you're dealing with large amounts of data that need to be manipulated. He did have a section on dealing with Big Data, and that's where I'd most collide with using Power Query and M. The bigger the data, the less likely I'd be wanting to use these tools to do the work.

The book provides some info on data modeling, and why things like bidirectional cross filtering are such an issue. It was good to see that, given how much of my life I spend removing it from models built by clients.

I liked the coverage of calculation groups.

There was a section on preparing data for paginated reports, and then one on cleaning data using Power Automate. This is another area where I'd differ. The problem with Power Automate is the licensing model. I see far too many solutions built using it that break when the author is no longer with the company. For anything that the organization depends upon, I'd be far more likely to use Logic Apps running under a service account than Power Automate.

I really liked seeing a section on how OpenAI makes it easier.

Things I'd like to see

This is a good book but there are some things I'd like to see.

Screenshot images

The book is pretty good quality. What I did find hard was that many screenshots were simply too small. I know it's hard to squash Power BI UI into a portrait-oriented book, but it needs further thought. Here's an example of what I'm talking about:

Tiny writing in the advanced editor

That's probably OK in a PDF version of an eBook where you can zoom in, but I struggle to imagine how that would appear in a printed book. I know it's not easy, and it's an issue I've run into myself when writing.

Calculated columns

While it's important that calculated columns are shown, I really try to avoid them in these models, and wherever possible, I like to see that "pushed left" i.e., back to the source.

Data types

Some of the biggest issues with data transformation and cleansing relate to data types. Power Query and M are fairly poor at working with various data types. In fact, it's a criticism I have of Power BI as a whole. For example, whoever decided that "decimal" should really mean "float", so we end up with columns of numbers that don't always add up, etc., simply made a poor decision in the design of the product. These types of things, and other aspects of data types, really need much stronger coverage in the book.
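
As a rough illustration of the "decimal that is really a float" problem, here is a minimal sketch in Python rather than M, using made-up values. Summing binary floating-point numbers can drift away from the exact decimal answer, while a true decimal type does not:

```python
from decimal import Decimal

# Hypothetical data: ten line items of 0.10 each (values invented for illustration).
line_items = ["0.10"] * 10

# Binary floating point: 0.10 has no exact binary representation,
# so a small rounding error accumulates as the values are added.
float_total = sum(float(item) for item in line_items)

# True decimal arithmetic keeps each value exact.
decimal_total = sum(Decimal(item) for item in line_items)

print(float_total == 1.0)             # False
print(decimal_total == Decimal("1"))  # True
```

Within Power BI itself, the "Fixed decimal number" type (which stores four decimal places) is the usual way to sidestep this for values like currency, at the cost of precision beyond those four places.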


A good book, and well-written.



Book Review: Extending Power BI with Python and R

I've seen a few books lately from the Packt people. The latest was the second edition of Extending Power BI with Python and R: Perform advanced analysis using the power of analytical languages, by Luca Zavarella.


Luca Zavarella

The author is Luca Zavarella. I've been working with Power BI since before it was released, and ever since I've seen discussions around using R (initially) and Python (later), Luca has been one of those people that everyone listens to.

Luca is a fellow Microsoft MVP and has a great background in data science. In recent times, he seems to have been focussing on advanced analytics, data science, and AI. He's been prolific in these areas.

The Book

This is another very comprehensive book and took quite a while to read. It will be a vital reference for anyone trying to apply Python and/or R in Power BI. I've heard many argue that this is a pretty niche topic but I can't say I agree. I deal with a real mixture of clients and while not everyone does this, quite a few do. That of course particularly applies to anyone from a data science background.

In the book, Luca's experience shines through.

The real power of the book is the way that Luca shows you how to do things with Python and/or R that most people working in Power BI without them would think were impossible.

In the DBA world, people often talk about the "accidental DBA" i.e., someone who wasn't planning to be a DBA but ended up being the one doing the tasks. I can well imagine that if you are the "accidental data scientist" working with analytics, there's a great amount to learn from this book.

I could also imagine this book being useful to people who use other analytic products, not just Power BI, because Luca explains the theory, not just the practical application.

And even though I've worked with Power BI from the start, I found a few interesting things in the book about Power BI, not just about using Python and R with it. It's important to always just keep learning, and every time I read something about a product that I think I already know well, I invariably learn something anyway. That's often because everyone uses the products differently, or applies them to problems that you hadn't considered.


Another great book. And another one where I can't imagine how long it must have taken to write. And the second edition adds great value.

It's not one for all my usual data audience, but anyone with an interest in applying advanced analytics in Power BI should check it out. You will learn a lot.

9 out of 10


Power BI Implementation Models for Enterprises Part 3: Cloud Friendly Clients

I've been writing a series on Power BI Implementation Models for Enterprises for

Part 3, which covers what I consider Cloud Friendly Clients, is now published:

Enjoy!

Fabric Down Under show 7 with guest Philip Seamark now available!

Once again, I had the great pleasure to record a Fabric Down Under podcast. This time it was with a fellow "Down Under" guest Philip Seamark, from across the "ditch" (as we both call it) in New Zealand.

Phil is a member of the Fabric Customer Advisory Team and works as a DAX and data modelling specialist.

He gets involved when enterprise customers need deeper technical support.

In this show, I discuss Phil's thoughts on Direct Lake, which is one of the very new options that came with Microsoft Fabric. It adds another mode to Power BI, in addition to the Import and DirectQuery modes that were there previously.

You'll find this show, along with the previous shows at:

Power BI Implementation Models for Enterprises Part 2: Cloud Native Clients

I've been writing a series on Power BI Implementation Models for Enterprises for

Part 2, which covers what I consider Cloud Native Clients, is now published:

Enjoy!

Fabric Down Under show 6 with guest Paul Turley now available!

Once again, I had the great pleasure to record a Fabric Down Under podcast with a fellow long-term Microsoft Data Platform MVP. This time it was someone I have known for a long time: Paul Turley.

Paul is a director at 3Cloud and a Microsoft MVP. Paul has an amazing level of experience with business intelligence projects and has also worked with Microsoft Fabric since it was just a twinkle in Microsoft's eye.

In this show, I discuss Paul's experiences so far with starting to move customer projects across to Microsoft Fabric. Now that the product reached General Availability a few months ago, this is very timely information.

You'll find this show, along with the previous shows at:

Fabric Down Under show 5 with guest Reid Havens now available!

I had the great pleasure to record a Fabric Down Under podcast with Microsoft MVP Reid Havens the other day.

Reid is the founder of Havens Consulting Inc. and a Microsoft MVP. Reid is a seasoned professional with a wealth of experience in technology, organizational management, and business analytics. Reid has a Master's Degree in Organizational Development and a background in consulting for Fortune 10, 50, and 500 companies.

In addition to his corporate experience, Reid is also a highly sought-after instructor, teaching Business Intelligence, reporting, and data visualization, and that's what I wanted to talk to him about in the show.

You'll find this show, along with the previous shows at:

Data Science Summit (Poland) 2023 – Early Bird Discounts now

One of the conferences that I enjoy speaking at each year is the Data Science Summit that comes out of Warsaw.

Once again, the speaker lineup looks excellent, and there are early bird discounts available now. I'm always amazed at how low cost these Eastern European conferences are.

Book Review: Pro Power BI Architecture

I was pleased to see Reza Rad's latest book, Pro Power BI Architecture: Development, Deployment, Sharing, and Security for Microsoft Power BI Solutions (ISBN 9781484295373), now out the door. Reza is an old friend, fellow Data Platform MVP, and fellow member of the Microsoft Regional Director program.

I was pleased to have been a technical reviewer for this book, and I hope that, along with the other reviewers, we have improved what was already a good book.

Not Just an Update

This is version 2 of the book that Reza produced in 2019 but it is not a book with minor updates. Most of the book is rewritten.

Reza is a book-writing machine. In this book, he has covered so very many aspects of architecture for Power BI. He has provided an emphasis on reliability and ease of maintenance. In particular, I was pleased to see a discussion on environments as that's often omitted in Power BI related books. And a discussion on how to save money by using the right licensing. Once again, that's a topic that's often not discussed.

Target Audience and Style

The book is really for anyone who needs to build Power BI reports (often analysts and developers), but with a view to the bigger picture of how to structure a project so it continues to be usable as it (or the team) grows in size.

I like Reza's conversational style and it shines out in his writing. When I read it, I often feel like I'm sitting in a room listening to him talk. That's a tough skill and Reza does it effortlessly.


Great book and well-written

8 out of 10


Fabric Down Under show 2 with guest Josh Caplan discussing OneLake

I had the great pleasure to spend time today talking with Josh Caplan about OneLake.

Josh Caplan serves as a Principal Group Product Manager at Microsoft, where he's now leading product management for Microsoft OneLake. He has a strong background in managing products like Power BI, SQL Server Analysis Services, and Azure Analysis Services. Before his current role, Josh contributed to enhancing developer tools for Power BI and worked with Bing to harness its vast data resources.

OneLake is a foundational aspect of all things Fabric.

In the show, Josh provides a solid introduction to OneLake, and then we delve into many aspects of how it works, and what it means for organizations.

You'll find it here:

I hope you find it useful.