SDU Podcast: Show 72 with guest Power BI General Manager Kamal Hathi

I had the great pleasure to record another SQL Down Under podcast last week with the Power BI general manager Kamal Hathi.

In the show, Kamal and I discuss the current state and potential futures for Power BI, its relationship to SQL Server Reporting Services, and its development and extensibility models.

You'll find the show here: http://www.sqldownunder.com/Podcasts

I hope you enjoy it.

Note: We had a few unexpected audio issues with the recording. Sorry about that. We'll do better next time 🙂 It's still pretty good and I'll still think you'll find it interesting.

Opinion: Mature consultants don't always want to tear down the house

I work with data. I understand that for most organizations, that the data they own is the most valuable asset the company owns.

One thing I've learned from working with data is that unlike application software, data generally outlives generations of applications, is often used by many different applications, and typically just morphs from shape to shape over time. It almost never gets totally refreshed.

This is a good thing.

I've been in the industry long enough to see many types of consultants. One type that I have the least time for, is the type that always wants to tear down or replace whatever is already there at an organization. It's far easier to just say "let's replace it all" than to try to work out what to do.

Many of these consultants don't really understand what's already in place, but because it looks different to what they are used to, it must be wrong, and it must be replaced.

A mature consultant might not like what they see but they take the time to consider what's already there.

The mature consultant is skilled enough to work out how to take the organization from a place that they don't want to be, to a place which is better than where they are now.

That's the real skill.

Opinion: ORMs: Are they a good idea when developing for SQL Server?

Many people know that I have an issue with ORMs (object relational mappers). It's worth spending a blog post to describe why.

Unfortunately, I spend my life on the back end of trying to deal with the messes involved. The following are the key issues that I see:

Potentially horrid performance

image

I've been on the back end of this all the time. There are several reasons. One is that the frameworks generate horrid code to start with, the second is that they are typically quite resistant to improvement, the third is that they tend to encourage processing with far too much data movement.

I keep hearing that "the databases are getting so fast now that this loss of performance doesn't matter". I can tell you that I don't live in a world where that's true.

In the world that I live in, I regularly end up in software houses with major issues that they don't know how to solve.

I was in a large financial in Sydney a while back. They were in the middle of removing the ORM that they'd chosen out of their app because try as they might, they couldn't get anywhere near the required performance numbers.

Why had they called me in?

Because before they wrote off 8 months' work for 240 developers, the management wanted another opinion. They were making the right decision but they wanted someone else to agree with them.

As another example, I was at a start-up software house last year. They had had a team of 10 developers building an application for the last four years. The business owner said that if it would support 1000 concurrent users, they would have a viable business. 5000 would make a good business. 500 they might survive. They had their first serious performance test two weeks before they had to show the investors. It fell over with 9 concurrent users. The management (and in this case the devs too) were panic-stricken.

For yet another recent example, I was in a  software house that had to deliver an app to a government department. They were already 4 weeks overdue and couldn't get it out of UAT. They wanted a silver bullet. That's not the point to then be discussing their architectural decisions, yet the architectural decisions that they'd made were the issue.

I've seen background processing jobs in a large UK-based financial service organisation. On a system with 48 processors, and 1.2 TB of memory, and 7 x 1 million UK pound 20TB flash drive arrays, one task. During that time, it issued 550 million SQL batches to be processed and almost nothing else would work well on the machine at the same time. The replacement job that we wrote in T-SQL issued 40,000 SQL batches and ran in just over 2 minutes. I think I can get that to 1/10 of that with further work. Guess which version is the right answer? What had caused this? Re-use of entity-level code through a custom ORM.

Minimal savings yet long term pain

image

Many of the ORMs give you an initial boost to "getting something done". But at what cost? At best, on most projects that I see, it might save 10% of the original development time, on the first project. But as David Connor pointed out in his excellent TechEd talk with Joel Pobar (and as I've seen from so many other places), the initial development cost of a project is usually only around 10% of the overall development cost.

So what are we talking about? Perhaps 1% of the whole project? Putting yourself into a long-term restrictive straightjacket situation for the sake of a 1% saving is a big, big call. The problem is that it's being decided by someone who isn't looking at the lifetime cost, and often 90% of that lifetime cost comes out of someone else's bucket.

Getting stuck in how it works

image

For years, code generated by tools like Linq to SQL was very poor. And this was a tool designed fully for SQL Server. Now imagine what the code that's generated by a tool that doesn't even know which backend database it's using looks like.

That's where Entity Framework (EF) started. Very poor choices are often made in the design of these tools. The whole reason that "optimize for ad hoc workloads" was added to SQL Server was to deal with the mess from the plan cache pollution caused by these tools.

A simple example is that on the SqlCommand object, they called AddWithValue() to add parameters to the parameters collection. When I used to teach ADO.NET, I'd explain why using that method call was a really bad idea. It provides the name of the parameter and the value, but no data type. So it tried to derive the data type from the data. SQL Server would quickly end up with a separate query plan for every combination of every length of string for every parameter. This is plan cache pollution.

And what could the developers do to fix it?

image

Nothing. Nada. Nil. NULL.

Because it was baked into how the framework worked. The framework eventually got changed a bit to have more generic sizes but still never addressed the actual issues.

Getting stuck with old code

image

SQL Server 2008 added a bunch of new data types. Spatial was one of my favourites. When did that get added to EF? Many, many, many years later. What could the developers do about it? Almost nothing except very hacky workarounds.

When you use a framework like this, you are totally at the mercy of what its developers feel is important, and that's if they haven't already lost interest in it yet.

When you code with a tool that was targeted cross-DB, you usually end up with a very poor choice of data types. You end up with lowest common denominator of everything, or you end up with something that doesn't fit well with the framework.

Many of the authors of the frameworks quickly lose interest. Do you want to be stuck with it?

Boilerplate code

image

"Oh I don't want to write all that boilerplate code" I hear people say.

Sure. I get that but are you incapable of code-generation? At least give yourself control over how the code works. And that's mostly a one-time cost. You get to use it again, project after project. But at least if you don't like it, you can fix it.

Summary

Look, I could go on for days about this stuff and the astonishing messes that I've seen.

You need to decide what you're building. But at least consider where you are taking your advice from.

image

If you're taking architectural advice from someone who hasn't built real systems at scale, at least get further opinions.

If you are building toy apps ie: a replacement for CardFile, none of this matters but I live in a world where it really, really matters. And so do the vast majority of software houses that I spend time in.

Future

Also worth noting that if you want real performance from upcoming DB engines, you might want to rethink the idea of just using the database an an object repository. Even NOSQL options like CosmosDB (was DocumentDB) have a concept of a stored procedure within the engine. You write them in Javascript. But there's a reason they are there.

In SQL Server 2016, this is even more stark. I have simple examples on SQL Server 2016 where a batch of code sent from an app executes in 26 seconds. If I move the table to in-memory, it runs in 25.5 seconds. If I move the code to a natively-compiled stored procedure, it runs in 60 milliseconds.

SQL: Linked Servers: Don't hard code server names

I'm not a great fan of linked servers in SQL Server but they are often necessary. I really wish the on-premises product supported External Data Sources and External Tables. But in the meantime, what I see all the time, is people hardcoding server names like this:

SDUPROD2016.WWIDB.Payroll.Employees

That makes your code really hard to manage. One option to get around that is to use synonyms.

Instead of sprinkling references to that table all through the code, you can create a synonym for it like this:

image

I created a local schema to match the remote one, and then created a synonym for the remote table. That way, the code can just reference Payroll.Employees instead of the four part name.

One aspect of this that I didn't like is that if the server name or database name changes, that I needed to recreate all the synonyms, and I might have a bunch of them.

What fellow MVPs Simon Sabin and Erland Sommarskog pointed out to me today, is that you can use an alias name when creating a linked server. I'd always known you could do that for non-SQL Server linked servers, but didn't realize that the underlying stored procedure would allow it for SQL Server linked servers too.

Notice that when you first go to create a linked server in SQL Server Management Studio, you can choose the name to be different to all the other properties (which include the server name):

image

But as soon as you choose the server type as SQL Server, that option can't be used as the single textbox is used for both the name of the actual server and the alias for the linked server.

image

The underlying stored procedure sp_addlinkedserver quite happily lets you assign a different name.

This is important because it means that I can create a linked server called HRServer for my actual server SDUPROD2016, and use that name when creating the synonyms instead:

image

Then if I need to move the database to another server, I can just change the linked server definition, instead of all the synonyms that reference it.

That's a great step forward but I still have to change them if the database name changes. And synonyms still don't give me tooling support like Intellisense either.

So I'll keep pushing to get External Tables and External Data Sources Smile

Opinion: Don't Play Hide and Seek with T-SQL

I spend most of my life in a variety of customer sites. Most of these nowadays tend to be large financial organizations. When I talk to developers in those organizations, and ask them about how they interact with their DBAs, I can't tell you how often I hear that they try to avoid ever making schema changes, etc. as it takes too long and the process is painful.

When I talk to the DBAs at the same organizations, and I ask them why they resist making schema changes, I hear this:

If they change the database, they believe someone will scream at them, but they don't know who because:

  • They have no visibility on the code that touches the database so
  • They do not know what they are about to break.

So where is this code that the DBAs can't see or test?

  • Embedded in applications (or generated within the application dynamically) – often in widely-differing technology stacks (.NET, Java, etc.)
  • Embedded in Excel spreadsheets
  • Embedded in Reporting Services Reports
  • Embedded in Power BI Dashboards
  • Embedded in Analysis Services data sources
  • Embedded in SSIS Packages
  • Embedded in Access databases
  • And so on and so on.

What this means is that if the DBAs want to change the database, they have absolutely no way of knowing, at all, what they are about to break, unless they have tested every spreadsheet, have run every option in every report, have run every combination of SSIS packages, have reprocessed every analytic data model, have used every feature in every application, etc, etc.

That's never going to happen, so what do they do? They push back on any changes.

image

If I talk to the developers, and ask them why they built this pile of cards, I'm invariably told that they had to be agile. Yet what they don't understand is that what they have built is the complete opposite of agile. They have actually built a house of cards where no-one is game to touch anything. And we know where that all ends.

image

This is why I'm so much of a fan of having at least one layer of abstraction (yes that's procs and views) within the database. Then at least, the DBAs have visibility in the code touching the database.

In a slightly better world, the data people would also have unit tests wrapped around those procs and views. Then, when they need to make a change, they could just re-run the tests, and at least know that they haven't broken every report, spreadsheet, etc. in the organization.

I know that many might disagree, but I believe that if you are putting anything that looks like T-SQL logic directly into an application, into a spreadsheet, Reporting Services report, Analysis Services data source, Access database, etc., you're building a problem faster than you're building your applications.