I'm pleased to let you know that version 20 of our free SDU Tools for developers and DBAs is now released. It's all SQL Server and T-SQL goodness.
If you haven't been using SDU Tools yet, I'd suggest downloading them and taking a look. At the very least, it can help when you're trying to work out how to code something in T-SQL. You'll find them here:
Along with the normal updates to SQL Server versions and builds, we've added the following new functions:
WeekdayOfSameWeek – this one was requested by Dave Dustin and returns a nominated day of the week for a given date. For example, it will let you find the Thursday of the week that 23 Oct 2020 falls in.
NearestWeekday – this one was requested by Martin Catherall and returns the nominated day of the week that's closest to a given date. For example, it will let you find the closest Thursday to 29 Oct 2020, rather than the Thursday of the same week as above.
CobolCase – we've had so many case options that I'm surprised we hadn't had this one yet. It returns upper-case words with dashes in between, i.e. COBOL-CASE-IS-COOL
DateOfOrthodoxEaster – we previously had a function to let you find the date of the next Easter, but that was the Christian Easter. Now we've adapted a sample sent to us by Antonios Chatzipavlis to calculate the date of the Orthodox Easter for a given year.
IsLockPagesInMemoryEnabled – this one does just what it says. It determines whether or not lock pages in memory is enabled.
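Here's a quick sketch of how you might try the new functions. The function names come from the release notes above, but the parameter order and the day-numbering shown here are my assumptions; check the SDU Tools documentation for the exact signatures:

```sql
-- Hedged usage sketch; parameter order and day numbers are assumptions.

-- Find the Thursday of the week that 23 Oct 2020 falls in
SELECT SDU_Tools.WeekdayOfSameWeek('20201023', 5);  -- assuming 5 = Thursday

-- Find the closest Thursday to 29 Oct 2020
SELECT SDU_Tools.NearestWeekday('20201029', 5);

-- COBOL case a string
SELECT SDU_Tools.CobolCase('cobol case is cool');   -- COBOL-CASE-IS-COOL

-- Date of the Orthodox Easter for a given year
SELECT SDU_Tools.DateOfOrthodoxEaster(2021);
```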
There are a few enhancements as well:
We've done the usual updates for SQL Server builds and versions (up to date as at the point of release).
DateDimensionPeriodColumns now has a wealth of new columns, including IsStartOfMonth, IsEndOfMonth, IsStartOfCalendarYear, IsEndOfCalendarYear, IsStartOfFiscalYear, IsEndOfFiscalYear, and IsStartOf/IsEndOf columns for all quarters, both calendar and fiscal.
DateDimensionColumns now has quarters and start and end of month.
I was reading posts in an email distribution list yesterday and someone asked if SQL Server Replication was deprecated. First up, let's just confirm that it's not. As Mark Twain said: "The reports of my death are greatly exaggerated". There's still a really strong need for it, and somewhat ironically, that need has been getting stronger lately.
Back when replication was introduced, it had a bit of a reputation for being hard to set up and hard to look after. It was in the same category as features like clustering. If I was teaching a SQL Server 6.5 class, you could tell which students could follow instructions by whether they managed to get clustering working. Fortunately, it's nothing like that today, but you'll still hear from people with "old pain".
Styles of Replication
SQL Server Replication is a general term, and it covers a number of features. That's part of the reason that you get mixed messages.
Merge Replication

I don't use Merge Replication, and I avoid it at all costs.

There, I've said it.
Merge works by adding triggers to tables and capturing the changes, then periodically synching up with other machines and trying to apply each other's changes. And no big surprise, conflicts are an issue. There are ways to resolve them, but they mostly involve overwriting someone else's changes. Even though one database is the publisher, each system involved is pretty independent. It's very similar in many ways to the form of replication that older Access systems used.
Where Merge falls down is in performance. It can be a real challenge, particularly if you have a complex topology. It's easy to end up with what we call "merge storms" where systems are just endlessly moving data around. It works ok for small numbers of systems and small volumes of changes, particularly if they aren't likely to conflict with each other.
Snapshot Replication

If ever there was a word that is overloaded in SQL Server, it's snapshot. (OK, DAC might be close to that claim.) In the case of replication though, the basic idea is that you just periodically copy over a full copy of the data.
This is simple and reliable. Where it works well is when you have a small amount of data and it's not worth tracking the changes, or when the data is modified so often that sending a full copy is less work than sending all the changes.
It's an excellent option for pushing out reference data from a central database to many other databases.
Transactional Replication

This is the most common form of replication today. The transaction log of the publishing database is read (by a log reader process), the interesting entries are extracted from the log and sent to a distributor. From there, changes can be pushed to subscribers, or pulled by subscribers.
There's a less-commonly used variant of it called Peer to Peer Transactional Replication. We'd rarely use that option.
As for "standard" transactional replication, nowadays it's pretty easy to set up, and it's not that hard to look after. The tooling has improved markedly over the years.
But why would you use it instead of other options like readable secondaries on availability groups? Well, for lots of reasons. Here are my main ones:
On-Premises SQL Server to Azure SQL Database for BI
A few years back, we got the ability to replicate data from on-premises SQL Server databases into Azure SQL Database. (Note: I'm not talking about just Azure SQL Managed Instances or SQL Server in Azure VMs, but real Azure SQL Database). This is a wonderful option and is causing a resurgence in the use of transactional replication.
We have many customer sites with on-premises SQL Server systems, and this lets them push a quite up-to-date copy of the data into Azure SQL Database. That's used either to centralize the data, or to perform analytics on it.
Let me give you an example: I've recently been dealing with a company that has locations across the country, all of which are running their own on-premises SQL Server based systems. We're able to add replicas in Azure SQL Database, and now we can access all the data from the source systems directly in Azure.
Gone are the days when you'd be setting up VPNs from each site to the head office to get this sort of data. And if any site's link is down (including the head office's), the others keep working. When the site comes back, the data just continues to flow.
And it's a really great source of data for Azure Analysis Services, and/or Power BI.
Subset of Data Required for Analytics
A fellow MCM commented to me recently that if you needed to move 100 tables totalling 2GB for analytics, and the whole database was 4TB and had 30,000 tables, an availability group replica would be entirely the wrong choice.
With transactional replication, I can replicate just the required tables. In fact, I can replicate just the required columns and rows.
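As a rough sketch of how that subsetting is done (the publication, table, and column names here are hypothetical), an article can carry a row filter, and individual columns can be dropped from it:

```sql
-- Publish just one table, with a row (horizontal) filter.
-- Publication and object names are hypothetical; run at the publisher.
EXEC sp_addarticle
    @publication = N'SalesAnalytics',
    @article = N'Orders',
    @source_owner = N'dbo',
    @source_object = N'Orders',
    @filter_clause = N'OrderDate >= ''20200101''';

-- Column (vertical) filter: drop a column that shouldn't be replicated.
EXEC sp_articlecolumn
    @publication = N'SalesAnalytics',
    @article = N'Orders',
    @column = N'InternalNotes',
    @operation = N'drop';
```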
When I have data replicated to another server, I can then use a different indexing strategy on that target server. This means that if I need indexes to run analytic queries and those indexes aren't needed on the source system (or might hinder it), I can just add them on the target. You can't do that with an availability group replica. You'd have to put the indexes on the primary database.
This is especially useful for third-party application databases, where the vendor will get all upset if you start adding indexes to their database. You still need to check if they're ok with you replicating some of the tables in their database, but we've had good outcomes with this.
Schema and Data Type Differences
While we generally don't do this, there's nothing to stop you having different table definitions on the target systems. INSERT, UPDATE, and DELETE commands flow to the target and you can choose how they are applied. You can even write your own stored procedures to apply them in whatever way you want.
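As a sketch of that last point (publication, article, and procedure names here are hypothetical), you can point an article at your own apply procedure rather than the auto-generated one:

```sql
-- Tell replication to call a custom stored procedure for inserts
-- on this article. All names below are hypothetical.
EXEC sp_changearticle
    @publication = N'SalesAnalytics',
    @article = N'Orders',
    @property = N'ins_cmd',
    @value = N'CALL [dbo].[ApplyOrderInsert]';
```

The same approach works for updates and deletes via the upd_cmd and del_cmd properties.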
Large Numbers of Secondaries and No Domain Join Needed
With replication, you can have a large number of secondaries if all you're trying to do is to distribute data so that it's local to other systems.
And while availability groups have basic options for creation outside of domains, transactional replication works just fine in these situations.
You do need to learn how to work with, and manage, replication. It's easier today than it's ever been, but most people that I see having issues with it just haven't really understood something about how it works.
Given how useful it now is with the Azure SQL Database options, there's never been a better time to learn about it.
If you want a quick way to learn, we have an online course that will probably take you a day or two to complete. You'll find it here:
I've been in so many companies lately where new CTOs and CIOs claim to have a cloud focus, but all they want to do is migrate all their existing systems from on-premises VMs to cloud-based VMs. They talk about making a cloud transformation but they're not transforming anything.
I was pleased to get a chance to create a short educational series for the people at PASS to cover some of my thoughts on how to make a real transformation, not just a migration.
It's unfortunate that the SELECT query in SQL isn't written in the order that operations logically occur (if not physically). I suspect that's one of the things that makes learning SQL a bit harder than it needs to be.
Without getting into really complex queries, you need to understand the logical order of the operations.
FROM

The starting point is to determine where the data is coming from. This is normally a table, but it could be another set of rows such as a view or a table expression.
WHERE

This clause is the initial filtering of the data that was retrieved in the FROM clause. This is all about finding the rows that need to be considered in the query.
GROUP BY

This clause is about summarizing or aggregating the rows. We determine the columns that will be used for grouping. The SELECT clause will normally later include these grouping columns so that you can make sense of the output. The SELECT might also have other columns, but only if they are aggregated via MIN, MAX, AVG, etc.
HAVING

Newcomers to SQL often get confused about the purpose of the HAVING clause, particularly when comparing it to the WHERE clause. The HAVING clause is a filter applied to the groups of data that have come from the GROUP BY clause, whereas the WHERE clause filters rows of data.
SELECT

The SELECT clause is about deciding which columns or aggregates need to be returned. Formally, it's often called a projection of the data. Columns or aggregates can also be aliased at this point (i.e. given another name). If you notice how late in the logical order the SELECT clause is, you might realize that this is why an alias can be used later in an ORDER BY, but not in any of the previous clauses.
ORDER BY

SQL queries don't have any default order. If you want data returned in a specific order, then you need to specify the columns or aggregates that you want to order the output by.
It's really important that you build this picture in your head of the order in which queries are logically processed.
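To make that concrete, here's a small query (the table and column names are hypothetical) with each clause annotated by its logical evaluation order. ORDER BY can see the TotalAmount alias because SELECT is evaluated before it; WHERE and GROUP BY cannot:

```sql
-- Written order vs logical order of evaluation
SELECT   c.Region, SUM(o.Amount) AS TotalAmount    -- 5. project columns (aliases created here)
FROM     dbo.Orders AS o                           -- 1. source of the rows
         INNER JOIN dbo.Customers AS c
         ON c.CustomerID = o.CustomerID
WHERE    o.OrderDate >= '20200101'                 -- 2. filter rows
GROUP BY c.Region                                  -- 3. group the rows
HAVING   SUM(o.Amount) > 100000                    -- 4. filter the groups
ORDER BY TotalAmount DESC;                         -- 6. sort (the alias is visible here)
```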
I saw the above message when working with one of my clients today. The error says Only users with Power BI Pro licenses can publish to this workspace. And that would make sense if they hadn't already purchased Power BI Pro licenses for the user.
I checked online, and there were a number of comments about people seeing this error. There were the usual suggestions of logging out and back into Power BI. There was even one who'd quoted a Microsoft support person who said that Pro licenses don't work for up to 24 hours after you purchase them. (That sounds dubious to me).
We checked that the license had been assigned to the user. I was concerned that perhaps the licenses were purchased but not actually assigned to the users. All was good with the license assignment.
But then we noticed something odd:
Note that the user still had their Free Power BI license assigned to them, and for some unknown (and not very sensible) reason, that was the one that the system was using when they logged in.
We removed the free license assignment and all was then good.
Many years ago, I spent a lot of time in universities. I ended up finishing my studies at QUT in Brisbane, and I have a great and continuing fondness for that institution. Earlier on though, amongst other universities, I did quite a lot of study through Charles Sturt University.
Over the years, I've maintained a continuous link with my friends at Charles Sturt University (CSU). Back when we used to run Code Camps for both developers and DBAs, CSU were only too pleased to jump in to help us. From the minute we arrived that first day in Wagga Wagga, I knew it was going to be good. Associate Professor Irfan Altas was an amazing help and remains a friend to this day. I'm always pleased to get to chat to him.
Irfan asked me to be the guest speaker at a CSU graduation a few years back. That was a great honour.
What many people might not realise is that I've also been helping as an industry supervisor for students in PhD or Doctor of IT programs.
And that's the reason for this post. I was so excited yesterday to hear that one of the students that I've been supervising (in this case together with Professor Oliver Burmeister) has completed all the requirements to be admitted to his Doctor of IT degree.
Congratulations to Dr Georg Thomas! That's a major life achievement for you.
Power BI is amazing. And it's starting to appear all over the place. Many enterprises don't know what to make of it though. Some are scared that it'll be the "next Access" where stores of uncontrolled data end up all over the organization. Power BI's mantra of "5 minutes to Wow" is spot on. It's easy to be impressed. But enterprises are often struggling with "what comes next after that 5 minutes?"
We've been implementing many enterprise-level projects that use Power BI, and I'd love to spend time showing you what we do.
Amit Bansal and the DPS team in India are hosting a virtual summit later this year. There are three days of data-related conference presentations delivered 24 x 7. If you can't find something of real interest to you, and in your time zone, you aren't looking. And the price? At present it's $89 USD. Yes you read that right. And it includes a year of access to the recordings of all the normal conference sessions.
I'm presenting a pre-con over two half days. At the current discount, it's $129 USD and if you use the discount code GREG@DPS, it's even cheaper. I'd love to see many of you attend. Let's talk about Power BI in the enterprise.
Now, this isn't a session about how to make great visuals, etc. I'll leave that to someone else. But if you want to really get into what makes a good underlying data model, how to integrate properly with security, how to structure projects, how we use Azure Analysis Services and/or Power BI Premium and more, this should be the right session for you.
I was editing using the Power Query editor in an Analysis Services project, hosted in Visual Studio 2019. When I tried to use "Add column by example", I received the error shown above:
Could not load file or assembly 'Microsoft.DataIntegration.TransformDataByExample, Version=18.104.22.168, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The system cannot find the file specified.
The problem is that the assembly had not been deployed with the Analysis Services Designer.
Fortunately, the same assembly is used by Power BI Desktop. Because I had that installed on the same machine, I was able to copy the assembly: