Opinion: If you're a DBA and want to retrain, what should you learn?

On our SQL Down Under email list today, someone asked:

My title is DBA but my job is more into SQL Developer, fixing data involved in applications. Do you think if I study Power BI that I can get a better job?

I get asked this sort of question regularly, particularly from traditional DBAs who see their roles disappearing.

The most basic answer is to adapt what you're doing across to roles that are still in need like data modelling, query performance tuning, DB design in general, etc. However, I wanted to make some more broad recommendations for those considering something more radical.

Historical Answer

Over the years, when I've been asked this, I point out that a key advantage of BI is that it tends to appeal to the people who pay the bills.

Awesome image by Sharon McCutcheon

If you work on core business systems like invoicing, order entry, accounting, and so on, you can have a rewarding career. However, you'll be spending your life working in what the organisation sees as a cost of doing business. And that's something that they want to minimise.

Awesome image by Krill Sharkovski

The higher you can move higher up the IT ladder (in terms of value to the business), the better funded your projects usually are, and the more interesting your role is likely to be.

BI Example

A simple example, let's consider a company like Amazon. The people who do all their IT for core order processing, shipping, etc. will have busy and probably interesting jobs. But their life will be full of head count restrictions, budget cuts, and an endless desire to minimise their costs. To get funded, most new projects will need to show that they lead to a reduction in existing costs.

Then consider the people who do the "hey you bought this, I really think you should consider this as well" code.

I can't say for sure, but I'd almost guarantee their projects are funded at an entirely different level, largely because they can directly affect the profitability of the company. New projects in these areas are much more likely to be seen as investments. They will also be likely to attract funding from outside the normal IT chain of command, probably from the Marketing team.

(I've seen predictions years ago that most of IT will eventually report to Marketing).

That advice has worked well for us, and still works now, but it's yesterday's advice, and now I see things differently.

Today's Answer

Tomorrow's corporate battles, and even potentially the survival of the companies will be largely based around their ability to implement AI.

The first generation of AI was all about super-specialists, deep thinking, and looking for breakthroughs.  It was owned by the 7 big corporations working in AI and their association to a handful of universities doing advanced work in the area.

This next phase of AI (following on from the development of deep learning) is all about implementing these current AI concepts, and applying them to so many aspects of business and the community. Even though the biggest wins will come to those who own the big data sets, there are and will continue to be amazing opportunities for commercialising existing AI concepts.

Our most interesting projects today are ones that are based around AI tooling, that's letting us solve business problems that we simply could not have solved any other way, at least not economically.

And if you want an area that's crying out for short to medium term wins, that's security.

Awesome image by Liam Tucker

The problems in this area are now almost already completely out of hand. There are estimates that this year alone, there will be shortages of hundreds of thousands of IT security people, but we're talking about serious security people. And this is going to get much worse.

The only foreseeable way to solve most IT security issues is via AI.

I'm not too worried about retraining now but if I was in my 30's or 40's, aiming directly at the intersection of AI and advanced IT security would seem a really safe bet.

T-SQL 101: #59 Extracting date and time components in SQL Server T-SQL using DATEPART

In an earlier post, I showed how you can use the YEAR, MONTH, or DAY functions to extract those parts of a date. But what if you want to extract any one of those other intervals?

Well, that's what the DATEPART function does. It allows us to specify the interval that we want to extract. Look at the following query examples:

And here are the results of that query:

First, I asked for the day, and this is no different to using the DAY function. I see 28 returned.

Next, I asked for the second, but because I only supplied a date, there were no second, and zero is returned.

Finally, I asked for the second in a datetime value that included seconds, so I get back the value 46 which is the seconds component of the time part of the value.

DATEPART is a useful function.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

Faster Power BI and Analysis Services Development with Automatic Data Subsets

If you have worked with larger tables in either Power BI or Analysis Services, you'll know that developing against them is painful. Any time the data needs to be refreshed (which happens often), you spend a lot of time sitting around waiting for it to happen.

Now our buddy Chris Webb proposed a way of getting sampled data during development. That's pretty good. And fellow MVP Marc Lelijveld also uses a similar method.

In both cases, they are using a TOP N inserted to limit the number of rows. But it's never been what I want.

What I Really Want

First, I don't really want a TOP N, as I generally want a specific range of dates. For example, while I might want all data when the model is in production, or I might have a starting date for that, I usually want just a specific subset of the data for development. I often want say, just the last two months.

Second, when I'm working with source control systems, I don't want to be changing the BIM or PBIX files in any way at all, as they move between development and deployment. I don't even want to use different parameters.

Ideally, I wish the development tools like PBI Desktop, Analysis Services Tabular Designer in Visual Studio, etc. automagically included an extra limiting predicate while I'm developing.

My Workaround

In the meantime, I've come up with what I think is a pretty good workaround. I'm making the views that I connect to, determine what to return, based on either the HOST_NAME() or APP_NAME() from the SQL Server connection. Let me show you.

First I'll create two of the schemas that we commonly use:

(Another thing I really wish for: CREATE SCHEMA IF NOT EXISTS or CREATE OR ALTER SCHEMA).

Next I'll create a table to hold a list of the hosts that I'm using for development:

Note: I often get questions about the data type sysname. It's the data type for system objects, and is currently mapped to nvarchar(128).

Then I'll create and populate a table that I'm pretending is part of my data model:

I've added a transaction for every day in the last year.

Next, I'll create the type of analytic view that we often use:

There's a hard cutoff date for loading data (perhaps the start of sensible data) and if I query this, I see all 365 rows.

And the Secret Sauce

The trick is to change the view so that it makes decisions based on working out if I'm in development or not:

If the query isn't coming from a development host, it'll return all data since the same hard-coded start date (i.e. start of 1990). But if I'm on a development host, it'll just return the last two months of data.

I'll add my client to the list of development hosts:

And then query the view again:

And now I see only 62 rows, without changing the code at all.

Back in Power BI or Analysis Services Tabular Designer, if I'm on my client, I see the subset, but on the server, I see all the data without changing the BIM or PBIX file at all.

What if Host Won't Work for You

There might be situations where making the choice based upon the host name just won't work for you.

In that case, I'd suggest checking the APP_NAME() function instead of the HOST_NAME() function, and having a list of apps that get development subsets instead of full data sets.

 

 

Minimizing locking while archiving rows from a main table in SQL Server T-SQL

I've mentioned before that I really don't mind working with large tables. They're only slow to use if you're looking through all the rows when trying to find the data that you want. But there is still sometimes a valid need to archive data from a main table, across into an archive table. And people often run into problems, particularly blocking problems, when they try to do that.

Let's start with an example pair of tables (much simplified for clarity):

The wrong way

What I see most people do when they need to archive the data is something like this:

There are two basic problems with this:

  • If the SELECT picks up more than 5000 rows, then by default, a lock is taken on the whole dbo.Transactions table. (I had a suitable covering index but the same locking issue would apply if there wasn't one) If that happens, there will be issues for other people using the system, including people trying to insert new transactions. So they'll probably only be allowed to do this at weird hours of the day.
  • It's much more subtle, but unless you're certain that the rows being archived cannot change, you aren't 100% sure you're deleting the same rows as you are selecting.

Sometimes they might even wrap the whole thing in a transaction. That might cause them to have both tables tied up in locks.

A better way

Both of these issues can be avoided this way:

Limiting the deletions to 1000 rows will avoid the table locks, as long as there is a suitable index to support the delete.

Using the OUTPUT clause of the DELETE statement makes sure that the rows you delete are the same ones that you archive.

If there's any chance you might have other locking contending with this, you might try using READPAST:

There aren't many places where I use READPAST but this is one of them. READPAST says if you come across a row that's locked, just skip over it instead of blocking.

A final note is that instead of an appropriate non-clustered index like I've used here, if you can just select blocks of data using whatever the clustering key is instead, that would be even better.

 

SDU Tools: Calculate week day of the month in SQL Server T-SQL

The DateDiffNoWeekends tool in our free SDU Tools for developers and DBAs, has been really popular. Another request that we received though, was how to calculate which date a particular numbers week day is. For example, we might want to know which is the 12th week day of the month. So we've added a tool to do that. It's called WeekDayOfMonth.

It's a function that takes three parameters:

@Year is the year (of the month in question)
@Month is month in question (as a number within a calendar year i.e. March = 3)
@WeekdayNumber is an integer that gives you the position in the month that you are trying to calculate.

One nice advantage of this function is that you can supply a negative @WeekdayNumber parameter, and it will start counting from the end of the same month instead.

You can use our tools as a set or as a great example of how to write functions like these.

Find out more

You can see it in action in the main image above, and in the video here:

Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:

http://sdutools.sqldownunder.com

 

T-SQL 101: #58 Subtracting dates and times in SQL Server T-SQL using DATEDIFF

When you're working with T-SQL, it's common to need to work out how far apart two dates and or times are. Look at the query below:

In this case I'm asking how many days is it from 28th of February 2019 to 31st of July 2019? And if look at the answer:

We can see that it's 153 days.

Now I used days here but now we could have used any of those other intervals that I discussed when we looked at DATEADD. So I could work at how many minutes that was or how many seconds that was between the two times. I could even say how many months?

The order of the parameters that you pass to this function are important. The first one is considered the starting date and/or time, and the second is the ending date and/or time. So if I had put those around the other way, I would have seen -153 instead of 153.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

SQL: SQL Server database stuck in single user mode

There was a question on Stack Overflow recently about SQL Server being "stuck" in single user mode.

The poster had a SQL Server 2016 instance, running on Windows Server 2016. He said he tried to do this:

But it always told him that the database is in use.

The fix

The core concept to understand here is that when a database is in single user mode, you can't assume that you're the single user. In this case, he clearly wasn't.

I've seen this happen a lot when people are using SQL Server Management Studio (SSMS) and Object Explorer is connected to the database. It could very well be the single user, not you.

Best advice is to try to find anything that's connected to the database, and close it. Then try again.

If there's no luck, restart the SQL Server service, and connect with a tool like SQLCMD that isn't going to go opening any other connections. Then you should be able to execute the command to put the database back in multi-user mode.

SDU Tools: Set Database Compability For All SQL Server Databases To Maximum

While it's not 100% required, your life is generally better if your databases are set to the maximum database compatibility level that your server supports. But it's a bit of a pain to actually make that change. So in our free SDU Tools for developers and DBAs, we have a procedure that does just that. It's called SetDatabaseCompabilityForAllDatabasesToMaximum.

It's a stored procedure and takes no parameters at all.

It checks all the user databases and sets them to the maximum level allowed by the server that they are attached to. It is applicable to SQL Server only. (This procedure doesn't exist in the Azure SQL Database version of the tools).

You can use our tools as a set or as a great example of how to write functions and procedures like these.

Find out more

You can see it in action in the main image above, and in the video here:

Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:

http://sdutools.sqldownunder.com

 

T-SQL 101: #57 Adding and subtracting date and time periods using DATEADD in SQL Server T-SQL

When you're working with T-SQL, you'll often need to add time periods onto a date, perhaps onto today's date. You might be adding (or subtracting) days, or hours, or minutes, or months, and more.

The DATEADD function is the one that we can use add or subtract intervals to the date and time. Now in the example shown here, I've started with 20190228, or 28th February 2019. I want to add on 12 days:

and you can see that the outcome is the 12th March 2019 as expected.

Now there are a wide variety of intervals that can be used, not just days. These are the available intervals:

These range from years right down to nanoseconds. You can also see that there are abbreviations for each of the intervals.

In strongly suggest that you do not use these abbreviations. You don't want a situation where you see say, M, and are wondering if that's a month, or a minute, or a millisecond.

You should always aim for readable code. It's much, much better if you actually use the full word. So even words like nanosecond or millisecond, if you ever work with those, put the complete interval as part of the function instead of using the abbreviated code.

You'll see lots of old programming where people have used the abbreviations; don't follow that, and show them a better way of writing the code!

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

SDU Tools: Dates in a Period in SQL Server T-SQL

It's common to need to get a table of all dates between two dates, and in our free SDU Tools for developers and DBAs, we have a function DatesBetween that I've previously talked about. But sometimes it's not a start and end date that you have, it's a start date and a number of intervals. For example, I might want all the dates for the next three months, or three weeks, or two years. To make that easy, we've added a function called DatesInPeriod.

It's a table-valued function that takes three parameters:

@StartDate is the first date to return
@NumberOfIntervals is (no surprise) the number of intervals to include
@IntervalCode is a string that can include YEAR, QUARTER, MONTH, WEEK or DAY.

For each date in the number of intervals, it returns a key (an incrementing number for the position of the date within that period), and the date.

You can use our tools as a set or as a great example of how to write functions like these.

Find out more

You can see it in action in the main image above, and in the video here:

Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:

http://sdutools.sqldownunder.com