SQL Down Under show 90 with guest Joe Sack discussing Azure SQL Database and Copilot is now published!

Joe Sack is an old friend. (I've known him a long time. He's not actually old). He's always fascinating to chat to and so I was so pleased to have him on another SQL Down Under podcast today. Last time was in 2017.

Joe is a Senior Product Manager at Microsoft and is working with the integration of data and AI with SQL Copilots. The aim is to use AI and natural language processing to make database management simpler, whether you've been doing it for years or just getting started. The experiences he discusses in the show include Azure Copilot integration and Natural language to SQL.

I hope you enjoy it. You'll find this show (and previous shows) here: https://podcast.sqldownunder.com/

Cosmos Down Under show 11 with guest Khelan Modi discussing vector database and search is released

It's been a big week for Down Under podcasts. I really enjoyed recording another new Cosmos Down Under podcast this morning. It's now edited and released.

Show 11 features Azure Cosmos DB product manager Khelan Modi discussing the vector database and search features of Azure Cosmos DB, and particularly how that applies to large language models (LLMs).

Khelan is a product manager on the Azure Cosmos DB team. He leads the AI and Portal (UI) initiatives for the Vector Database service, Azure Cosmos DB for MongoDB vCore.

He's also responsible for the go to market strategies and the growth of the product.

I saw Khelan talking about this recently with Mark Brown, and I thought it would be great to have him on the show.

I hope you enjoy it.

You'll find it here, along with previous episodes: https://cosmosdownunder.com

Fix: Getting R and Python to actually work on SQL Server 2022

One of my more surprisingly popular blog posts in the past talked about Machine Learning: Testing your installation of R and Python in SQL Server 2017. The problem is that SQL Server 2022 changed things.

Now the SQL Server team has published an article on how to make it work.  What I've been finding though, is that people are struggling to follow that article. And more concerning, people using named instances of SQL Server (like SQLEXPRESS) couldn't get it to work at all, no matter how much they tried.

It took me a while to work out what was needed, so I figured it was time to share it with others. I've also submitted a few pull requests to get that page updated. So, by the time you read this, it might already be updated.

What you need to do for R and Python on SQL Server 2022

You need to start by installing SQL Server and making sure you choose at least BOTH the Database Engine Services and the Maching Learning Services and Language.

Unlike earlier versions, the R and Python libraries are not installed or configured for you. If you try to execute R code at this point, even after configuring SQL Server, you will see this type of error:

The error for Python is similar:

I've included both in case someone is searching for this. At least it tries to link to a solution, even though the link is still a docs link not a Learn link. (I'll put in a pull request for that too)

Getting R to work

Let's start by getting R to work.

You need to download and install the latest R 4.2. I got this from https://cran.r-project.org/bin/windows/base/old/4.2.3/ and I installed it with all default options.

R download

You then need to install some standard R dependencies. You do this by running RTerm. I found it here:

Location of RTerm

I right-clicked and ran as administrator, just in case. Then you execute this code step one row at a time, making sure that each installs ok. Note that on the first one you install, you'll probably be prompted for a CRAN mirror site. Pick one near you.

Then there are some Microsoft packages that need to be installed.  Execute these statements one at a time, making sure that each works. You can ignore warnings at this point. (And note this is two lines even though they show line-wrapped here)

Now the first challenge

The next step is where things went wrong. You need to find a program called RegisterRext.exe. For me, it was in this folder: C:\Program Files\R\R-4.2.3\library\RevoScaleR\rxLibs\x64

RegisterRext location

You open a command prompt (cmd.exe) as an administrator. Then you need to configure the instance of SQL Server with the version of R. The article says you do this by executing:

That works fine if you have a default instance of SQL Server and if it's called MSSQLSERVER (the default name). Not so much if you have a named instance. I was trying this on a machine with a SQLEXPRESS named instance as I'd heard this was an issue.

The article didn't give you any idea or example for a named instance. So I tried all the things I normally would for a named instance, including not knowing if I needed to escape the backslash:


and so on. I was about to give up when I finally just tried "SQLEXPRESS". And that worked ! I understand why people are confused. Nowhere else in SQL Server land do you refer to a named instance that way.

So the command needed in that case was just:

Configure and test R in SQL Server

Then it's time to test the SQL Server end.  You need to start by executing:

And you then need to restart the SQL Server service.

Next you can see if it's working. Execute the following in SQL Server:

That should show you 1 as a response.

But no cigar yet

All good so far. So, then I tried an actual R query, as suggested in the article:

Well, I didn't get a great response with that. It should have worked at that point but instead, after quite a while, it returned this:

OK, that looked a bit scary. I started spending ages trying to work out what on earth was wrong with sqlsatellite.dll.

And the weird thing? After a while, and a few more attempts, it just started working. No clue as to why it didn't just work the first time. But I've now seen that on two machines, so it's no coincidence. Be patient. Eventually it returned:

More complex R query

I then was able to execute something more complex:

And R is working

And then R is working !


Then it was time to see how I went with Python. Once again, not as smooth sailing as I'd hoped.

Again, you need to download Python installer. But the link for Python 3.1 that's included in the article goes to a page that now tells you that it's too old to have binaries available. (For Python apparently that means less than a year old). I started to see if it would work with Python 3.11 (the earliest with binaries) but eventually scared myself off, as I knew that wasn't going to be good.

I searched for "Python 3.10 for Windows" and ended up with a link at the same site, and it did have binaries. That link was:


I downloaded it, but importantly, I chose a custom install. (Very easy to miss on the first page). The main thing you need to change is the install folder. It will want to install it in one of your private folders by default, but instead, choose C:\Program Files\Python310 as the install folder. Near the end of the install I also chose hte option to Disable the path length limit. I doubt that's needed here but I've run into it before so I decided to choose it.

Next in an admin command window (same one as before if it's still open), you need to step through these three commands:

That took a little while but was all good.

Another challenge

You then need to set some permissions on folders. The instructions say to execute this command:

Again though, that assumes that the service that's running your SQL Server Launchpad service is "NT Service\MSSQLLAUNCHPAD".

Mine wasn't.

You can find what it is (if you don't remember from when you installed it), by opening (just Windows start/run) services.msc and looking for the service.

services.msc output

Being a standard SQLEXPRESS installation, mine was actually "NT Service\MSSQLLaunchpad$SQLEXPRESS", so instead of the above command, I needed to execute:

That then worked ok. We also need to set one more permission (and it's unrelated to your instances):

We're not done yet

At this point, we're back to needing to use a different version of RegisterRext.exe to configure the SQL Server instance for the version of Python.

The instructions say to execute:

Once again, that's not going to work for my SQLEXPRESS named instance, so I needed to execute this instead:

Now it's time to test Python in SQL Server

Because you've already enabled external scripts in SQL Server, if you've previously installed R, you don't need to enable them again.

Instead, I tested Python:

And that worked:

Simple Python test

As did a more complex query:

More complex Python query

And we're finally done !

I hope that helps someone else get started. (I've also pushed pull requests to the Learn site to get some of this updated. Hopefully it will be by the time you read this).




DevOps: Is AIOps just yet another almost meaningless acronym?

DevOps has quickly become a core part of how many organizations deliver IT, and in particular, how they deliver applications. But just as quickly as it has become popular, a whole series of XXXOps names have appeared. One of the latest is AIOps. So is it just yet another almost meaningless acronym?

Well as Betteridges Law of Headlines suggests, the answer is no.

When I first saw the term, I was presuming this would be about how to deploy AI based systems, and I wondered why on earth that would need a special name. But that's not what it is.

So what is AIOps?

AIOps is the use of artificial intelligence (AI) and machine learning (ML) techniques to allow us to analyze IT problems that occur so that we can respond to them fast enough to be useful.

The core problem is that we're now generating an enormous volume of metric and log data about every step of all our processes, and about the health of all our systems and applications yet so much of that data is never processed, or at least not processed fast enough to be useful.

Only machines can do this.

The term AIOps seems to have been coined by Will Cappelli (a former analyst with Gartner). In the end, humans won't be scouring logs and responding to what they find. Instead, they'll be teaching the machines what to look for, and how to correlate information from a variety of sources to find what is really going on.

Cappelli is now at Moogsoft and sums up AIOps quite distinctly:

AIOps is the application of artificial intelligence for IT operations. It is the future of ITOps, combining algorithmic and human intelligence to provide full visibility into the state and performance of the IT systems that businesses rely on.

People are already doing this but it's likely in the future that this will become a well-known job role. It will be important to guide the machine's learning to teach it to recognize the appropriate patterns.

If you are working in related IT roles, it might be time to start to add some data science, AI, and/or ML into your learning plans.

AI: Machine Learning and AI – What's in a name?

I regularly hear the terms AI and Machine Learning used almost interchangeably, along with a variety of other related terms. I thought it would be useful to add a post that defines some of the common terms and how they differ:

Artificial intelligence (AI) is a fairly generic term. It relates to all intelligent agents that are able to be aware of their environments (in some way), and to take actions where the aim is to achieve a specified goal. Sometimes these goals are terminal ie: they reach a final desired state. Other times, these goals are continuous ie: keep speed at a desired value. It is considered "artificial" intelligence as to an observer, it mimics cognitive functions that humans would imagine other humans performing.

Machine Learning (ML) describes a form of "learning" where a system  improves its model of a specific behavior (ie: "learns"). It can then use the model to predict future outcomes. Machine Learning is considered a field of Artificial Intelligence. There are many types of Machine Learning.

The most common form of Machine Learning today is Data Mining where the model is trained by analyzing existing outcomes, and then used to predict future outcomes. (This is usually called Predictive Analytics).

The learning can be supervised (ie: here are pictures of dogs, is this other picture a dog?), unsupervised (ie: what are the common types of an object?), or combinations of the two (often called semi-supervised).

Deep Learning is a form of Machine Learning where the models comprise many layers. "Deep" refers to the number of layers, not to any specific ability or insight. These models often do an amazing job, and in some cases are already performing better than humans at specific tasks such as speech to text translation.

Reinforcement Learning is another form of Machine Learning that typically involves working out optimal ways for software agents to operate within defined software environments. Game theory, simulation experiments, etc. often form part of Reinforcement Learning. One common way to represent the environments is as what is known as a Markov Decision Process (a mathematical framework that defines the rules for decision making and the goals and rewards involved).





AI: Detecting and Avoiding Customer Churn is Critical

I've flown a lot over the years. What continues to strike me though, is how poorly airlines use machine learning and AI, even when they are in strong competitive environments. A key indicator is detecting and avoiding customer churn. Let me give you an example:

We flew with QANTAS and with their partners in One World for many years. We were both platinum and I'd been platinum for many years. At a recent peak a few years ago, we were flying once or twice a week. That's not a crazy amount, but it's enough. And it's certainly enough to be able to see a purchasing pattern.

We finally got pretty fed up some years back, and one February, we said "enough", and stopped flying with them.

But what fascinated me was how the airline reacted to that.

At the end of the year, I got an email pointing out they'd noticed that I didn't fly as much that year, but because of previous custom, they'd keep my status in place.

After another whole year of not flying, I got another email saying they'd noticed a drop in my custom, and that they'd have to drop me down to Gold. Same thing again the next year.

But at no stage did they ever seek to work out what went wrong.

When could they have actually detected the change? Probably a month or two after we stopped. At that point, there's always a chance you can recover the situation. Every business teacher will tell you how much harder it is to gain a customer than to avoid losing them in the first place, and how very much harder it is to regain a lost customer.

Look at your own businesses, and ask yourself if you have systems in place to detect changes in your customers' behaviors, particularly if they've stopped dealing with you. Don't just detect total volumes over each year. Look for changes in behavior.



Opinion: What's with the lack of coding standards in Data Science?

I've been spending a lot of time over the last few years working through data science and AI topics. One thing that's struck me consistently is the total lack of reasonable coding standards in almost all the sample code that I see.

I was doing an AI lab in eDX recently, and one of the questions got me to open some sample Python code for a virtual environment, and asked me to work out how the virtual world that it created operated.

After working on it for quite a while, I realized that the #1 reason I was finding it hard, was not because the concepts were crazy difficult, it was because the person writing the sample thought it was reasonable to have variables, arrays, etc. with names like r, x, np, d, and so on.

What's with that?

Suddenly it felt like I was reading code written by a self-taught programmer in 1970, at their first attempt at using Basic. There is absolutely no need for anyone to be doing this.

Please don't.

I was left wondering who on earth would write this and interestingly enough, found that the person who translated the environments was in fact self-taught. I admire his efforts in teaching himself but this is not acceptable code to be sharing with anyone else.

There is no reason for data science or AI code in Python, R, or whatever language to be written like this. (Yet I see it all the time)

ML: Properties of the LaunchPad service – changing the concurrent session limit

In a recent post, I discussed issues I found when testing the Machine Learning Services setup in SQL Server 2017.

After that, my old friend Niels Berglund also posted about issues he found after installing CU7 (cumulative update 7) and how he solved them. Niels' article is here: http://www.nielsberglund.com/2018/06/01/sql-server-machine-learning-services-and-sql-server-2017-cu-7/

What each of those articles discussed though was detail on how temporary files are used by Machine Learning Services in SQL Server to hold R or Python data for sessions. By default, SQL Server configures itself to hold data for up to 20 concurrent sessions.

You can change that number.

In SQL Server Configuration Manager, find the LaunchPad service:

In the Properties page for the service, select the Advanced tab. In the main image above, you can see the value that you can change to increase the number of concurrent sessions.

Values up to 100 are permitted.

SQL Server also creates passwords for these "external" users. If you have a policy that requires regular password changes, you'll see that there's also an option in this window to let you change all of them.

(It's worth noting that current NIST guidelines say that you shouldn't force regular password updates anyway – you'll find more info here: https://auth0.com/blog/dont-pass-on-the-new-nist-password-guidelines/)




AI and ML: Why have machine learning in SQL Server at all?

In a post the other day, I described how to test if machine learning with R and/or Python was set up correctly within SQL Server 2017.

One of the comments on that post, said that the info was useful but they were still to be convinced why you'd want to have machine learning in the database in the first place.

Fair question.

I see several reasons for this. SQL Server Machine Learning Services is the result of embedding a predictive analytics and data science engine within SQL Server. Consider what happens in most data science groups today, where this type of approach isn't used.

I routinely see data scientists working with large amounts of data in generic data stores. This might mean that they have data in stores like Hadoop/HDInsight or Azure Data Lake Store but in many cases, I just see operating system files, often even just CSV files. Both the R and Python languages make it really easy to create data frames from these types of files. But where did this data come from? In some cases, it will have come from the generic data store, but in most cases that I see, it has come from within a database somewhere.

And that raises a number of questions:

  • What effort is required to extract that data (particularly for large volumes)?
  • How up to date is the data?
  • What is the security context for that data?

Often the answers to these questions aren't great. What I see is data science people extracting data from existing databases into CSV files, and then loading them up and processing them in tools like RStudio. And mostly, I see that data being processed single-threaded in those tools.

The outcome of this work though, is either analytics or (more commonly), trained predictive models.

Having Machine Learning in SQL Server helps here in several ways. First, you can utilize the same security model that you're using for any other access to that same data. Second, as the data volumes grow, you aren't needing to move (and then refresh) the data. You can process it right where it is. Third, you can take advantage of the multi-threaded architecture of SQL Server.

With Operational Analytics in SQL Server 2016 and later (basically non-clustered columnstore indexes with delayed aggregation, built over transactional data), you might even be able to have the outcomes really up to date.

While being able to train and retrain predictive models is really important, and is hard work, it's when you use those models to create predictions that the real value becomes apparent. Trained models are quite lightweight execution-wise. You can add predictions right into your queries along with your other returned data, and very efficiently. This is where having Machine Learning within the database engine truly shines.

And you don't necessarily even need to create the predictive models. The SQL Server team have provided a series of world-class pretrained models that you can load directly into and bind to, an instance of SQL Server.



Machine Learning: Testing your installation of R and Python in SQL Server 2017

One of the wonderful additions to SQL Server in 2016 was the R language. In SQL Server 2017, Python was also added and the combination of both with SQL Server rebranded to Machine Learning Services.

Why would you want these installed? The most common answer is to enable you to run predictive analytics.

But I've found that at many sites, getting R and/or Python installed turned out to be more complicated than it seemed.

NOTE: If you are using SQL Server 2022 or later, see this article instead: https://blog.greglow.com/2024/03/04/fix-getting-r-and-python-to-actually-work-on-sql-server-2022/

Once you have the features added (the in-database options and not the standalone options) for R and Python, you need to enable the execution of external scripts. That's easy enough:

You need to restart the SQL Server service after doing this.

Now you can try to execute this script to see if the features are working:

If all is good, you'll see that it worked. Both values would be returned.

If you haven't changed the default SQL Server configuration though, I don't think that's what you'll see. More likely, you'll see this:

Msg 39012, Level 16, State 1, Line 4
Unable to communicate with the runtime for 'R' script. Please check the requirements of 'R' runtime.

STDERR message(s) from external script:
Fatal error: cannot create 'R_TempDir'

So, why does that happen? It's SQL Server's attempt at telling you that R doesn't like paths with spaces in them, and that's what the default configuration has. (I have no idea why).

If you open Notepad (or your favorite editor) as an administrator, navigate to this file:

The file rlauncher.config holds the configuration for the R feature.

That path for the working directory isn't going to work. Now you can change the values in it to 8.3 filenames like they have for RHOME in the first line, or, my preference, point it to a different temp folder.

I then make sure that C:\Temp actually exists, and that there's an ExtensibilityData folder under that. I then copy in all the folders from the original folder:

These are used for different processes running from the SQL Launch Pad.

Then restart both the SQL Server service and the SQL Launch Pad service and try your script again.

If you still have no luck, chances are high that the security isn't correct for the folders. Ensure that the SQL Launch Pad service account has full control on the ExtensibilityData folder and all sub-folders. By default, the service account will be MSSQLLaunchPad but you can check it in SQL Server Configuration Manager.

I'd restart both services again just for good luck, and then hopefully you'll see a response from both the queries.

Then you're ready to start investigating Machine Learning in SQL Server.