ML: Properties of the LaunchPad service – changing the concurrent session limit

In a recent post, I discussed issues I found when testing the Machine Learning Services setup in SQL Server 2017.

After that, my old friend Niels Berglund also posted about issues he found after installing CU7 (cumulative update 7) and how he solved them. Niels' article is here: http://www.nielsberglund.com/2018/06/01/sql-server-machine-learning-services-and-sql-server-2017-cu-7/

One detail that both of those articles covered, though, was how Machine Learning Services in SQL Server uses temporary files to hold R or Python data for sessions. By default, SQL Server configures itself to hold data for up to 20 concurrent sessions.

You can change that number.

In SQL Server Configuration Manager, find the LaunchPad service:

In the Properties page for the service, select the Advanced tab. That's where you'll find the value that you can change to increase the number of concurrent sessions.

Values up to 100 are permitted.
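If you want to see how many external script sessions are actually in use at any point, one option (just a quick sketch, and it assumes you have VIEW SERVER STATE permission) is to query the DMV that lists currently executing external script requests:

-- Show external script (R / Python) requests that are currently executing
SELECT external_script_request_id,
       [language],
       degree_of_parallelism,
       external_user_name
FROM sys.dm_external_script_requests;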

SQL Server also creates passwords for these "external" users. If you have a policy that requires regular password changes, you'll see that there's also an option in this window to let you change all of them.

(It's worth noting that current NIST guidelines say that you shouldn't force regular password updates anyway – you'll find more info here: https://auth0.com/blog/dont-pass-on-the-new-nist-password-guidelines/)


AI and ML: Why have machine learning in SQL Server at all?

In a post the other day, I described how to test if machine learning with R and/or Python was set up correctly within SQL Server 2017.

One of the comments on that post said that the info was useful, but that they were yet to be convinced why you'd want to have machine learning in the database in the first place.

Fair question.

I see several reasons for this. SQL Server Machine Learning Services is the result of embedding a predictive analytics and data science engine within SQL Server. Consider what happens in most data science groups today, where this type of approach isn't used.

I routinely see data scientists working with large amounts of data in generic data stores. This might mean that they have data in stores like Hadoop/HDInsight or Azure Data Lake Store but in many cases, I just see operating system files, often even just CSV files. Both the R and Python languages make it really easy to create data frames from these types of files. But where did this data come from? In some cases, it will have come from the generic data store, but in most cases that I see, it has come from within a database somewhere.

And that raises a number of questions:

  • What effort is required to extract that data (particularly for large volumes)?
  • How up to date is the data?
  • What is the security context for that data?

Often the answers to these questions aren't great. What I see is data science people extracting data from existing databases into CSV files, and then loading them up and processing them in tools like RStudio. And mostly, I see that data being processed single-threaded in those tools.

The outcome of this work, though, is either analytics or (more commonly) trained predictive models.

Having Machine Learning in SQL Server helps here in several ways. First, you can utilize the same security model that you're using for any other access to that same data. Second, as the data volumes grow, you don't need to move (and then refresh) the data; you can process it right where it is. Third, you can take advantage of the multi-threaded architecture of SQL Server.
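As a simple illustration of processing data in place, here's a minimal sketch (the table and column names are hypothetical) that runs an R calculation over data fed straight from a T-SQL query, without the data ever leaving the server:

-- Run an R calculation directly over a T-SQL query result
-- (dbo.Sales and its Amount column are hypothetical)
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- as.data.frame(mean(InputDataSet$Amount));',
    @input_data_1 = N'SELECT Amount FROM dbo.Sales;'
WITH RESULT SETS ((AverageAmount float));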

With Operational Analytics in SQL Server 2016 and later (basically updatable non-clustered columnstore indexes, with an optional compression delay, built over transactional data), you might even be able to keep the outcomes really up to date.
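For example, a sketch of that kind of index over a hypothetical transactional table (the compression delay value is in minutes) could look like this:

-- Updatable non-clustered columnstore index over a transactional table
-- (dbo.Sales and its columns are hypothetical)
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Sales_Analytics
ON dbo.Sales (SaleDate, CustomerID, Amount)
WITH (COMPRESSION_DELAY = 10);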

While being able to train and retrain predictive models is really important, and is hard work, it's when you use those models to create predictions that the real value becomes apparent. Trained models are quite lightweight execution-wise. You can add predictions right into your queries along with your other returned data, and very efficiently. This is where having Machine Learning within the database engine truly shines.
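For instance, with SQL Server 2017's native PREDICT function (which works with models trained using the RevoScaleR / revoscalepy algorithms), scoring can be folded straight into a query. This is just a sketch with hypothetical model and table names:

-- In-query scoring with a previously trained and stored model
-- (dbo.TrainedModels, dbo.Customers and the column names are hypothetical)
DECLARE @model varbinary(max) =
    (SELECT model_object FROM dbo.TrainedModels WHERE model_name = N'ChurnModel');

SELECT d.CustomerID, p.Score
FROM PREDICT(MODEL = @model, DATA = dbo.Customers AS d)
WITH (Score float) AS p;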

And you don't necessarily even need to create the predictive models. The SQL Server team have provided a series of world-class pretrained models that you can load into, and bind to, an instance of SQL Server.


Machine Learning: Testing your installation of R and Python in SQL Server 2017

One of the wonderful additions to SQL Server 2016 was the R language. In SQL Server 2017, Python was also added, and the combination of both with SQL Server was rebranded as Machine Learning Services.

Why would you want these installed? The most common answer is to enable you to run predictive analytics.

But I've found that at many sites, getting R and/or Python installed turned out to be more complicated than it seemed.

Once you have the features added (the in-database options and not the standalone options) for R and Python, you need to enable the execution of external scripts. That's easy enough:
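For reference, enabling it is a standard sp_configure change along these lines:

-- Enable execution of external (R / Python) scripts
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;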

You need to restart the SQL Server service after doing this.

Now you can try to execute this script to see if the features are working:
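The test is a pair of sp_execute_external_script calls along these lines – one that echoes a value back through R, and one through Python (the column names here are just illustrative):

-- Test the R runtime
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- InputDataSet;',
    @input_data_1 = N'SELECT 1 AS RResult;';

-- Test the Python runtime
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet',
    @input_data_1 = N'SELECT 2 AS PythonResult;';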

If all is good, you'll see that it worked: both values will be returned.

If you haven't changed the default SQL Server configuration though, I don't think that's what you'll see. More likely, you'll see this:

Msg 39012, Level 16, State 1, Line 4
Unable to communicate with the runtime for 'R' script. Please check the requirements of 'R' runtime.

STDERR message(s) from external script:
Fatal error: cannot create 'R_TempDir'

So, why does that happen? It's SQL Server's attempt at telling you that R doesn't like paths with spaces in them, and that's exactly what the default configuration has. (I have no idea why the default is set up that way.)

Open Notepad (or your favorite editor) as an administrator and navigate to this file:

The file rlauncher.config holds the configuration for the R feature.

That path for the working directory isn't going to work. You can either change the values in it to 8.3 (short) filenames, like they have for RHOME in the first line, or (my preference) point it to a different temp folder.
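As an illustration of that second option, the change is to the working directory entry in rlauncher.config. Assuming the key is WORKING_DIRECTORY (check your own copy of the file for the exact name), the edited line would look something like this, pointing at the folder created in the next step:

WORKING_DIRECTORY=C:\Temp\ExtensibilityData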

I then make sure that C:\Temp actually exists and that there's an ExtensibilityData folder under it, and then copy in all the folders from the original folder:

These are used for different processes running from the SQL Launch Pad.

Then restart both the SQL Server service and the SQL Launch Pad service and try your script again.

If you still have no luck, chances are high that the security isn't correct for the folders. Ensure that the SQL Launch Pad service account has full control on the ExtensibilityData folder and all sub-folders. By default, the service account will be MSSQLLaunchPad but you can check it in SQL Server Configuration Manager.

I'd restart both services again just for good luck, and then hopefully you'll see a response from both queries.

Then you're ready to start investigating Machine Learning in SQL Server.


AI: New Microsoft Professional Program in Artificial Intelligence

In the last year or so, there has been a quiet revolution going on with how Microsoft delivers training and certification.

Previously, the main option was Microsoft Official Curriculum (MOC) courses delivered by Certified Learning Partners. For some years, I've been saying that I don't see that as the longer-term model for Microsoft. I believe that's for three reasons:

  • The learning experiences team in Microsoft have needed to be a profit center.
  • The product groups want as much information out there as possible and as free as possible.
  • The creation and delivery processes for MOC courses don't lend themselves well to constantly-evolving information.

That has to lead to real challenges within the company.

The partnership that Microsoft has formed with edX (https://www.edx.org/) is an interesting alternative. If you haven't been involved with edX, they are an amazing organization that allows you to access some of the best learning in the world, mostly for free. If you'd like to see some of the best lecturers from MIT, Harvard, Berkeley, Hong Kong Polytechnic, University of British Columbia, etc., you can now do that. You can use it to learn almost anything, right from your home and at your leisure.

So where does Microsoft fit into this?

Microsoft have been putting many courses up onto edX, and you can learn all the material for free. This fits directly with the product groups' need to get information about their products and services, and how to use them, out there for everyone.

So what about certification?

Microsoft still needs to be able to certify people. When you take a course at edX, you have the option to choose a Verified course. This currently (typically) costs around $99 USD per course. And if you pass the right combination of courses, you can achieve one of Microsoft's Professional Program certificates (https://academy.microsoft.com/en-us/professional-program/tracks/). Here are the current tracks:

So you can choose to learn any of it for free, or pay to be verified and certified. That's a great combination. I currently see this as a much better learning model than the previous official curriculum model which was far too hard to keep up to date.

I've previously completed the Data Science track, and the Big Data track, and hope to complete the DevOps track this week.

But what has me really interested is the new Artificial Intelligence Track (https://academy.microsoft.com/en-us/professional-program/tracks/artificial-intelligence/). The AI track requires 10 courses, and there is a small overlap with the Data Science track. In my case, as soon as I'd enrolled, I found that I had 3 courses already credited:

They were:

  • Introduction to Python for Data Science
  • Data Science Essentials
  • Principles of Machine Learning

The Python topic was optional in the Data Science track, so those who did the R courses won't have this one. (Luckily, I decided to do both the R and Python courses, as I had an interest in both.)

I'm looking forward to this track. Here are the overall areas covered:

I'd encourage you to check it all out and to consider enrolling if it's of interest to you.

Opinion: You have to live and breathe the technology to be good at it

Digital Transformation and Cloud Transformation are phrases that I hear bandied around at nearly every large organization that I'm currently doing consulting work for.

Yet, in so many cases, I can't see the organization achieving the changes required. This is for two core reasons:

  • The first is that the culture within the organizations is a major hurdle. There just isn't enough flexibility to think outside the box about alternative ways to work.
  • Worse (and probably more concerning), I see these companies taking advice on how to make these transformations from companies who don't themselves "get it".

An organization that is cloud-antagonistic internally, and stuck in an endless IT management quagmire, isn't likely to make a good cloud transformation themselves, and they're certainly not going to be a good partner to help you make a cloud migration or implement a cloud transformation within your company.

An organization that doesn't use business intelligence (BI) or analytics internally isn't going to be able to help you make that transition either.

If an organization claims to be proficient in an area of technology, ask them about the use they themselves are making of those same technologies. As a simple example, ask them about the internal analytics that they can see on their own phones.

To be any good at any of these areas of technology, companies need to live and breathe them daily. If they don't, find someone to help you who does.

R Tools for Visual Studio

In recent months, I’ve been brushing up my R skills. I’ve had a few areas of interest in this:

  • R in Azure Machine Learning
  • R in relation to Power BI and general analytics
  • R embedded (somewhat) in SQL Server 2016

As a client tool, I’ve been using RStudio. It’s been good and very simple but it’s a completely separate environment. So I was excited when I saw there was to be a preview of new R tooling for Visual Studio.

I’ve been using a pre-release version of R Tools for Visual Studio for a short while, but I’ve already come to quite like it. It’s great to have this embedded directly within Visual Studio. I can do everything that I used to do in RStudio, but I really like the level of IntelliSense, etc. that I pick up when I’m working in R Tools for Visual Studio.

So today I was pleased to see the announcement that these tools have gone public. You’ll find more info here in today’s post from Shahrokh Mortazavi in the Azure Machine Learning blog: https://blogs.technet.microsoft.com/machinelearning/2016/03/09/announcing-r-tools-for-visual-studio-2/

Azure Machine Learning Course–First Melbourne Offering

Hi Folks, we’ve been working hard on a new Azure Machine Learning course.

Come and spend a solid day finding out why Azure Machine Learning should be part of your arsenal.

Our first Melbourne offering of Azure Machine Learning Core Skills is on 31st July. I’d love to see you there:

http://www.sqldownunder.com/Training/Courses/25