Opinion: Don't buy hardware before a Proof of Concept

Just a short post today to call out something that I'm seeing again and again: organizations purchasing all their hardware and software platforms before they start to carry out a proof of concept. This is a very poor approach.

I was reading the data strategy for a global company that I was doing consulting work for. They were proudly introducing the new strategy, yet I was sitting there looking at it, trying to work out what they were thinking. The first step of their plan was to buy everything they needed. The second step was to carry out a proof of concept to see how it would all work (presuming it would work suitably at all).

This is ridiculous.

In that case, I think what's happening is that IT management wants to seem proactive: buying hardware and software platforms is what they are experienced at, and they want to look like they are "doing something".

Image by RawPixel

Yet, invariably, this locks them into decisions that aren't in the best interests of the organization. Instead of making sensible decisions, they end up making decisions based on what they have already committed to. And the more expensive that purchase was, the more they will try for years to justify the expenditure decision that they made. Every later choice will be made based upon how well it fits with their existing purchase.

Don't do this.

Do everything you can to carry out the proof of concept without buying anything that locks you into a decision path.

Opinion: SQL Server Databases in the Cloud – Single Tenant or Multi Tenant?

I spend a lot of time working with software houses (Microsoft calls these ISVs – Independent Software Vendors). More and more, these companies are looking to convert their on-premises applications to cloud-based SaaS (Software as a Service) offerings.

For many of the ISVs, their on-premises applications are single-tenant ie: they are designed to support a single organization. When they are moving to cloud offerings, one of the first decisions is whether they should create a single database to hold the data for all their client organizations, or whether they should create a separate database for each client organization.

As with most things in computing, there is no one simple answer to this.

Here are the main decision points that I look at:

Isolation

For me, this is the #1 item. You have to decide how important isolating one client's data from other clients' data is. Time and again, I hear how "the app does that", but you need to keep in mind that in most multi-tenant models, you are only one faulty WHERE clause away from showing one client another client's data. In fact, it's usually far harder to get the code correct in multi-tenant databases.
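As a minimal sketch of what I mean (the table and column names here are purely hypothetical), a typical multi-tenant query ends up looking something like this:

SELECT i.InvoiceID, i.CustomerName, i.TotalAmount
FROM dbo.Invoices AS i
WHERE i.InvoiceDate >= '20180101'
AND i.TenantID = @TenantID; -- leave out this one predicate and every client's invoices come back

Every query in the application has to remember that last predicate, and unless you've also set up something like Row-Level Security, the database itself won't stop you when one of them doesn't.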

So I think you need to start by considering what the outcome of that would be. For some ISVs, this would be deeply embarrassing but manageable. For other ISVs, this would simply be terminal for the organization.

Imagine the discussion between your CEO and the client whose data was disclosed. How would that discussion go? Would you still have a job? Would you still have an organization?

Image by Dmitry Ratushny

If you have even the slightest doubt about this, you should lean towards single-tenant ie: a separate database per client. You still have the chance to mess that up, but you are starting in a better place.

Price

This one is pretty easy. In every current cloud provider, single larger databases are more economical than large numbers of smaller databases. I don't think that cost should be your primary concern for this, but if it is, you will lean towards single-database designs.

If you are working in Azure SQL Database though, and haven't looked at their elastic database pools, you should consider them before making your decision.

Cloudiness

A key aspect of "cloudiness" is the granularity of providing resources as required, just when they are required. Having separate databases for each client is much more "cloudy". A simple example of this is that each of your clients might require a different level of performance and/or features.

One client might want to run a very low cost test, another might need good solid general performance, another might need the best performance available. If you use separate databases, even clients running the same application could use databases with different scale and/or performance.

You might also be able to add features to specific clients. For example, one client might want a read-only copy of his/her data in another location.

Having separate databases lets you decide these things on a client-by-client basis.
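For example, in Azure SQL Database, the scale of each client's database can be set (and later changed) independently. Roughly something like this, where the database names and the chosen service objectives are just placeholders:

ALTER DATABASE ClientA_Trial MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S0'); -- low cost trial client
ALTER DATABASE ClientB_Production MODIFY (EDITION = 'Premium', SERVICE_OBJECTIVE = 'P1'); -- client needing top performance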

One other option to consider here is that you might have different requirements even for a single client organization. They might have different environments (ie: Production, UAT, Test, etc.) that require different capabilities.

Noisy Neighbors

Everyone who's used a single database to support a large number of clients has run into the "noisy neighbor" situation at some time.

Image by Nik Shuliahin

One client gets to the point that they can't get their work done because of the overhead being placed on the single database by another tenant.

If you have a situation where the load placed by different tenants varies, you are going to find life much easier if you have separate databases.

If you use a multi-tenant design, you will need to consider how to move one tenant to another database if that's required. (I've been in ISVs where this is only considered when someone starts complaining, but it needs to be part of the design from day #1.)

Query Performance

When you have a multi-tenant database, almost every table will have a tenant ID of some type, and these columns will be involved in almost every join operation.

It's not hard to imagine that these databases simply perform worse. Note that I'm not talking about single vs multi-database on a single SQL Server. In those situations, there can also be positive performance outcomes from a single database design – but that's a topic for another day.
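As a quick sketch (hypothetical tables again), the tenant key ends up repeated in every filter and every join:

SELECT o.OrderID, ol.StockItemID, ol.Quantity
FROM dbo.Orders AS o
INNER JOIN dbo.OrderLines AS ol
ON ol.TenantID = o.TenantID
AND ol.OrderID = o.OrderID
WHERE o.TenantID = @TenantID;

That extra column in every key, every index, and every join adds up.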

Recovery/Rollback/Import/Export/Retention

If you have one client that needs to roll back their data to an earlier time, this is often extremely difficult with single-database designs. It's trivial to achieve with multi-database designs.

The same applies to situations where clients have different retention policies, or where there is a need for a client to be able to export their own data (or import it again).
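As an example of how simple the multi-database option can be, rolling a single client back on Azure SQL Database is just a point-in-time restore of that client's database to a new database, which you can then swap in. A sketch using the Azure CLI (all of the resource names here are hypothetical):

az sql db restore --resource-group SaaSGroup --server saas-server --name ClientA_Production --dest-name ClientA_Restored --time "2018-06-01T09:30:00"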

Data Ownership

You need to consider who will own the data that's used by your application. When you use a single-database model with multiple tenants, it's clearly going to be a database that you own and are responsible for.

Is that an appropriate risk for your organization?

One model that I'm seeing become more common now is that while the ISV owns and operates the application, the end customers own their own databases (and pay for them). This might simplify your legal situation with regard to data ownership. It might also help if there are data sovereignty issues.

Take legal advice on this.

Summary

As I mentioned, this isn't a simple decision. Nowadays though, for cloud-based PaaS (Platform as a Service) databases supporting SaaS applications, unless there is a compelling reason not to, I'd suggest starting with a separate database for each client every time.

DevOps: Thoughts on Microsoft’s Acquisition of Github

I have many friends who would have checked the calendar when they first heard that Microsoft was buying Github. They would have guessed it was April 1st.

I think it’s another pretty bold strategic move by Satya Nadella.

It’s been interesting to see all the naysayers coming out of the woodwork to beat up on the idea of Microsoft owning Github, as though it was going to be the death of Github. Almost every single time I hear a justification though, it is based on their opinion of Microsoft, or on something the company did, often decades ago.

People outside the Microsoft camp seem genuinely unaware of the seismic changes that have happened within Microsoft in recent years. As someone who has worked with them and followed them closely for decades, I can tell you that it is a very, very different company now. If your opinion of them is based on anything more than a few years old, it’s time to re-evaluate your stance.

Microsoft is already a heavy user of Github, and is the largest contributor to open source software on the planet.

But more importantly, the acquisition puts Github into a solid financial position that it did not have before. Github was pretty much at a crossroads and had a few suitors, but in any rational evaluation, Microsoft was the best positioned for this.

From Microsoft’s point of view, I can see how it will beautifully mesh with many ongoing changes within the company, particularly as things like Azure Functions take hold. It also provides more certainty for existing Github enterprise customers, and will introduce a whole new raft of enterprise level customers to Github.

The one big concern that I have is around identity. This is an area that Microsoft hasn’t yet sorted out. There are still too many issues with Microsoft Accounts vs Organizational Accounts and so on. There needs to be a good plan to integrate the Github identity system.

DevOps: Deleting stored procedures and views in target database when using database projects

We've been using SQL Server database projects ever since they were called "Visual Studio Team Edition for Database Professionals". That was back in 2005 and the name was quite a mouthful.

Until recently though, we'd only seen fairly patchy adoption of them among our customers. Over the last year or so, this has really changed. We're seeing so many customers starting to use them. Why?

I'm guessing that it's the increased interest in using DevOps.

One question that I hear again and again though relates to deployment. Developers assume that if you delete a stored procedure or view from the database project, it will be deleted from the target database when you publish the project.

By default, that won't happen but it's easy to change.

When you are in the Publish Database window, click Advanced.

In the Advanced Publish Settings window, select the Drop tab.

The option that you're looking for is Drop objects in target but not in source.

You can then choose which types of objects that this applies to.
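If you're publishing from a build or release pipeline rather than from Visual Studio, the same option can be passed to SqlPackage (or saved in a publish profile). Something like this should do it, where the dacpac, server, and database names are just placeholders:

SqlPackage.exe /Action:Publish /SourceFile:MyDatabase.dacpac /TargetServerName:somesqlserver /TargetDatabaseName:MyDatabase /p:DropObjectsNotInSource=True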


DevOps: To branch or not to branch

One of the discussions that comes up from time to time when working with DevOps is branching strategy.

One of the main features of Git that's often claimed is that it's so good at branching and merging. And indeed, it's pretty good at that. But the bigger question is whether lots of branching is desirable in the first place.

One argument says that if you are using branches (let's say to build features), you really aren't doing Continuous Integration (CI). The downside of having features in separate branches is that at some point, you'll have to merge the code back in, and there's probably going to be nothing automated about that. One software house that I've been mentoring has a very large number of active live branches.

Each time a merge to a central trunk (ie: master branch) is attempted, it's beyond painful, often takes months, and introduces enormous numbers of bugs that need to be fixed. If you then combine that scenario with a lack of decent testing, you have a problem. Merging becomes really, really scary.

Image by Priscilla Du Preez

There is nothing agile or continuous about that.

The alternative approach is to always work in the master branch. While you can end up colliding with what other people are doing, at least you deal with that right then and there. You might end up with a frustrating series of collisions and find yourself endlessly working from an unstable base, but at least you know about it straight away.

A bigger issue is that it's easy to accidentally release unfinished features into the wild, or at least parts of them. Feature flags are often used to try to hide this, but that can go wrong too.

So in the end, to branch or not to branch? My take on it is that branches are fine if they are letting you try something out in a scratchpad area, and when they are really short-lived. But long term branches aren't something that I'm keen to see.

DevOps: Scripting SQL Server objects and data from the command line

The other day I posted a shortcut about how to use SQL Server Management Studio (SSMS) to generate INSERT statements for data.

In one of the comments, Tom Corrigan asked if there was a way to do that from the command line.

The answer is yes. Apart from a variety of 3rd party tools, Microsoft released a tool to do just that in May last year.

The new tool is mssql-scripter and you'll find an intro to it here:

https://blogs.technet.microsoft.com/dataplatforminsider/2017/05/17/try-new-sql-server-command-line-tools-to-generate-t-sql-scripts-and-monitor-dynamic-management-views/

They describe it as "the multiplatform command line equivalent of the widely used Generate Scripts Wizard experience in SSMS".

Importantly, note the fact that it's multiplatform. It's built in Python and works on Linux, macOS, and Windows, and it can create both DDL and DML scripts that target SQL Server, including Azure SQL Database and Azure SQL Data Warehouse.

You choose where the output goes. Normally you'll send it to .sql files but like any other Unix style utility, you can pipe its stdout to other commands.

The source code is up on Github here:

https://github.com/Microsoft/mssql-scripter

I love the fact that Microsoft teams are now putting source code like this up on Github: continuing evidence of a "new Microsoft".

That means that you can:

  • Use it as a great learning resource
  • And if you're keen, submit pull requests to improve it

So to generate INSERT statements, you can use it like this:

mssql-scripter -S somesqlserver -d WideWorldImporters -U someuser -P somepassword --include-objects Customers --data-only

Note that it would all be on a single line.
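Because it writes to stdout, sending the output to a .sql file (or piping it on to another utility) is just standard redirection. For example, with the same hypothetical server and credentials:

mssql-scripter -S somesqlserver -d WideWorldImporters -U someuser -P somepassword --include-objects Customers --data-only > Customers_data.sql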

Opinion: Which SQL Server columns should be nullable #2: Avoid placeholder values

In my previous opinion post, I discussed why magic values were a problem. They are often added by developers when they are trying to avoid having any NULL data in a database.

While discussing this, I think it's worth also highlighting the overall dangers of placeholder values. These are values that aren't magic values but are ones that are intended to be replaced at a later time.

If you use any of these, you need to have a solid process in place to:

  • Make them consistent and easily identifiable
  • Replace them in any database that's heading towards production
  • Avoid them like crazy in production databases (and if you can't do that, have a great process for finding any that leak out)

Here are some examples of what I'm talking about:

This one is quite nasty because it really has nothing to make it stand out from the containing text.

The main image above with the TV show is a more obvious one (or you'd hope so). No doubt it's a little harder for them to detect the use of that one, but it can't be that hard; otherwise, they need to use another type of placeholder image that doesn't require a human to spot it.

One of my favorites is where people have used text generators, but then forgotten to replace the text. Here's an example on a wine bottle:

This is the classic Lorem Ipsum text.

The problem here is that they wanted something that really made the mock-up of the bottle look like the real thing. Unfortunately, it really did. I am surprised that whoever printed the labels didn't notice it though.

Ideally, if you have a good DevOps process, you'll have a testing step that checks for the presence of any placeholder values before any data is published.
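As a rough sketch of what such a check could look like (the table, the columns, and the list of markers are entirely hypothetical and would need to match your own data):

SELECT p.ProductID, p.Description
FROM dbo.Products AS p
WHERE p.Description LIKE '%lorem ipsum%'
OR p.Description LIKE '%TBD%'
OR p.Description LIKE '%placeholder%';

-- a deployment gate can then fail the release if this query returns any rows

A query like this is cheap to run as part of a release pipeline, and it's exactly the kind of thing that's easy to forget until the text ends up on a wine bottle.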

Before I leave this topic, for a little fun, Shopify published a link to some cute lorem ipsum style generators.

https://www.shopify.com.au/partners/blog/79940998-15-funny-lorem-ipsum-generators-to-shake-up-your-design-mockups

They did fortunately include my oldest favorite: Bacon Ipsum.

Next week, back to how NULL does/doesn't fit with database design.


DevOps: What is a managed service?

Some time back, I decided to use a WordPress host for my blog because the time for me at sqlblog.com was coming to an end, the community service was on its last legs, and WordPress seemed the obvious platform for a blog. Fellow MVP Adam Machanic made it really easy for me to migrate to a WordPress site with a tool that he had created.

I put it onto BlueHost, along with another site, and had reasonable hopes for both. However, a few weeks back I found that my blog had simply stopped working. I couldn't even log into the control panel to work out what was wrong.

The support chat that they offer is beyond slow. Every time I've used it, it takes ages to have anyone connect, and then while you're talking to them, it's like talking to someone on Valium. There are really long delays between responses. I can only assume that they are trying to talk to lots of people at once.

In the end, they told me that I'd run out of disk space. For the life of me, I couldn't work out how that was possible, given the amount of space purchased. Eventually it became obvious that what was occupying all the space was a set of backups that I hadn't configured, being written into a folder that I couldn't see unless I could log on, with no reasonable cleanup policy to get rid of old backups ie: you could argue that it was designed to fail.

The amazing part was that when I asked them to just delete the old backups (because I was unable to), they told me they'd arrange it, but that it could take up to another day or so. No-one who had permission to delete them was at work.

That's when I decided that I need to move away from this service, even though I'd prepaid it for quite a while into the future. We'd moved the other website already anyway, and so I thought I'd look for a managed hosting service for WordPress.

Many people mentioned they had good outcomes with InMotion hosting. When I checked it out, I saw they had "WordPress-Optimized Managed Web Hosting". I looked at the list of features and was very, very impressed with what they offered and eventually signed up.

One of the first things the guy activating my account explained, though, was that I needed to get involved in the backups. It turns out they do a backup every 24 to 36 hours, over the top of the previous backup. They only keep a single backup for me.

I pointed out that a backup that's done over the top of a previous backup really isn't a backup at all. 

The guy told me that to use their service, I needed to make sure I kept local copies regularly, in case something went wrong. I asked him "if I need to do my own backups, what exactly about your service is managed?" He couldn't answer me and said he needed to ask someone else in the company.

Surely keeping reliable backups of the site (and me not having to do it) is the #1 thing that a managed service should provide.

Sadly, I've had to cancel the service before I even got started with it. It's a real pity because there's much to like about their offerings, and they are responsive.

I've never been a fan of GoDaddy (had some beyond-horrid experiences dealing with them in the past), but as a comparison, it's informative to look at what they advertise for backups in their managed service. They say they keep daily backups for 30 days, and have a one-click restore process to go back to any of them.

Microsoft does similar things with Azure SQL Database. I don't worry about the backups, they keep many, and at any point I can just roll a database back to a previous point within the last few weeks.

That's more what I'd expect in a "managed" service.

While it's easy to call a service "managed", here's a hint: if I need to be involved with the core admin functions like backup, it's not a managed service.


DevOps: Human testing is still critical – and make sure they can spell

I've spent a lot of time lately working with automated testing systems.

Unit tests are great for checking blocks of code, etc. but integration tests and functionality tests are critical to projects. An area that is still tricky though is UI testing. There are some really good tools for this.

Selenium has been very popular and works well if you access it programmatically. There are good framework libraries for .NET and we've been using those. Selenium also had a pretty good IDE that could be used for recording, but it's pretty much adrift now, as it was based on an older version of Firefox and doesn't work on the latest versions. You could install it on an older version for testing, but those versions aren't safe to use, so the recommendation is to avoid it. Working with it programmatically is not that hard though.

This article shows a number of alternatives: https://www.guru99.com/selenium-alternatives.html

I've tried quite a few now but had pretty mixed results, so this area is still evolving. I had fairly good results with Katalon Studio but ended up quite frustrated with trying to use its recorder. So I ended up going back to coding against Selenium directly.

One thing that's been very clear to me though, is that no matter how good all this automated UI testing is, you still need humans to do it as well. You can see examples above of basic UI issues.

Here's an example of a UI logic error:

It's also very clear that the humans that do this need to be able to spell.

No matter how good looking your application is, if it's full of spelling errors, typos, or graphical errors, it won't look professional.

DevOps: Keep moving forward not backwards when deploying to production

I was asked a curious question the other day: “Are all changes made to production systems actually deployments?”

I wish the answer was yes, but unfortunately patches do get applied directly to production systems without going through standard deployment processes.

Everyone understands the scenario. Code has been deployed using the normal processes, then someone has something that simply must be fixed urgently in the production systems. At that point, you have two choices:

  • The traditional choice is to urgently patch the production server, then reapply those changes back into the development process.
  • The ideal DevOps approach is that an urgent patch is made in the earlier environments and then deployed through the normal processes.

The reason that people opt for the traditional choice above is that going back through the processes simply takes too long in most organizations. The problem with this option is that there’s no guarantee that the change that was made in production will be reapplied the same way when it goes back into the normal processes. Worse, it might not be applied at all. Even worse, if there are multiple production servers, it might be applied differently to each of those as well.

Image by Alice Achterhof

Then you have a real mess to deal with. Deployments start to break, and because they are then painful, they become less frequent and more risky, and then they break even more. It’s a horrible cycle that you can easily get into.

The aim when working with DevOps is to streamline the deployment process to the point where you won’t hesitate to have even an urgent fix go through the normal deployment processes. You urgently patch in the earlier environments and deploy that patch instead.