Avoiding drives going offline when adding nodes to availability groups or creating / validating Windows clusters

Several of my customers have mentioned lately that when they added a node to a Windows failover cluster to support an Availability Group, they ended up with an outage because the installation process took the non-shared drives offline. All commented that it didn’t use to happen on Windows Server 2008 R2, and they were taken by surprise when it happened on Windows Server 2012 R2.

I mentioned it on the MVP list, and one of my US buddies, Allan Hirt, said that we need to uncheck the relevant checkbox during the install.

I stepped through another one today and this is the culprit:

[Screenshot: wizard page showing the "Add all eligible storage to the cluster" checkbox]

The default is now to add all eligible storage to the cluster.

That is perhaps a reasonable default when setting up most Windows failover clusters, but it is really not a sensible default when working with Availability Groups. It adds all your other drives to the cluster and then takes them offline.

Unchecking that box while performing installs/validations should avoid the issue.
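If the node is added from a script rather than through the wizard, the same choice should be available as a switch. As far as I know, the `Add-ClusterNode` and `New-Cluster` cmdlets in the FailoverClusters PowerShell module accept `-NoStorage`, which is the scripted counterpart of unchecking that box. Here is a minimal sketch in Python that only assembles the command line. The helper names and the cluster and node names are hypothetical, so treat it as a starting point and test it away from production first.

```python
import subprocess
from typing import List


def add_node_command(cluster: str, node: str) -> List[str]:
    """Build a PowerShell call that adds a node without claiming its disks."""
    # -NoStorage is assumed to match unchecking
    # "Add all eligible storage to the cluster" in the wizard.
    script = f"Add-ClusterNode -Cluster '{cluster}' -Name '{node}' -NoStorage"
    return ["powershell.exe", "-NoProfile", "-Command", script]


def add_node(cluster: str, node: str) -> None:
    """Run the command; only meaningful on a Windows host with the FailoverClusters module."""
    subprocess.run(add_node_command(cluster, node), check=True)


if __name__ == "__main__":
    # Placeholder names; the command is printed here, not executed.
    print(" ".join(add_node_command("SQLCLUSTER01", "SQLNODE03")))
```

Printing the command before running it makes it easy to confirm that `-NoStorage` is present before anything touches the cluster.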

Thanks to Allan for pointing it out.

One thought on “Avoiding drives going offline when adding nodes to availability groups or creating / validating Windows clusters”

  1. So I did this. I was adding another server to the cluster, and as soon as I hit the Finish button the drives disappeared and the main database for my company was gone. Step one in this situation was to find a new pair of shorts, since I didn't know what had happened at first. Once I figured it out, we got it fixed. I have since documented how to fix it; see below. I don't like that checkbox being checked by default.
    1. In Failover Cluster Manager
     a. Expand Cluster
     b. Expand Storage
     c. Select Disks
     d. Remove all disks
       i. Right click each disk
       ii. Delete or Remove
    2. On each server in the cluster
     a. Right Click the Start Menu
     b. Select Disk Management
       i. Maximize window
       ii. Find the disks that are offline (assuming you don't have disks that are supposed to be offline)
         1. Right click disk
         2. Select Online
       iii. Close Disk Management
     c. Open SQL Server Configuration Manager
       i.   Restart the SQL Server service
       ii. Restart SQL Server Agent if needed
     d. SSMS on secondary servers in the availability group
       i.   Open the Availability Group
       ii. Go to Availability Databases
       iii. Right Click each and select Resume Data Movement
     e. SSMS on primary server
       i. View Availability Group Dashboard to verify that everything is good.
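
Two of the steps above (bringing the disks back online, and resuming data movement) can also be scripted. The sketch below, in Python, only generates the commands and does not run them. It assumes the `Get-Disk` / `Set-Disk` cmdlets from the Windows Storage module and the T-SQL `ALTER DATABASE ... SET HADR RESUME` statement. The function name and database names are hypothetical, and the disk command assumes, as the comment does, that no disk is supposed to stay offline.

```python
from typing import Iterable, List

# PowerShell (Storage module) that brings every offline disk online.
# Assumption: no disk on this server is meant to stay offline.
ONLINE_DISKS_PS = (
    "Get-Disk | Where-Object IsOffline -eq $true | Set-Disk -IsOffline $false"
)


def resume_statements(databases: Iterable[str]) -> List[str]:
    """Return one T-SQL statement per database to resume data movement."""
    statements = []
    for name in databases:
        # Bracket-quote the name, doubling any closing bracket inside it.
        quoted = "[" + name.replace("]", "]]") + "]"
        statements.append(f"ALTER DATABASE {quoted} SET HADR RESUME;")
    return statements


if __name__ == "__main__":
    print(ONLINE_DISKS_PS)
    for stmt in resume_statements(["SalesDB", "HRDB"]):  # placeholder names
        print(stmt)
```

The generated T-SQL would be run on each secondary replica, which is the scripted equivalent of right-clicking each database and choosing Resume Data Movement.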
