Notice: This website is an unofficial Microsoft Knowledge Base (KB) archive that is intended to provide reliable access to deleted content from the Microsoft KB. All KB articles are owned by Microsoft Corporation.

Cluster disk is read-only if two sites lose network connection in Windows Server 2008 R2



Symptoms

Consider the following scenario:
  • You configure the quorum model to use the file share witness resource on a Windows Server 2008 R2-based cluster.
  • You have one of the following software applications installed in the cluster:
    • EMC Symmetrix Remote Data Facility/Cluster Enabler (SRDF/CE) 4.x
    • EMC MirrorView/Cluster Enabler (MV/CE) 4.x
    • EMC RecoverPoint/Cluster Enabler (RecoverPoint/CE) 4.x
  • The two sites lose network communication with each other.
In this scenario, the cluster disk on the production node becomes read-only.



Cause

This is a compatibility issue between Microsoft Failover Clustering's file share witness arbitration and EMC Cluster Enabler. This issue does not occur in Windows Server 2003, Windows Server 2012, or a later Windows operating system.

When two nodes lose communication, arbitration starts. After 60 seconds (the maximum failure time-out for the arbitration), node 2 (the challenging node) may bring other groups online while the file share witness is still doing another round of arbitration. By design, when the cluster brings the Cluster Enabler resource online, EMC Cluster Enabler makes sure that the local disks are writable and that the disks on the other site are not. Therefore, the disk on node 2 is marked as writable, and the disk on node 1 is marked as read-only.

However, if node 2 cannot gain the quorum within 90 seconds (the death time for the file share witness quorum), the cluster service on node 2 is stopped. This leaves a read-only disk on node 1, which causes a business outage.
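
The timing described in this section can be illustrated with a small model. The following Python sketch is only a simplified illustration, not cluster code. It covers only the case in which node 2 never gains the quorum. The names simulate_partition, death_time_s, and online_attempt_s are hypothetical. The 60-second and 90-second values are the ones that are cited in this section. The assumption that a death time equal to the 60-second mark still lets node 2 bring groups online is a conservative guess.

```python
def simulate_partition(death_time_s, online_attempt_s=60):
    """Toy timeline for a site partition in which node 2 never gains the quorum.

    death_time_s:     seconds after which the cluster service on node 2 is stopped.
    online_attempt_s: seconds after which node 2 may bring other groups online
                      (60 seconds in this article).
    Returns the final state of the node 1 disk and the ordered list of events.
    """
    events = [(0, "sites lose communication; arbitration starts")]
    node1_disk = "read-write"

    # Assumption: node 2 reaches the online attempt only if it is still
    # running at that moment, that is, if the death time has not passed yet.
    if death_time_s >= online_attempt_s:
        events.append((online_attempt_s,
                       "node 2 brings Cluster Enabler online; node 1 disk set read-only"))
        node1_disk = "read-only"

    events.append((death_time_s, "cluster service on node 2 stops"))
    return node1_disk, events


if __name__ == "__main__":
    for death_time in (90, 50):
        disk, events = simulate_partition(death_time)
        print(f"death time {death_time}s: node 1 disk ends up {disk}")
        for seconds, description in events:
            print(f"  t={seconds:>2}s  {description}")
```

In this model, a 90-second death time leaves the node 1 disk read-only after node 2 stops, whereas any death time below 60 seconds stops node 2 before it can bring the Cluster Enabler resource online.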



Workaround

To work around this issue, change the death time to a value of less than 60 seconds. This prevents the challenging node from trying to bring other resources online. To do this, run the following command at a command prompt on a node in the cluster:
cluster /prop QuorumArbitrationTimeMax=50 



Status

Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.

Third-party information disclaimer
The third-party products that this article discusses are manufactured by companies that are independent of Microsoft. Microsoft makes no warranty, implied or otherwise, about the performance or reliability of these products.




Keywords: kbsurveynew, kbexpertiseadvanced, kbprb, kbtshoot, kb


Article Info
Article ID : 3006754
Revision : 1
Created on : 1/7/2017
Published on : 11/3/2014
Exists online : False