Notice: This website is an unofficial Microsoft Knowledge Base (hereinafter KB) archive and is intended to provide reliable access to deleted content from Microsoft KB. All KB articles are owned by Microsoft Corporation.

An update is available that adds a file share witness feature and a configurable cluster heartbeats feature to Windows Server 2003 Service Pack 1-based server clusters



Introduction

This article describes an update that you can apply to add the following two new features to a Microsoft Windows Server 2003 Service Pack 1 (SP1)-based server cluster or to a Windows Server 2003 R2-based server cluster:
  • File share witness
  • Configurable cluster heartbeats
Although both features are contained in the same update package, they are separate and independent of each other.



More information

File share witness

The file share witness feature is an improvement to the current Majority Node Set (MNS) quorum model. This feature lets you use a file share that is external to the cluster as an additional "vote" to determine the status of the cluster in a two-node MNS quorum cluster deployment.

Consider a two-node MNS quorum cluster. Because an MNS quorum cluster can only run when the majority of the cluster nodes are available, a two-node MNS quorum cluster is unable to sustain the failure of any cluster node. This is because the majority of a two-node cluster is two. To sustain the failure of any one node in an MNS quorum cluster, you must have at least three devices that can be considered as available. The file share witness feature enables you to use an external file share as a witness. This witness acts as the third available device in a two-node MNS quorum cluster. Therefore, with this feature enabled, a two-node MNS quorum cluster can sustain the failure of a single cluster node. Additionally, the file share witness feature provides the following two functionalities:
  • It helps protect the cluster against a problem that is known as a split brain. This problem occurs if the two nodes in an MNS quorum cluster cannot communicate with each other. In this situation, each cluster node is unable to determine whether the loss of communication occurred because the other cluster node failed or because of a problem with the network. The file share witness can designate one of the cluster nodes as the surviving cluster node. That cluster node can then determine that it should continue to run the cluster. In this scenario, the surviving cluster node can determine that the other cluster node failed, or that the other cluster node was not sanctioned by the file share witness.
  • It helps protect the cluster against a problem that is known as a partition in time. This problem occurs if the following conditions are true:
    • Cluster node A is running, but cluster node B is not running.
    • Cluster node A stops running.
    • Cluster node B tries to run the cluster.
    In this situation, cluster node B may not have cluster state information that was updated on cluster node A. Therefore, cluster node B may run the cluster by using incorrect state information. The file share witness feature helps prevent this problem by detecting that the cluster state has changed. The file share witness feature prevents the cluster node that contains outdated cluster state information from running the cluster. When this action occurs, an Error event that contains the following Description information is logged in the System event log:

    Majority Node Set resource has failed to come online. The resource has detected that it does not have the latest copy of server cluster database. If this is a two node cluster, try starting Cluster service on the other node, and let this node join the cluster. If that does not work, use the /forcequorum startup option to start the Cluster service.

For more information about Cluster service startup options, click the following article number to view the article in the Microsoft Knowledge Base:
258078 Cluster service startup options
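The vote arithmetic behind the two-node limitation can be sketched as follows. This is an illustrative model only, assuming one vote per cluster node plus one extra vote for the file share witness, and assuming (per the MNSFileShare requirements) that the witness is ignored unless the cluster has exactly two nodes. The function names are hypothetical and are not part of the Cluster service.

```python
# Sketch of Majority Node Set (MNS) vote arithmetic.
# Assumption: each node has one vote; the file share witness adds one vote,
# but only in a two-node cluster (it is ignored for any other node count).

def majority(total_votes: int) -> int:
    """Smallest number of votes that is more than half of total_votes."""
    return total_votes // 2 + 1


def total_votes(nodes: int, witness_configured: bool) -> int:
    witness_vote = 1 if witness_configured and nodes == 2 else 0
    return nodes + witness_vote


def node_failures_tolerated(nodes: int, witness_configured: bool) -> int:
    """Nodes that can fail while the surviving voters (assumed to include
    the witness, if any) still form a majority."""
    votes = total_votes(nodes, witness_configured)
    return votes - majority(votes)


for nodes, witness in [(2, False), (2, True), (3, False), (3, True)]:
    print(f"{nodes} nodes, witness={witness}: "
          f"tolerates {node_failures_tolerated(nodes, witness)} node failure(s)")
```

In this model, a two-node cluster without a witness tolerates no node failures, and adding the witness raises that to one. Configuring a witness on a three-node cluster changes nothing, because the property is ignored.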

File share witness properties

The file share witness feature adds the following new private MNS resource configuration properties.
MNSFileShare
You can use the MNSFileShare property to set a file share that is external to the cluster as the file share witness. The following list contains the requirements for MNSFileShare property together with the requirements for the external file share:
  • The MNSFileShare property is ignored, even if it is set, when the cluster has any number of nodes other than two. In that case, the MNS resource behaves as it does in a typical MNS quorum. Therefore, the file share witness feature has no effect on an MNS quorum cluster that does not have exactly two nodes.
  • After you modify the MNSFileShare property, or either of the other two new private MNS resource configuration properties that this update provides, the changes do not take effect until the MNS resource is next brought online. When the MNS resource is configured as the quorum resource, it cannot be taken offline and brought back online in place. Therefore, you must move the group that contains the MNS resource to the other cluster node. This action brings the resource online on the destination cluster node, and the modified property values take effect.
  • The Cluster service on each cluster node runs under the context of a domain user account that is known as the Cluster service account. Because the MNS resource creates a file on the external file share, the Cluster service account must have write permission to the external file share. Therefore, you must grant Change share permissions or Full Control share permissions for the Cluster service account on this share.
  • The MNSFileShare property must be set to a string value that contains the Universal Naming Convention (UNC) path of a file share that is external to the cluster, for example \\server_name\share_name. The external share stores only a small amount of data. Therefore, a share that has between 2 megabytes (MB) and 5 MB of available hard disk space is sufficient.

    Important The external share does not store the full state of the cluster configuration. Instead, the external share only contains data that is sufficient to help prevent split-brain behavior and to help detect a partition in time behavior.
  • Each server cluster must be configured to use its own external file share. We do not support the use of the same external file share for multiple server clusters. However, you can use a single external server that is configured to have one share for each server cluster.
  • For a geographically dispersed two-node cluster that has each node in a different site, you can place the external file share in the same site as either cluster node. Alternatively, you can configure the external share in a separate third site.
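Two of these requirements can be expressed as simple checks: the witness is honored only in a two-node cluster, and no share may serve more than one cluster. The helpers below are a hypothetical sketch, not part of the update. The UNC pattern is a rough approximation assumed for illustration, and the sketch does not check share permissions or free disk space.

```python
import re
from collections import Counter

# Rough UNC pattern: \\server\share, optionally followed by more path parts.
# This approximation is an assumption of the sketch, not a Windows API rule.
UNC_PATTERN = re.compile(r"\\\\[^\\/]+(?:\\[^\\/]+)+")


def witness_in_effect(node_count: int, mns_file_share: str) -> bool:
    """True if MNSFileShare would be honored: exactly two nodes and a UNC path."""
    return node_count == 2 and UNC_PATTERN.fullmatch(mns_file_share) is not None


def shares_used_by_multiple_clusters(share_by_cluster: dict) -> list:
    """Shares assigned to more than one cluster (an unsupported configuration).

    Paths are compared case-insensitively, as Windows UNC paths are.
    """
    counts = Counter(share.lower() for share in share_by_cluster.values())
    return sorted(share for share, n in counts.items() if n > 1)


print(witness_in_effect(2, r"\\fs01\clus1_fsw"))  # True
print(witness_in_effect(3, r"\\fs01\clus1_fsw"))  # False: not a two-node cluster
print(witness_in_effect(2, r"C:\fsw"))            # False: not a UNC path
```

The server and share names (fs01, clus1_fsw) are placeholders.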
MNSFileShareCheckInterval
The MNSFileShareCheckInterval property is used to set the interval to verify the health of the external file share. The default interval is four minutes. This property value is set in seconds and has a range that is specified in the following table.
Value      Number of seconds
Minimum    4
Default    240
Maximum    268435455
The external file share that acts as a witness for the MNS quorum cluster is checked at set intervals to make sure that the MNS resource can write to it. If this verification fails, a Warning event that has the following Description information is logged in the System event log:

Majority Node Set resource has failed a status check for file share '\\server_name\share_name. The error code was '67'. Please ensure that the file share is configured properly and that the Cluster service account has write permission on the file share

This Warning event is logged one time for every verification failure that occurs until the verification operation succeeds.
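The following rough model validates the interval against the table above and records one Warning for each failed check, with nothing recorded for a successful check. It assumes that check number n runs n × interval seconds after the resource comes online; the article does not say when the first check runs, so that timing is an assumption of the sketch.

```python
# Range of MNSFileShareCheckInterval in seconds, from the table above.
MIN_INTERVAL, DEFAULT_INTERVAL, MAX_INTERVAL = 4, 240, 268_435_455


def warning_times(check_results, interval=DEFAULT_INTERVAL):
    """Return the elapsed seconds at which a Warning event would be logged.

    check_results: booleans in check order; True means the MNS resource could
    write to the witness share. Assumption: check n runs n * interval seconds
    after the resource comes online.
    """
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError(f"MNSFileShareCheckInterval {interval} is out of range")
    # One Warning per failed check; a successful check logs nothing.
    return [n * interval for n, ok in enumerate(check_results, start=1) if not ok]


# Share unreachable for the 2nd and 3rd checks, then writable again.
print(warning_times([True, False, False, True]))  # [480, 720]
```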
MNSFileShareDelay
The MNSFileShareDelay property specifies a delay time for the cluster node that does not currently own the MNS quorum resource. If the cluster nodes lose communication with each other, each cluster node tries to obtain the "vote" of the file share witness. The value of the MNSFileShareDelay property specifies the number of seconds to delay the cluster node that does not currently own the MNS quorum resource. This behavior gives the cluster node that currently owns the MNS quorum resource preference in winning the vote of the file share witness. This property value is set in seconds and has a range that is specified in the following table.
Value      Number of seconds
Minimum    0
Default    4
Maximum    60
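The effect of MNSFileShareDelay can be modeled as a head start for the node that owns the MNS quorum resource. This is a simplified sketch, not the Cluster service's arbitration logic: it assumes that the earlier attempt on the file share witness wins and that ties go to the owner.

```python
# Range of MNSFileShareDelay in seconds, from the table above.
MIN_DELAY, DEFAULT_DELAY, MAX_DELAY = 0, 4, 60


def first_to_witness(owner_detects_at: float, other_detects_at: float,
                     delay: int = DEFAULT_DELAY) -> str:
    """Which node tries the file share witness first after losing contact.

    Times are seconds at which each node notices the other is unreachable.
    Only the node that does not own the MNS quorum resource waits `delay`.
    Assumption: the earlier attempt wins, and ties go to the owner.
    """
    if not MIN_DELAY <= delay <= MAX_DELAY:
        raise ValueError(f"MNSFileShareDelay {delay} is out of range")
    owner_attempt = owner_detects_at
    other_attempt = other_detects_at + delay
    return "owner" if owner_attempt <= other_attempt else "non-owner"


print(first_to_witness(1.0, 0.0))           # owner: the 4-second delay covers the 1-second lag
print(first_to_witness(1.0, 0.0, delay=0))  # non-owner: no head start for the owner
print(first_to_witness(6.0, 0.0))           # non-owner: owner noticed too late
```

In this model, a larger delay widens the owner's head start. However, if the owner itself has failed, the surviving node still waits the full delay before it tries to obtain the witness vote.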

Configuring a file share witness

To configure the file share witness, follow these steps:
  1. Set the MNSFileShare property of the MNS quorum resource to the external file share that will act as the file share witness. To do this, run the following command at a command prompt:
    cluster cluster_name resource mns_resource_name /priv MNSFileShare=\\server_name\share_name
  2. Move the group that contains the MNS resource. To do this, run the following command at a command prompt:
    cluster cluster_name group mns_resource_group_name /move
You can also access these private MNS resource configuration properties by using any one or more of the following cluster interfaces:
  • The cluster API
  • The cluster Windows Management Instrumentation (WMI) provider
  • The Cluster Automation Server
Note You cannot use the Cluster Administrator tool to set these properties. This is because the Cluster Administrator tool does not expose these private MNS resource configuration properties.
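If you script these two steps, a small helper can assemble the command lines in the required order. The following is a hypothetical sketch that only builds strings and runs nothing. Wrapping names that contain spaces in double quotes is the usual command-prompt convention and is an assumption here. CLUS1, "Majority Node Set", "Cluster Group", and \\fs01\clus1_fsw are placeholder names.

```python
def quote(name: str) -> str:
    """Wrap a name in double quotes if it contains spaces (command-prompt convention)."""
    return f'"{name}"' if " " in name else name


def fsw_commands(cluster: str, mns_resource: str, mns_group: str, share: str) -> list:
    """Command lines for steps 1 and 2: set MNSFileShare, then move the group."""
    return [
        f"cluster {quote(cluster)} resource {quote(mns_resource)} "
        f"/priv MNSFileShare={share}",
        # Moving the group brings the MNS resource online on the other node,
        # which is when the new property value takes effect.
        f"cluster {quote(cluster)} group {quote(mns_group)} /move",
    ]


for line in fsw_commands("CLUS1", "Majority Node Set", "Cluster Group", r"\\fs01\clus1_fsw"):
    print(line)
```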

Configurable cluster heartbeats

The configurable cluster heartbeats feature lets you configure cluster heartbeat parameters. This may help avoid unnecessary cluster failovers that are caused by temporary network problems that drop or delay packets. The feature may be especially helpful in environments where cluster nodes are geographically dispersed.

The current cluster heartbeat algorithm sends a heartbeat message every 1.2 seconds from each interface on each cluster node. This message is sent to each interface on the same cluster network. Therefore, each cluster node both sends a heartbeat message every 1.2 seconds and expects to receive a heartbeat message every 1.2 seconds. If two consecutive heartbeats from the same interface are missed, the Cluster service suspects that an interface failure may have occurred. If six consecutive heartbeats are missed from all the interfaces on a node, the Cluster service suspects that a node failure may have occurred.

If the Cluster service suspects that a failure has occurred, the Cluster service runs a distributed consensus algorithm to identify whether a failure has occurred. An interface failure causes the failure of IP address resources. An IP address resource failure may cause a resource group failover to another cluster node. A node failure forces the node to be removed from active membership of the cluster. Therefore, all the resource groups on the affected node fail over to another cluster node.
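The suspicion rules described above (two consecutive missed heartbeats on one interface, six consecutive periods with no heartbeat from any interface of a node) can be modeled by counting streaks of misses. This sketch only models when a failure is first suspected. It does not model the distributed consensus algorithm that then confirms the failure, and its input format is an assumption made for illustration.

```python
HEARTBEAT_PERIOD = 1.2  # seconds between heartbeats, per the article


def first_suspicions(arrivals, interface_misses=2, node_misses=6):
    """Model when heartbeat loss from one remote node becomes suspicious.

    arrivals: one entry per heartbeat period; each entry is a list with one
    boolean per interface on the remote node (True = heartbeat received).
    Returns (interface_tick, node_tick): the first period at which some
    interface, or the whole node, is suspected; None if that never happens.
    """
    if not arrivals:
        return None, None
    streaks = [0] * len(arrivals[0])  # consecutive misses per interface
    node_streak = 0                   # consecutive periods with no heartbeat at all
    interface_tick = node_tick = None
    for tick, received in enumerate(arrivals, start=1):
        for i, ok in enumerate(received):
            streaks[i] = 0 if ok else streaks[i] + 1
            if interface_tick is None and streaks[i] >= interface_misses:
                interface_tick = tick
        node_streak = 0 if any(received) else node_streak + 1
        if node_tick is None and node_streak >= node_misses:
            node_tick = tick
    return interface_tick, node_tick


# Interface 0 is dead from the start; interface 1 dies after three periods.
pattern = [[False, True]] * 3 + [[False, False]] * 8
iface, node = first_suspicions(pattern)
print(iface, node)  # 2 9
print(f"node suspected after about {node * HEARTBEAT_PERIOD:.1f} s")  # 10.8 s
```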

The configurable cluster heartbeats feature exposes the lost-interface heartbeat periods and the lost-node heartbeat periods as new private cluster configuration properties. This feature does not affect heartbeat timing. Heartbeats are still sent every 1.2 seconds. However, this feature lets you configure the cluster to be more tolerant of heartbeat failures. These failures may occur for one or more of the following reasons:
  • Dropped packets
  • Excessive network latency
  • Network interface failure
  • Cluster node failure
The property values are configured in units of missed heartbeats, not by the elapsed time. Therefore, by using this feature, you cannot configure the cluster to suspect a node failure after 5 seconds. However, you can configure the cluster to suspect a node failure after five missed heartbeats.

Note Depending on when in the heartbeat period the failure occurs, five missed heartbeats correspond to approximately 5 or 6 seconds.

Because of the way in which the verification for heartbeats is timed, the interface threshold is calculated differently from the node threshold. You must set the interface threshold (HeartBeatLostInterfaceTicks) to the number of missed heartbeats plus one, whereas the node threshold (HeartBeatLostNodeTicks) is set to the number of missed heartbeats. For example, to configure an interface failure after two missed heartbeats, you must set the interface threshold to a value of 3.

All the cluster nodes must have a status of Up to receive the property change. Also, you must restart the Cluster service on each node for the property change to take effect.

Note You can restart the Cluster service on one node at a time, rather than on all nodes simultaneously.

These property values have the ranges that are specified in the following tables.

Interface (HeartBeatLostInterfaceTicks)

Value      Number of missed heartbeats
Minimum    2
Default    3
Maximum    20

Node (HeartBeatLostNodeTicks)

Value      Number of missed heartbeats
Minimum    2
Default    6
Maximum    20
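The conversion from a desired number of missed heartbeats to a property value, including the plus-one rule for interfaces and the ranges in the tables above, can be sketched as follows. The function names are hypothetical. The time estimate simply multiplies by the 1.2-second heartbeat period, so it is only approximate: the real time depends on when in the period the failure occurs.

```python
HEARTBEAT_PERIOD = 1.2  # seconds; heartbeat timing is not changed by this feature

# (minimum, maximum) allowed property values, from the tables above.
LIMITS = {
    "HeartBeatLostInterfaceTicks": (2, 20),
    "HeartBeatLostNodeTicks": (2, 20),
}


def property_value(name: str, missed_heartbeats: int) -> int:
    """Property value that makes the cluster suspect a failure after
    `missed_heartbeats` missed heartbeats."""
    if name == "HeartBeatLostInterfaceTicks":
        # The interface threshold is the number of missed heartbeats plus one.
        value = missed_heartbeats + 1
    else:
        value = missed_heartbeats
    low, high = LIMITS[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside {low}..{high}")
    return value


def approximate_seconds(missed_heartbeats: int) -> float:
    """Rough elapsed time for the given number of missed heartbeats."""
    return round(missed_heartbeats * HEARTBEAT_PERIOD, 1)


print(property_value("HeartBeatLostInterfaceTicks", 4), approximate_seconds(4))  # 5 4.8
print(property_value("HeartBeatLostNodeTicks", 10), approximate_seconds(10))     # 10 12.0
```

The two printed lines correspond to the example configuration used in the following steps: four missed interface heartbeats (a value of 5, about 5 seconds) and ten missed node heartbeats (a value of 10, about 12 seconds).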

Configuring cluster heartbeats

To configure the cluster heartbeats feature, follow these steps.

Note These steps change the heartbeat configuration for interfaces to four missed heartbeats. This corresponds to approximately 5 seconds. Additionally, these steps change the heartbeat configuration for nodes to 10 missed heartbeats. This corresponds to approximately 12 seconds.
  1. Make sure that all the cluster nodes are up. To do this, run the following command:
    cluster cluster_name node
    After you run this command, verify that the status value for each cluster node is Up.
  2. Set the number of heartbeat misses for interfaces and for nodes. To do this, run the following commands:
    • cluster cluster_name /priv HeartBeatLostInterfaceTicks=5:DWORD
    • cluster cluster_name /priv HeartBeatLostNodeTicks=10:DWORD
    Note You must append the :DWORD type suffix when you set these private properties by using a command-line command.
  3. Stop and then restart the Cluster service on each cluster node. To do this, run the following commands on each cluster node:
    • net stop clussvc
    • net start clussvc
You can also access these private cluster configuration properties by using any one or more of the following cluster interfaces:
  • The cluster API
  • The cluster WMI provider
  • The Cluster Automation Server
When you use these interfaces to set one or both of these cluster heartbeat private properties, the following status code is returned:
ERROR_SUCCESS_RESTART_REQUIRED
Note You cannot use the Cluster Administrator tool to set these properties. This is because the Cluster Administrator tool does not expose private cluster configuration properties.
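For scripted rollouts, the three steps can be generated as an ordered list of commands, with the plus-one rule applied to the interface value and the restarts performed one node at a time. This is a hypothetical sketch that only builds strings. The article does not say where steps 1 and 2 must be run, so the "any cluster node" location is an assumption. CLUS1, NODE1, and NODE2 are placeholder names.

```python
def heartbeat_steps(cluster: str, nodes: list, interface_misses: int, node_misses: int) -> list:
    """(where to run, command) pairs for steps 1-3 above; nothing is executed."""
    steps = [
        # Step 1: every node must report Up before the properties are changed.
        ("any cluster node", f"cluster {cluster} node"),
        # Step 2: the interface value is the number of missed heartbeats plus one.
        ("any cluster node",
         f"cluster {cluster} /priv HeartBeatLostInterfaceTicks={interface_misses + 1}:DWORD"),
        ("any cluster node",
         f"cluster {cluster} /priv HeartBeatLostNodeTicks={node_misses}:DWORD"),
    ]
    # Step 3: restart the Cluster service on each node, one node at a time.
    for node in nodes:
        steps.append((node, "net stop clussvc"))
        steps.append((node, "net start clussvc"))
    return steps


for where, command in heartbeat_steps("CLUS1", ["NODE1", "NODE2"], 4, 10):
    print(f"[{where}] {command}")
```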

Note There is a known regression in update 903650. This regression prevents you from creating a quorum by using Majority Node Set (MNS). For example, when you convert from a shared quorum resource to MNS, error 1 (Incorrect function) occurs. Additionally, if the cluster already uses MNS for the quorum and update 903650 is applied, the MNS resource cannot come online. In this scenario, Cluster Administrator displays the following error message:
    An error has occurred attempting to make <MNS_Resource> the quorum resource.
    Incorrect function Error ID: 1 (00000001).

Snippet from the cluster log:

    Majority Node Set <MNS>: Expanded path '\\fa67fd8c-7325-4\fa67fd8c-7325-4751-bf3b-d3f3131f32b6$'
    [FM] FmSetQuorumResource: Entry, pszClusFileRootPath=\\fa67fd8c-7325-4\fa67fd8c-7325-4751-bf3b-d3f3131f32b6$\MSCS
    000000ac.00001038::2006/10/01-03:38:13.370 ERR [FM] FmSetQuorumResource: Unable to get maintenance mode info for resource 'MNS', status 1
    [FM] FmSetQuorumResource: Exit, status=1
    [FM] FmSetQuorumResource: Entry, pszClusFileRootPath=\\fa67fd8c-7325-4\fa67fd8c-7325-4751-bf3b-d3f3131f32b6$\MSCS
    000000ac.00001758::2006/10/01-03:38:59.730 ERR [FM] FmSetQuorumResource: Unable to get maintenance mode info for resource 'MNS', status 1
    [FM] FmSetQuorumResource: Exit, status=1

Service pack information

These features are included in Microsoft Windows Server 2003 Service Pack 2. For more information about the latest service pack for Windows Server 2003, click the following article number to view the article in the Microsoft Knowledge Base:
889100 How to obtain the latest service pack for Windows Server 2003

Update information

The following files are available for download from the Microsoft Download Center:

Update for Windows Server 2003

Download the 921181 package now.

Update for Windows Server 2003 x64 Edition

Download the 921181 package now.

Update for Windows Server 2003 for Itanium-based Systems

Download the 921181 package now.

Release Date: July 5, 2006

For more information about how to download Microsoft support files, click the following article number to view the article in the Microsoft Knowledge Base:
119591 How to obtain Microsoft support files from online services
Microsoft scanned this file for viruses. Microsoft used the most current virus-detection software that was available on the date that the file was posted. The file is stored on security-enhanced servers that help prevent any unauthorized changes to the file.

Prerequisites

You must be running one of the following operating systems to apply this update:
  • Windows Server 2003 SP1
  • Windows Server 2003 R2

Restart requirement

You must restart the computer after you apply this update.

Update replacement information

This update does not replace any other updates.

File information

The English version of this update has the file attributes (or later file attributes) that are listed in the following table. The dates and times for these files are listed in Coordinated Universal Time (UTC). When you view the file information, it is converted to local time. To find the difference between UTC and local time, use the Time Zone tab in the Date and Time item in Control Panel.
Windows Server 2003, 32-bit x86 editions
File name      File version   File size  Date         Time   Platform  SP requirement  Service branch
Clcfgsrv.inf   5.2.3790.2736  16,380     29-Jun-2006  11:17  x86       SP1             SP1QFE
Clusnet.sys    5.2.3790.2736  77,824     29-Jun-2006  11:17  x86       SP1             SP1QFE
Clusres.dll    5.2.3790.2736  481,280    29-Jun-2006  14:12  x86       SP1             SP1QFE
Clussvc.exe    5.2.3790.2736  841,216    29-Jun-2006  11:17  x86       SP1             SP1QFE
Resrcmon.exe   5.2.3790.2736  68,096     29-Jun-2006  11:17  x86       SP1             SP1QFE
W03a2409.dll   5.2.3790.2736  26,624     29-Jun-2006  11:02  x86       SP1             SP1QFE
Windows Server 2003, 64-bit x64 editions
File name      File version   File size  Date         Time   Platform  SP requirement  Service branch
Clcfgsrv.inf   5.2.3790.2736  16,380     29-Jun-2006  14:48  x64       SP1             SP1QFE
Clusnet.sys    5.2.3790.2736  128,512    29-Jun-2006  14:48  x64       SP1             SP1QFE
Clusres.dll    5.2.3790.2736  655,360    29-Jun-2006  14:48  x64       SP1             SP1QFE
Clussvc.exe    5.2.3790.2736  1,235,968  29-Jun-2006  14:48  x64       SP1             SP1QFE
Resrcmon.exe   5.2.3790.2736  97,280     29-Jun-2006  14:48  x64       SP1             SP1QFE
W03a2409.dll   5.2.3790.2736  27,136     29-Jun-2006  14:48  x64       SP1             SP1QFE
Wresrcmon.exe  5.2.3790.2736  68,096     29-Jun-2006  14:48  x86       SP1             WOW
Ww03a2409.dll  5.2.3790.2736  26,624     29-Jun-2006  14:48  x86       SP1             WOW
Windows Server 2003, 64-bit IA-64 editions
File name      File version   File size  Date         Time   Platform  SP requirement  Service branch
Clcfgsrv.inf   5.2.3790.2736  16,380     29-Jun-2006  14:48  IA-64     SP1             SP1QFE
Clusnet.sys    5.2.3790.2736  260,096    29-Jun-2006  14:48  IA-64     SP1             SP1QFE
Clusres.dll    5.2.3790.2736  1,170,944  29-Jun-2006  14:48  IA-64     SP1             SP1QFE
Clussvc.exe    5.2.3790.2736  2,075,648  29-Jun-2006  14:48  IA-64     SP1             SP1QFE
Resrcmon.exe   5.2.3790.2736  184,320    29-Jun-2006  14:48  IA-64     SP1             SP1QFE
W03a2409.dll   5.2.3790.2736  25,600     29-Jun-2006  14:48  IA-64     SP1             SP1QFE
Wresrcmon.exe  5.2.3790.2736  68,096     29-Jun-2006  14:48  x86       SP1             WOW
Ww03a2409.dll  5.2.3790.2736  26,624     29-Jun-2006  14:48  x86       SP1             WOW



Status

Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section. This problem was first corrected in Microsoft Windows Server 2003 Service Pack 2.



More information

For more information, click the following article number to view the article in the Microsoft Knowledge Base:
824684 Description of the standard terminology that is used to describe Microsoft software updates



Keywords: kbwinserv2003sp2fix, atdownload, kbwinserv2003presp2fix, kbbug, kbfix, kbqfe, KB921181


Article Info
Article ID : 921181
Revision : 7
Created on : 10/9/2011
Published on : 10/9/2011
Exists online : False