The default group failover threshold value in the Windows Server 2008 Failover Cluster Management snap-in is incorrect

Symptoms

When you configure a highly available application or service in a Windows Server 2008 failover cluster, the default group failover threshold value that the Failover Cluster Management snap-in displays is incorrect. The displayed default is equal to the number of nodes that are configured in the cluster.

For example, in a five-node cluster, any highly available application or service group shows a default failover threshold of five. By default, the Period (hours) setting is six hours. According to these displayed values, when one or more resources in a highly available service or application group fail, the cluster tries to fail the group over to another node up to five times in a six-hour period, and after the fifth failover attempt the group remains in a "Failed" state.

In practice, a total of n - 1 failovers occur in the six-hour period, where n is the number of nodes in the cluster. In the five-node example, four failovers occur. The failover process works correctly. Only the number that appears in the Failover Cluster Management snap-in is incorrect. In this example, the snap-in shows 5.
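The arithmetic can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical and do not correspond to any cluster API. They restate the two numbers described in this article, the value that the snap-in displays and the number of failovers that actually occur.

```python
def displayed_default_threshold(node_count: int) -> int:
    """Default value shown in the snap-in: the number of nodes (per this article)."""
    return node_count


def observed_failovers(node_count: int) -> int:
    """Failovers that actually occur in one period: n - 1 (per this article)."""
    return max(node_count - 1, 0)


nodes = 5  # the five-node example from the text
print(displayed_default_threshold(nodes))  # 5
print(observed_failovers(nodes))           # 4
```

For any cluster size, the displayed default is one more than the number of failovers that are observed.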


Resolution

No adverse effect is associated with the incorrect display in the Failover Cluster Management snap-in. Failover continues to function correctly. No action is required.


Status

This is a known issue in Windows Server 2008 Failover Clustering. This problem is corrected in Windows Server 2008 R2.


More Information

Steps to reproduce the issue

1. In the Failover Cluster Management snap-in navigation pane, expand one of the managed clusters that has a highly available application or service configured.
2. Expand the Services and Applications category.
3. Right-click one of the groups, and then click Properties.
4. Click the Failover tab, and then view the Maximum failures in the specified period setting.

   The number that you see is equal to the number of nodes in the cluster.
5. To simulate the behavior, right-click a resource in the group, and then click Simulate Failure of this Resource under More Actions.

   The default restart behavior for a cluster resource is to try to restart the resource on the original owning node. Therefore, this first simulated failure does not cause a failover. The resource comes back online on the owning node.
6. Simulate a failure again. This causes the group to go offline, and then to move to another node in the cluster.
7. Repeat step 5 and step 6 until the resource remains in a "Failed" state. Make sure that you count the number of times that the group comes online on other nodes in the cluster. The final count is equal to n - 1.
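The counting in the last step can be modeled with a short Python sketch. This is a toy model under stated assumptions, not the cluster service's actual logic: the function name and the `effective_threshold` parameter are hypothetical, and each loop iteration stands for one restart on the current owner (step 5) followed by one further simulated failure (step 6).

```python
def count_failovers(effective_threshold: int) -> int:
    """Count group moves before the group stays in a "Failed" state.

    Toy model only: effective_threshold is assumed to be n - 1,
    as reported in this article.
    """
    failovers = 0
    state = "Online"
    while state != "Failed":
        # Step 5: the first simulated failure restarts the resource in place.
        # Step 6: the second simulated failure forces the group off the node.
        if failovers < effective_threshold:
            failovers += 1    # group comes online on another node
        else:
            state = "Failed"  # no failovers left in this period
    return failovers


node_count = 5
print(count_failovers(node_count - 1))  # 4
```

In a five-node cluster, the model counts four moves, which matches the n - 1 result of the manual test.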

For additional testing, follow these steps:

1. Select and then configure another service or application.
2. Increase the Maximum failures in the specified period setting by one.
3. Right-click a resource in the group, and then click Simulate Failure of this Resource under More Actions.
4. Simulate a failure again.

The failover count now matches the new setting.


Keywords: kbclustering, kbexpertiseadvanced, kbtshoot, kbprb, kb


Article Info

Article ID	:	950804
Revision	:	3
Created on	:	3/30/2017
Published on	:	3/30/2017
Exists online	:	False
Views	:	138
