The following steps describe the behavior of the SQL Server Cluster
Wizard:
- First, the SQL Cluster Wizard connects to the server and
verifies that all the databases and binaries are on shared disks.
Possible problem
The most likely failure at this point is that the service
cannot start, normally because the shared disk is owned by the wrong node or
because the program files and data files were not both installed to the
cluster disk.
Resolution
To correct this problem, confirm that the shared disk
is owned by the correct node before you run the cluster wizard. Also, check and
make sure that both the program files and data files are installed to the
cluster disk.
- After you enter the IP address and network name, the wizard
creates a test resource with those properties and brings it online to see if
any conflicts on the network occur.
Note An error message occurs only if you enter an IP address that is
in use; invalid IP addresses or a bad subnet mask are not detected.
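Because the wizard does not detect invalid addresses or bad subnet masks, it can help to check the values yourself before you enter them. The following is a minimal sketch that uses the Python standard library `ipaddress` module; the function name `check_static_address` and the sample addresses are illustrative assumptions, not part of the wizard.

```python
import ipaddress


def check_static_address(ip: str, mask: str) -> list[str]:
    """Return a list of problems with an IPv4 address and subnet mask pair."""
    try:
        address = ipaddress.IPv4Address(ip)
    except ValueError as exc:
        return [f"invalid IP address: {exc}"]

    try:
        # strict=False accepts a host address rather than a network address.
        network = ipaddress.IPv4Network(f"{ip}/{mask}", strict=False)
    except ValueError as exc:
        return [f"invalid subnet mask: {exc}"]

    problems = []
    # On ordinary subnets the first and last addresses are reserved.
    if network.prefixlen < 31:
        if address == network.network_address:
            problems.append("address is the network address of its subnet")
        elif address == network.broadcast_address:
            problems.append("address is the broadcast address of its subnet")
    return problems


# Hypothetical values for illustration only.
print(check_static_address("10.0.0.5", "255.255.255.0"))
print(check_static_address("10.0.0.5", "255.0.255.0"))
```

An empty list means that no problem was found. This check does not tell you whether the address is already in use on the network; the wizard's test resource covers that case.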
Possible problem
If you have just unclustered and are re-clustering the
server, you may get an error message indicating that your network name is in
use. This may occur because Windows NT occasionally fails to remove the network
name from the NetBIOS registration properly.
First resolution
Open a command prompt window and enter the following
command: Press Return. Upon completion, try using the IP address again. If the IP
address still fails, move to the second resolution.
Second resolution
Reboot the system.
- After you enter all the information, the wizard copies all
the COM files that are registered in BINN to the SQL Server subdirectory of the
location pointed to by the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\SharedFilesDir
By default, this key points to the following location:
C:\Program Files\Common Files\Microsoft Shared\
Possible problem
The SQL Server Cluster Wizard is unable to find these
files or the location to which they should be copied. This problem usually
occurs when something is wrong with the following registry key used by the
wizard:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\SharedTools\SharedFilesD
Note that the setup uses a different registry key (listed below), but the two
should normally point to the same path:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\CommonFilesDir
Note By design, when this problem occurs, no error is
displayed. This is done so that you can install without replication and still
run the wizard without having it fail. The only way to identify whether this problem
is occurring is to look at the debug output window, which appears when you set
_PRINT_CONSOLE_=1 in the system environment before running the SQL Cluster
Wizard. If this step is executing correctly, you see references to the
replication files, such as Replres.dll and Distrib.exe, as they are copied. If
you do not see references to these files, you are encountering this
problem.
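If you save the debug output to a text file, the check for the replication files can be scripted. The following is a minimal sketch; the exact format of the debug output is not documented here, so the code only looks for the two file names mentioned above anywhere in the text. The function name and the sample lines are hypothetical.

```python
# File names that should appear in the debug output when the copy step works.
REPLICATION_FILES = ("replres.dll", "distrib.exe")


def missing_replication_files(debug_output: str) -> list[str]:
    """Return the replication files that are never mentioned in the output."""
    text = debug_output.lower()
    return [name for name in REPLICATION_FILES if name not in text]


# Hypothetical captured output, for illustration only.
captured = "copying Replres.dll\ncopying Distrib.exe\n"
print(missing_replication_files(captured))
```

An empty result suggests that the copy step ran. If either name is returned, you are probably encountering the problem described above.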
First resolution
Refer to Scenario 8 in the "Specific Scenarios"
section.
Second resolution
Refer to Scenario 3 in the "Specific Scenarios"
section.
- Next, the SQL Cluster Wizard copies the same files to the
same place on the other node, finding the correct path by reading the remote
registry. The SQL Cluster Wizard creates a share on the remote node named
cluster_tools_share, copies the files to that share, and then deletes it. The SQL Server Setup program must be able to read the registry on the remote node to perform this step.
Possible problems
A problem may occur if
registry key problems exist on the other node or if the wizard was unable to
create the share.
Another problem occurs if the Remote Registry service is not set to start automatically. In this situation, the following error message is logged in the core(local).log file:
MSI (c) (5C!E0) [15:40:04:153]: PROPERTY CHANGE: Modifying SqlLogMessage property. Its current value is 'Node01'. Its new value: '<EndFunc Name='VerifyAdminSharesOnNode' Return='53' GetLastError='0'>'.
MSI (c) (5C!E0) [15:40:04:153]: PROPERTY CHANGE: Modifying SqlLogMessage property. Its current value is '<EndFunc Name='VerifyAdminSharesOnNode' Return='53' GetLastError='0'>'. Its new value: '<EndFunc Name='VerifyAdminSharesOnCluster' Return='53' GetLastError='0'>'.
MSI (c) (5C!E0) [15:40:04:153]: PROPERTY CHANGE: Modifying SqlLogMessage property. Its current value is '<EndFunc Name='VerifyAdminSharesOnCluster' Return='53' GetLastError='0'>'. Its new value: '<EndFunc Name='GetVSNodeLists' Return='0' GetLastError='0'>'.
Info 2836.The installer has encountered an unexpected error. The error code is 2836. The control SelectedNodeList on the dialog ClusterNodeDlg cannot take focus.
Action 15:40:04: ClusterNodeDlg. Dialog created
Note The error code in the error message is 53. This code represents the "The network path was not found" error message.
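To find such entries in a long core(local).log file, you can search for EndFunc records whose Return value is not zero. The following is a minimal sketch that assumes the records always have the `<EndFunc Name='...' Return='...' GetLastError='...'>` shape shown in the excerpt above; the function name is illustrative.

```python
import re

# Matches records such as <EndFunc Name='X' Return='53' GetLastError='0'>.
END_FUNC = re.compile(
    r"<EndFunc Name='([^']+)' Return='(\d+)' GetLastError='(\d+)'>"
)


def failed_functions(log_text: str) -> list[tuple[str, int]]:
    """Return (function name, return code) pairs with a nonzero return code.

    Each record appears in the log both as a "current value" and as a
    "new value", so duplicates are dropped while the order is preserved.
    """
    failures = []
    for name, return_code, _last_error in END_FUNC.findall(log_text):
        entry = (name, int(return_code))
        if entry[1] != 0 and entry not in failures:
            failures.append(entry)
    return failures
```

Run against the excerpt above, this reports VerifyAdminSharesOnNode and VerifyAdminSharesOnCluster, each with return code 53, and skips GetVSNodeLists because it returned 0.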
First resolution
To resolve the first problem, see step 8.
To resolve the second problem, start the Remote Registry service. The Remote Registry service must be running before you install a failover cluster. This prerequisite is documented in the "Before Installing Failover Clustering" topic in SQL Server Books Online.
Second resolution
To resolve the first problem, refer to Scenario 3 in the "Specific Scenarios"
section.
- The wizard then copies the cluster-specific files to the
\System32 directory of both nodes.
Possible problem
Normally, this step completes successfully. The SQL
Cluster Wizard copies the files from the CD or network share, so it is possible
that it may lose the connection to the share or be unable to create the
cluster_tools_share because it already exists.
Resolution
Refer to Scenario 3 in the "Specific Scenarios"
section.
- The wizard runs the "secnode" setup, which installs the
necessary system files to the remote node and registers all the COM files that
have been copied to the C:\Program files\Common files\Microsoft shared
directory.
Possible problems
One of the most common problems at this point occurs if
the setup is run from a share point on the first node (connected through net use where you specify a user and a password). When this happens, by
default, node2 does not have access to the share so that when secnode runs it
fails to connect back to the install location to copy the files. When this
occurs, you receive a message indicating that setup could not be run on the
remote computer.
Another problem may occur if you install from a
network share when the path has a space in the name. This causes the secnode
setup to fail because it is unable to handle paths with spaces unless they are
quoted. There is no way around this problem apart from renaming the share.
First resolution
If you experience either of these problems, you should
check in the <%SYSROOT%> directory for the Sqlclstr.log file or on the
second node's TEMP directory for the Remsetup.log for clues or descriptions of
the problem. Correct all problems and then run the wizard again.
Second resolution
Permissions problems can also prevent the SQL Cluster
Wizard from working correctly when performing operations on the second node.
The account under which setup runs MUST have the appropriate
permissions:
- Be a local administrator for both nodes.
- Have the user right to "log on as a
service".
- Have the user right to "act as part of the operating
system".
These permissions MUST exist on BOTH nodes; otherwise this step fails.
You set these permissions from the primary domain
controller (PDC). After the correct permissions are set, you need to log off and
then log on again for the changes to take effect. For further details, refer to
Scenario 5 in the "Specific Scenarios" section.
Possible problem
Secnode may also fail if it runs but has errors
internally, such as not successfully registering all the COM files.
Resolution
Correct all problems reported in the Sqlstp.log on the
second node.
- Next, the SQL Server Cluster Wizard rebinds all the files
located in the following places:
- The SQL BINN directory.
- C:\Program Files\Common Files\Microsoft Shared\SQL
Server
- C:\Program Files\Common Files\Microsoft Shared\Database
Replication
This occurs on both nodes.
The SQL Server Cluster
Wizard then rebinds the following system files on both nodes:
- Dbnmpntw.dll
- Sqlstr.dll
- Sqlwoa.dll
- Sqlsrv32.dll
- Cliconfg.dll
- Cliconfg.exe
The SQL Server Cluster Wizard rebinds
%Sysroot%\System32\Sqlctr70.dll on the local node only.
Possible problem
The rebinding process fails only when something
is using one of the files it is trying to bind. If any SQL applications,
including the SQL Service Manager, are open, this message displays:
..could not update binaries...
For more information, click the following article number to view the article in the Microsoft Knowledge Base:
248380
PRB: SQL 7.0 Failover Wizard error when updating binaries on cluster
The most common problem is that some of the system
files are in use.
You can usually determine which node the problem occurs on by the amount of time it takes for the message to display again after
a retry. If the message displays instantaneously, this usually indicates that a
file on the local computer is in use, but if it takes a few seconds, then the
problem is probably occurring on the other node.
Resolution
You can usually work around this problem by stopping
all offending services and making sure that you do not have any applications
open. To verify which services you should have running, refer to the following
articles in the Microsoft Knowledge Base:
192708 INF: Order of Installation for SQL Server 6.5 MSMQ 1.0 Clustering Setup
219264 INF: Order of Installation for SQL Server 7.0 Clustering Setup
Possible problem
If you are unclustering and one of the resource DLLs is
in use, the resource DLL may stop responding in one of its connections to the
server. This causes the resource monitor process (Resrcmon.exe) to have the
dbnmpntw.dll file open even when the resource is offline.
First resolution
Reboot and re-run the wizard to uninstall.
Second resolution
Rename the offending DLL to Dbnmpntw.dll.copy, and then
copy it back to the original name. Now the .copy file is in use but the
dbnmpntw.dll file is not, so the wizard may complete without any problems.
- The SQL Cluster Wizard now creates the net name, IP,
sqlserver, agent and vsrvsvc resources in the cluster, brings the SQL Server
resource online, and changes the local server in the sysservers system table to the virtual server name.
Possible problem
Creation of the resources is rarely a problem.
You should see the resources being created in the group in which the disk
resides. All this step does is create the resources and make dependencies
between them so that they can start in the correct order.
Bringing
the resources online is the last phase of the setup. The first phase is to
start the MSSQLSERVER$VIRTNAME service, connect to it, and set the values in sysservers correctly. If this step fails, then the whole setup fails and
rolls back all the work it has done so far. This can happen when the rebinding of the
Sqlsrv32.dll file (an ODBC file) does not work correctly. When this occurs, you
will see error 123 or 126 in the cluster setup log (Sqlclstr.log) just after
the fixsysservers call.
If this situation occurs:
- The cluster is completely broken.
- It is caused by the wizard only changing one of the two
references to the Kernel32.dll file to reference the Vernel32.dll file
instead.
- If you previously installed a different version of
Microsoft Data Access Components (MDAC) on the computer before installing SQL,
the version of the Sqlsrv32.dll file on the system is different.
First resolution
Reboot both servers and, before retrying, make sure
that only the minimum services are running as outlined in the following
Microsoft Knowledge Base articles:
192708 INF: Order of Installation for SQL Server 6.5 MSMQ 1.0 Clustering Setup
219264 INF: Order of Installation for SQL Server 7.0 Clustering Setup
Second resolution
Rename the Sqlsrv32.dll file, and then reboot the
computer. Before retrying, make sure that only the minimum services are running
as outlined in the following Microsoft Knowledge Base
articles:
192708 INF: Order of Installation for SQL Server 6.5 MSMQ 1.0 Clustering Setup
219264 INF: Order of Installation for SQL Server 7.0 Clustering Setup
Third resolution
Contact SQL Product Support Services.
- The SQL Cluster Wizard finishes.
Specific scenarios
Scenario 1
Problem
The SQL Cluster Wizard fails with the following log entry:
@ CopyFileIfNeeded: [D:\EnterpriseEdition\x86\CLUSTER\SQAGTRES.DLL] => [C:\WINNT\System32\SQAGTRES.DLL]
@@@ CopyFileIfNeeded: [D:\EnterpriseEdition\x86\CLUSTER\SQAGTRES.DLL] => [\\LNXDAYCC02\admin$\system32\SQAGTRES.DLL]
~~~ XXX InstallRemote failed
[reghelp.cpp:34] : 2 (0x2): The system cannot find the file specified.
Resolution
Verify that you can make a \\server_name\admin$
connection from both nodes in the cluster.
Check this in particular
if any network interface card (NIC) settings have been changed or if network
cards have been replaced.
Warning If you disable File and Print Sharing for Microsoft Networks,
under the Network Connection properties on Windows 2000 computers, you will not
be able to make a connection to the Administrative shares. Attempts to access
the Administrative shares cause an Error: 53 error message.
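The bracketed error lines in the Sqlclstr.log excerpts in these scenarios share a common shape: a source file and line number, an optional context, a decimal error code with its hexadecimal form, and the message text. The following is a minimal sketch that extracts those fields so that the error codes are easier to scan. It assumes that every error line follows the shape seen in the excerpts; the names `LogError` and `parse_errors` are illustrative.

```python
import re
from typing import NamedTuple


class LogError(NamedTuple):
    source: str   # source file named in the brackets, such as reghelp.cpp
    line: int     # line number named in the brackets
    context: str  # optional text between the brackets and the error code
    code: int     # decimal error code
    message: str  # message text that follows the code


ERROR_LINE = re.compile(
    r"^\[(?P<source>[^:\]]+):(?P<line>\d+)\]\s*"
    r"(?P<context>.*?):\s*"
    r"(?P<code>\d+) \(0x[0-9a-fA-F]+\):\s*"
    r"(?P<message>.*)$"
)


def parse_errors(log_text: str) -> list[LogError]:
    """Return one LogError per bracketed error line; skip all other lines."""
    errors = []
    for raw in log_text.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:
            errors.append(
                LogError(
                    match["source"],
                    int(match["line"]),
                    match["context"].strip(),
                    int(match["code"]),
                    match["message"],
                )
            )
    return errors
```

Lines that begin with markers such as `~~~` or `@@@` do not match the pattern and are ignored.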
Scenario 2
Problem
The SQL Cluster Wizard fails with the following generic
message, which does not reference a specific file:
Resolution
Verify that the SQL group name is in all capital
letters. If it is not, the wizard tries to create a new group but is unable to
do so. To correct the name, rename the group to a temporary name (such as x) and
then rename it to the correct name in all uppercase.
Note This applies to renamed groups only. The default names like
"Disk Group 1" have their resources moved to the new group if required by SQL.
Scenario 3
Problem
The Sqlclstr.log file shows the following:
~~~ ClusterResourceStart... tick=2, state=2
[validate.cpp:147] DeleteTestGroup:OpenClusterResource: 5007 (0x138f): The cluster resource could not be found.
~~~ XXX Copy Files failed
[reghelp.cpp:34] : 2 (0x2): The system cannot find the file specified.
Resolution
Check the net shares on each node and look for the
following:
- \\cluster_tools_share
- \\cluster_setup_share
If either is found, delete it.
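If you collect the share names from each node into a list, the comparison can be scripted. The following is a minimal sketch; how you gather the names (for example, by reading them off the output of net share) is left to you, and the function name is an illustrative assumption.

```python
# Temporary shares that setup may leave behind, as listed above.
LEFTOVER_SHARES = {"cluster_tools_share", "cluster_setup_share"}


def leftover_setup_shares(share_names: list[str]) -> list[str]:
    """Return the entries that match a leftover setup share.

    Leading or trailing backslashes are ignored and the comparison is
    case-insensitive.
    """
    return [
        name
        for name in share_names
        if name.strip("\\").lower() in LEFTOVER_SHARES
    ]


# Hypothetical share list, for illustration only.
print(leftover_setup_shares(["ADMIN$", "C$", "cluster_tools_share"]))
```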
Scenario 4
Problem
When you try to re-cluster SQL after installing SQL
service pack 1, the install fails with the following error in the
Sqlcluster.log file:
Looking at disk P:
Disk P is fixed in group SQL_Disk
Looking at disk Q:
Disk Q is used by SQL but is moveable
Looking at disk R:
Error: Resource groups SQL_Disk and Disk_R both contain SQL disks
[chkconf.cpp:1416] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[chkconf.cpp:1482] ClusterFindVirtualSQLSrvGroup: 160 (0xa0): The argument string passed to DosExecPgm is not correct.
The "P" drive is the drive to which SQL was installed and the
installer thought it was the only drive in use. Actually, the P, Q and R drives
are being used.
Resolution
Check the SQL error logs and the
sysdevices system table, and make sure that all drives being used by SQL are
in the resource group that SQL is using.
Note If additional cluster disk resources are added to the cluster
for use by SQL, or if other disks currently used in the cluster are designated
for use by the clustered SQL server, they should be added as dependencies of
the SQL Server.
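The comparison between the drives that SQL uses and the drives in its resource group can be sketched as follows. This is an illustration only: it assumes that you have already collected the physical file paths (for example, from the sysdevices table and the error logs) and the drive letters of the disks in the SQL group. The function name and the sample paths are hypothetical.

```python
def drives_outside_group(device_paths: list[str], group_drives: list[str]) -> list[str]:
    """Return the drive letters used by SQL that are not in its group."""
    # Take the letter from paths of the form "X:\..." and ignore anything else.
    used = {
        path[0].upper()
        for path in device_paths
        if len(path) >= 2 and path[1] == ":"
    }
    allowed = {drive[0].upper() for drive in group_drives if drive}
    return sorted(used - allowed)


# Hypothetical paths that match the P, Q, and R drives in the log above.
paths = [r"P:\MSSQL7\DATA\master.mdf", r"Q:\data\sales.mdf", r"r:\logs\sales.ldf"]
print(drives_outside_group(paths, ["P:"]))
```

Any drive letter that is returned belongs to a disk that should be moved into the SQL group and added as a dependency, as described in the note above.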
Scenario 5
Problem
Setup is unable to update the remote node or errors
occur when connecting to all the default databases during the initial
setup.
For example:
#### SQL Server Remote Setup - Start Time 10/28/99 13:14:22 ####
Script file copied to '\\server8\ADMIN$\secnode.iss' successfully.
Installing remote service...
Running '\\node1\F$\ENGLISH\X86\setup\setupsql.exe SecNode=1 -s -f1 \\node2\ADMIN$\secnode.iss'...
Remote process exit code was '-1'.
\\node2\Admin$\sqlsp.log
Disconnecting from remote machine...
Service removed successfully.
Remote files removed successfully.
#### SQL Server Remote Setup - Stop Time 10/28/99 13:15:08 ####
Resolution
Be sure the service account is set up with all the
correct permissions. By copying the existing Administrator account, you can
make sure that the group memberships and many other properties are copied to
the new account. When a user account is copied, the description, group
memberships, logon hours, logon workstations, and account information are
copied exactly. The user name, full name, and password boxes of the new account
are blank and must be entered. The User Cannot Change Password and
Password Never Expires check boxes are copied.
Note When copying an account that is a member of the Administrators
local group, the User Cannot Change Password setting is not copied. Usually, the
User Must Change Password At Next Logon check box is selected, regardless of
its setting in the original account; however, this check box should be clear.
Also, the Password Never Expires check box should be selected. After all the
entries are complete, click Add.
Now, from the User Manager menu, select Policies\User Rights, select to show
Advanced User Rights, and then grant the following rights to the new user:
- Act as part of the operating system.
- Log on as a service.
- Log on locally.
Next, log on to both nodes with the newly created account and
perform basic connectivity and rights testing:
- To verify remote procedure call (RPC) connectivity, try to
log on remotely from each node to the other with either Perfmon, Regedt32 or
Srvmgr.
- To verify NetBIOS, try issuing net view \\machine_name and
net use \\machine_name\admin$
- To verify RDR and SRV without NBT and IP connectivity, try issuing
net view \\IP_address
- Try using a telnet or FTP session to test for transport
functionality.
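The transport test in the last item can also be scripted as a plain TCP connection attempt, which is roughly what opening a telnet session does. The following is a minimal sketch that uses the Python standard library `socket` module. The function name is illustrative, and the host name in the usage comment is hypothetical; port 23 is the standard telnet port and port 21 is the standard FTP port.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage (hypothetical node name):
# can_connect("node2", 23)
```

A True result shows only that the transport works for that port. It does not confirm that NetBIOS or RPC connectivity is working, so the other checks in the list still apply.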
Scenario 6
Problem
The SQL 6.5 Cluster Wizard fails and the last line of
the cluster wizard log states:
Start SQL Server cConnectString="ODBC;DSN='';DRIVER={SQL Server};SERVER=CLIO;DATABASE=master;UID=sa;PWD="
Resolution
First, verify that selecting @@servername does not return NULL. If it
does, then the sysservers system table does not have an entry for the local
server name. Correct this and continue.
If you were able to verify @@servername, you should reload the ODBC drivers
and then run the SQL Cluster Wizard again. To reload the ODBC drivers, run the
setup program from the SQL Server 6.5 Extended Edition compact disc in either
the \I386\Odbc directory for Intel-based computers or the \Alpha\Odbc directory
for Alpha-based computers.
Scenario 7
Problem
Every time the Clustwiz.exe file runs, a Dr. Watson
message appears pointing to the Cpqmgmt.dbg file.
Resolution
All the following Microsoft Knowledge Base references
indicate that this problem is related to the Compaq Insight Manager. Apply the
latest Compaq SoftPak (in most cases SSD 2.12a) and stop all possible
conflicting services as outlined in the following Microsoft Knowledge Base
articles:
192708 INF: Order of Installation for SQL Server 6.5 MSMQ 1.0 Clustering Setup
219264 INF: Order of Installation for SQL Server 7.0 Clustering Setup
Scenario 8
Problem
The following registry entry is
incorrect:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVer\CommonFilesDir
Resolution
Correct the path.
Scenario 9
Problem
You are unable to uncluster SQL using the SQL Cluster
Failover Wizard.
Resolution
When the SQL Cluster Failover Wizard is run, the SQL
cluster resources are created. By default, these resources have the following
naming structure:
<Virtual_SQL_Server_Name> IP Address
<Virtual_SQL_Server_Name> Network Name
<Virtual_SQL_Server_Name> SQL Server 7.0
<Virtual_SQL_Server_Name> VServer
<Virtual_SQL_Server_Name> SQL Server Agent 7.0
For example, if the Virtual_SQL_Server_Name is xyz, the SQL
resources are, by default, named as:
xyz IP Address
xyz Network Name
xyz SQL Server 7.0
xyz VServer
xyz SQL Server Agent 7.0
If all or some of these resources are then modified to:
IP Address
Network Name
SQL Server
Virtual Server
SQL Agent
this can cause the SQL Cluster Failover Wizard to fail or hang
when used. To resolve this, rename the resources back to the default names.
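The default naming structure above is regular enough to generate, which makes it easy to spot resources that have been renamed. The following is a minimal sketch; the function names are illustrative, and the list of current resource names is assumed to have been collected by hand from the cluster administration tool.

```python
# Suffixes of the default resource names, in the order listed above.
DEFAULT_SUFFIXES = (
    "IP Address",
    "Network Name",
    "SQL Server 7.0",
    "VServer",
    "SQL Server Agent 7.0",
)


def default_resource_names(virtual_server: str) -> list[str]:
    """Return the default resource names for a virtual SQL Server name."""
    return [f"{virtual_server} {suffix}" for suffix in DEFAULT_SUFFIXES]


def missing_default_names(virtual_server: str, actual_names: list[str]) -> list[str]:
    """Return the default names that no longer exist among actual_names."""
    actual = set(actual_names)
    return [name for name in default_resource_names(virtual_server) if name not in actual]


print(default_resource_names("xyz"))
```

Each name returned by `missing_default_names` corresponds to a resource that was probably renamed and should be renamed back before you run the wizard.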
Scenario 10
Problem
The SQLCLUST.LOG shows the following:
~~~ OnEnableCluster: UpdateSku
~~~ OnEnableCluster: TransferSQLServices
+++ TransferSQLServices: enter
+++ TransferSQLServices: calling AddVSNameLanManServer
[reghelp.h:132] type not REG_MULTI_SZ: 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[reghelp.h:133] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[reghelp.h:290] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[clenable.cpp:1803] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[clenable.cpp:1836] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
[clenable.cpp:2379] : 160 (0xa0): The argument string passed to DosExecPgm is not correct.
~~~ XXX TransferSQLServices failed
Resolution
Verify that the type value of the following registry
key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\NullSessionPipes
is REG_MULTI_SZ.
The actual failure is in
RegQueryValue_MULTI_SZ(). It fails because the type of the key is not
REG_MULTI_SZ.
If the type of the key is not REG_MULTI_SZ, you will
need to copy the contents from the key, delete and re-create the key with the
same name and correct type value, and then replace the contents.
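The type check can be done without opening a registry editor. The following is a minimal sketch that uses the Python standard library `winreg` module, which is available on Windows only. It assumes that NullSessionPipes is stored as a value under the Parameters key, and the function names are illustrative. The sketch only reads the value; it does not delete or re-create anything.

```python
REG_MULTI_SZ = 7  # Windows registry type code for a multi-string value

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"
VALUE_NAME = "NullSessionPipes"


def is_multi_sz(value_type: int) -> bool:
    """Return True if a registry type code denotes REG_MULTI_SZ."""
    return value_type == REG_MULTI_SZ


def read_null_session_pipes():
    """Return (data, type_code) for the NullSessionPipes value.

    winreg is imported here so that the module still loads on other
    platforms; calling this function requires Windows.
    """
    import winreg

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        return winreg.QueryValueEx(key, VALUE_NAME)


# Example usage on a cluster node:
# data, type_code = read_null_session_pipes()
# print(data, is_multi_sz(type_code))
```

If `is_multi_sz` returns False, keep the data that was read so that you can replace the contents after you re-create the entry with the correct type.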