The ZTEST(array, �0, sigma) function is used for hypothesis
testing when observations in the sample are assumed to come from a normal
distribution with a known standard deviation, sigma. If the null hypothesis is
such that the mean of this normal distribution is �0, then for a sample of size
n, under this null hypothesis, the sample mean has a normal distribution with
mean �0 and standard deviation sigma/SQRT(n). ZTEST returns the probability
that a typical random variable with this distribution will take on a value
higher than the observed sample mean corresponding to the sample in
array.
If sigma is omitted, then ZTEST will use STDEV(array) in its
place.
Syntax
Note The array contains numeric data, �0 is a real number and sigma
(if included) is a positive number. If sigma is omitted, STDEV(array) is used.
In this case, the data in the array must be such that STDEV(array) is positive.
This is a reasonably innocuous supposition that is satisfied unless each entry
in the array has the same value.
Example of usage
To test how the ZTEST function works, assume that intelligence
quotient (IQ) scores follow a normal distribution with a standard deviation of
15, and that you review the IQs for a sample of nine students in your local
school.
To illustrate this, create a blank Excel worksheet, and copy the following
table. Select cell A1 in your blank Excel worksheet, and then
paste the
entries so that the table fills cells A1:D20 in your worksheet.
array | mu0 | sigma | |
110 | 100 | 15 | |
115 | | | |
120 | | | |
95 | | | |
110 | | | |
105 | | | |
90 | | | |
105 | | | |
125 | | | |
| | | |
for one-sided test: | | | for
two-sided test |
=AVERAGE(A2:A10) | | | |
=ZTEST(A2:A10,B2,C2) | | | =2*MIN(ZTEST(A2:A10,B2,C2),1
- ZTEST(A2:A10,B2,C2)) |
=1 - NORMSDIST((AVERAGE(A2:A10) -
B2)/(C2/SQRT(COUNT(A2:A10)))) | | | |
| | | |
with sigma assumed
unknown: | | | |
=STDEV(A2:A10) | | | |
=ZTEST(A2:A10,B2) | | | =2*MIN(ZTEST(A2:A10,B2),
1 - ZTEST(A2:A10,B2)) |
=1 - NORMSDIST((AVERAGE(A2:A10) -
B2)/(STDEV(A2:A10)/SQRT(COUNT(A2:A10)))) | | | |
After you paste the table in your new Excel worksheet,
click the
Paste Options button, and then click
Match
Destination Formatting. With the pasted range still selected, use one of the following procedures, as appropriate for the version of Excel that you are running:
- In Microsoft Office Excel 2007, click the Home tab, click Format in the Cells group, and then click AutoFit Column Width.
- In Excel 2003, point to
Column on the Format menu, and then click
AutoFit Selection.
The IQ data for the nine students are given in A2:A10, �0 is in
B2, and sigma is in C2. First, assuming that you know (and will therefore use)
the value of sigma, the value returned by ZTEST for the one-sided (or
one-tailed) test is in cell A14. The average IQ value, shown in A13 is 108.333.
ZTEST confirms that if the true mean of the underlying normal distribution from
which all nine student IQs were drawn is 100 and the true standard deviation is
15, a sample mean that is higher than 108.333 would occur with probability 0.0478.
Therefore, if before drawing the sample you had established a significance
level of 0.05 for this one-tailed test, we would reject the null hypothesis in
favor of the one-sided alternative that the true mean of the underlying
distribution from which our sample was drawn is greater than
100.
Cell A15 shows how ZTEST is computed when you assume that sigma
is known. If you call ZTEST(array, �0, sigma), ZTEST returns
1 - NORMSDIST((AVERAGE(array) - �0)/(sigma/SQRT(n)))
where
n is the number of observations
in the sample.
Cells A17:A20 provide results for the case where sigma
is assumed to be not known (and the optional third argument to ZTEST is
omitted). In this case, Excel calculates STDEV(array) shown in A18, and it uses
this quantity instead of sigma. Cell A19 shows the results of ZTEST with the
third argument omitted. Notice that the probability under the null hypothesis
of a higher value of the sample mean is about one fourth of the value in the
case where sigma is known. This occurs because the sample standard deviation
was 11.18, lower than 15. Dividing these quantities by SQRT(sample size) =
SQRT(9) = 3 gives 3.73 and 5, respectively. The observed sample mean is 8.333
units above the hypothesized �0 = 100 and 8.333/3.73 is about 2.23, while
8.333/5 is 1.67. The probability that a standard normal random variable exceeds
2.23 is about one-fourth the probability that it exceeds 1.67.
ZTEST
is designed to give a one-tailed result, the probability that a sample mean
higher than observed would occur under the null hypothesis. ZTEST can be
adapted to give a two-tailed result, the probability that a sample mean further
from �0 in either direction than the observed sample mean would occur under the
null hypothesis. To answer this question, you can use the following formula
involving ZTEST when sigma is assumed known:
= 2 * MIN(ZTEST(array, �0 , sigma); 1 - ZTEST(array, �0 , sigma))
You can still do a two-tailed test when sigma is assumed unknown
by using:
= 2 * MIN(ZTEST(array, �0); 1 - ZTEST(array, �0))
Cells D14 and D19 show these two results.
Results in earlier versions of Excel
Code for ZTEST was not changed for Excel 2003 and for later versions of Excel. However, because
ZTEST effectively calls NORMSDIST when it computes its result, the accuracy of
ZTEST relies primarily on the accuracy of NORMSDIST.
The accuracy of NORMSDIST
has been improved in Excel 2003 and in later versions of Excel. In earlier versions of Excel, a single
computational procedure was used for all values of z. Results were essentially
accurate to seven decimal places. This is more than sufficient for most practical
examples.
For more information about NORMSDIST, click the following article number to view the article in the Microsoft Knowledge Base:
827369
Excel statistical functions: NORMSDIST
Results in Excel 2003 and in later versions of Excel
The procedure for NORMSDIST in Excel 2003 and in later versions of Excel uses two different
computational procedures, depending on the value of z. The first is for z
between -5 and +5. The second is for z values in the extreme left or right
tails, below -5 or above +5. Accuracy was improved for all values because, over
the range of z values where each was used, these two methods were both superior
to the single method used in previous versions of Excel. Typical accuracy is
now 14 to 15 decimal places.
Conclusions
There are rare occasions in which you might demand accuracy in ZTEST
that is better than seven decimal places. On such occasions, the version of
NORMSDIST in Excel 2003 and in later versions of Excel will cause ZTEST to give superior performance. For all other
computations involving ZTEST, you should not notice a difference between later versions of Excel and earlier versions of Excel.
When you conduct a hypothesis
test, you should establish a significance level in advance of the test. Typical
significance levels are 0.05, 0.01, or 0.001. After the data is gathered, you
can run ZTEST to determine whether the null hypothesis should be rejected
because the value of ZTEST is below the significance level cutoff.
It is hard
to imagine that more than seven decimal places of accuracy in ZTEST would be
required for this determination unless you started with an extremely small
significance level, such as 0.00000001.