The HYPGEOMDIST(sample_s, number_sample, population_s,
number_population) function returns the probability of exactly sample_s
successes in a sample of size number_sample that is drawn from a population of
size number_population that is known to contain exactly population_s successes.
Syntax
HYPGEOMDIST(sample_s, number_sample, population_s, number_population)
Note All four arguments are non-negative integers, sample_s is less
than or equal to population_s, population_s is less than number_population, and
number_sample is less than number_population.
Example Usage
There are fifty states in the U.S. and fifty state governors.
Assume twenty-two governors are Democrats and twenty-eight governors are
Republicans. If you create a committee of seven governors and you randomly
select the governors who sit on the committee, you can use the HYPGEOMDIST
function to answer the following questions:
- What is the chance that the committee contains exactly
three Democrats?
- What is the chance that the committee contains no more than
three Democrats?
To illustrate the HYPGEOMDIST function, follow these steps:
- Create a blank Excel worksheet, and then copy the following
table.
- In your blank Excel worksheet, click cell A1.
- Paste the entries so that the table fills cells A1:C9 in
your worksheet.
- After you paste the table into your new Excel worksheet,
click Paste Options, and then click Match Destination
Formatting.
- While the pasted range is selected, do one of the following:
- In Microsoft Office Excel 2007, click the Home tab, click Format in the Cells group, and then click AutoFit Column Width.
- In Microsoft Office Excel 2003, point to Column on the
Format menu, and then click
AutoFit Selection.
Number of Democrats, d | P(committee of 7 contains
exactly d Democrats) | cumulative |
0 | =HYPGEOMDIST(A2, 7, 22,
50) | =SUM($B$2:B2) |
1 | =HYPGEOMDIST(A3, 7, 22,
50) | =SUM($B$2:B3) |
2 | =HYPGEOMDIST(A4, 7, 22,
50) | =SUM($B$2:B4) |
3 | =HYPGEOMDIST(A5, 7, 22,
50) | =SUM($B$2:B5) |
4 | =HYPGEOMDIST(A6, 7, 22,
50) | =SUM($B$2:B6) |
5 | =HYPGEOMDIST(A7, 7, 22,
50) | =SUM($B$2:B7) |
6 | =HYPGEOMDIST(A8, 7, 22,
50) | =SUM($B$2:B8) |
7 | =HYPGEOMDIST(A9, 7, 22,
50) | =SUM($B$2:B9) |
Cells B2:B9 show the probabilities of exactly d
Democrats on the committee for d = 0 to 7.
The HYPGEOMDIST Help file
provides a formula to compute HYPGEOMDIST in Excel 2003 and later versions of Excel (where COMBIN(n, k)
returns the number of combinations of size k in a population of size n):
COMBIN(population_s, sample_s) * COMBIN(number_population - population_s, number_sample - sample_s) / COMBIN(number_population, number_sample)
There is no cumulative version of HYPGEOMDIST. The cumulative
probabilities are in column C of the worksheet. Therefore, the answers to the
two previous questions are in cells B5 and C5, respectively.
Results in Earlier Versions of Excel
Knusel (see the "References" section) documented instances where
HYPGEOMDIST does not return a numeric answer and instead yields #NUM! because
of a numeric overflow. When HYPGEOMDIST returns numeric answers, the answers
are correct. Therefore, HYPGEOMDIST does not suffer from round off problems.
However, in some extreme cases, HYPGEOMDIST returns #NUM! when it should be
able to return an answer. The following functions also exhibit this behavior in
earlier versions of Excel:
- BINOMDIST
- CRITBINOM
- NEGBINOMDIST
- POISSON
For more information about the BINOMDIST, CRITBINOM, NEGBINOMDIST,
and POISSON functions, click the following article numbers to view the articles in the Microsoft Knowledge Base:
827459
Excel Statistical Functions:
BINOMDIST
828117 Excel Statistical Functions: CRITBINOM
828130 Excel Statistical Functions: POISSON
828361 Excel Statistical Functions: NEGBINOMDIST
#NUM! is returned only when COMBIN
overflows. COMBIN is called three times in the previous formula for
HYPGEOMDIST. This overflow occurs only if the first argument of COMBIN is
greater than or equal to 1,030. There are no computational problems as long as
this argument is less than 1,030. Because number_population is the largest
first argument to COMBIN, HYPGEOMDIST returns correct results if
number_population is less than 1,030.
Results in Excel 2003 and later versions of Excel
Because an overflow causes HYPGEOMDIST to return #NUM! and
HYPGEOMDIST is well-behaved when overflow does not occur, Microsoft has
implemented a conditional algorithm in Excel 2003 and later versions of Excel. The conditional algorithm
uses HYPGEOMDIST code from earlier versions of Excel (the computational formula
that involves COMBIN that is mentioned previously) when number_population is
less than 1,030. In cases where number_population is greater than or equal to
1,030, Microsoft has implemented an alternative plan. Pseudocode is provided in
the "Appendix" section in this article. The alternative plan for HYPGEOMDIST is
in the same spirit as the alternative plan for BINOMDIST, CRITBINOM, and
NEGBINOMDIST. The plan is used to avoid the evaluation of COMBIN with a first
argument that is greater than or equal to 1,030.
Conclusions
Inaccuracies in earlier versions of Excel occur only when the
HYPGEOMDIST formula results in a call to COMBIN with a first argument that is
greater than or equal to 1,030. In these cases, HYPGEOMDIST returns #NUM! in
earlier versions of Excel because COMBIN overflows.
The following
functions also exhibit similar behavior in earlier versions of Excel:
- BINOMDIST
- CRITBINOM
- NEGBINOMDIST
- POISSON
These functions also either return correct numeric results,
#NUM!, or #DIV/0!. These problems occur because of overflow or underflow
conditions. You can easily identify when these conditions occur. An alternative
algorithm in the same spirit as those for HYPGEOMDIST and BINOMDIST is
implemented in Excel 2003 and later versions of Excel to return correct answers in cases where earlier
versions return #NUM!.
Appendix
The following variables are used in the pseudocode:
- x = sample_s
- n = number_sample
- M = population_s
- N = number_population
If n is less than M, you can interchange the roles of the
arguments that call HYPGEOMDIST(x, M, n, N) instead of HYPGEOMDIST(x, n, M, N).
In the following pseudocode for HYPGEOMDIST(x, n, M, N), you can assume that n
is greater than or equal to M.
Step 0: Initialization. Initialize both
TotalUnscaledProbability and
UnscaledResult to 0. Initialize the constant
EssentiallyZero to a very small number, such as 10^(-12).
Step 1: Find
M*n/N and round down to the nearest whole number, m.
TotalUnscaledProbability = TotalUnscaledProbability + 1;
If (m == x) then UnscaledResult = UnscaledResult + 1;
Step 2: Calculate the unscaled probabilities for I is greater than m:
PreviousValue = 1;
Done = FALSE;
I = m + 1;
While (not Done && I <= M)
{
CurrentValue = PreviousValue * (n � I + 1) * (M � I + 1) /
(I * (N � n � M + I));
TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue;
If (I == x) then UnscaledResult = UnscaledResult + CurrentValue;
If (CurrentValue <= EssentiallyZero) then Done = TRUE;
PreviousValue = CurrentValue;
I = I+1;
}
end While;
Step 3: Calculate unscaled probabilities for I is less than m:
PreviousValue = 1;
Done = FALSE;
I = m - 1;
While (not Done && I >= 0)
{
CurrentValue = PreviousValue * (I + 1) * (N � n � M + I + 1) /
((n � I) * (M � I));
TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue;
If (I == x) then UnscaledResult = UnscaledResult + CurrentValue;
If (CurrentValue <= EssentiallyZero) then Done = TRUE;
PreviousValue = CurrentValue;
I = I-1;
}
end While;
Step 4: Combine the unscaled results:
Return UnscaledResult/TotalUnscaledProbability;