Notice: This website is an unofficial Microsoft Knowledge Base (hereinafter KB) archive and is intended to provide a reliable access to deleted content from Microsoft KB. All KB articles are owned by Microsoft Corporation. Read full disclaimer for more details.

Excel Statistical Functions: HYPGEOMDIST


View products that this article applies to.

Summary

The purpose of this article is to describe the HYPGEOMDIST function in Microsoft Office Excel 2003 and later, to illustrate how the function is used, and to compare the results of the HYPGEOMDIST function for Excel 2003 and later with the results of the HYPGEOMDIST function in earlier versions of Excel.

Microsoft Excel 2004 for Macintosh Information

The statistical functions in Excel 2004 for Macintosh were updated using the same algorithms as Microsoft Office Excel 2003 and later versions of Excel. Any information in this article that describes how a function works or how a function was modified for Excel 2003 and later versions of Excel also applies to Excel 2004 for Macintosh.

↑ Back to the top


More information

The HYPGEOMDIST(sample_s, number_sample, population_s, number_population) function returns the probability of exactly sample_s successes in a sample of size number_sample that is drawn from a population of size number_population that is known to contain exactly population_s successes.

Syntax

HYPGEOMDIST(sample_s, number_sample, population_s, number_population)
Note All four arguments are non-negative integers, sample_s is less than or equal to population_s, population_s is less than number_population, and number_sample is less than number_population.

Example Usage

There are fifty states in the U.S. and fifty state governors. Assume twenty-two governors are Democrats and twenty-eight governors are Republicans. If you create a committee of seven governors and you randomly select the governors who sit on the committee, you can use the HYPGEOMDIST function to answer the following questions:
  • What is the chance that the committee contains exactly three Democrats?
  • What is the chance that the committee contains no more than three Democrats?
To illustrate the HYPGEOMDIST function, follow these steps:
  1. Create a blank Excel worksheet, and then copy the following table.
  2. In your blank Excel worksheet, click cell A1.
  3. Paste the entries so that the table fills cells A1:C9 in your worksheet.
  4. After you paste the table into your new Excel worksheet, click Paste Options, and then click Match Destination Formatting.
  5. While the pasted range is selected, do one of the following:
    • In Microsoft Office Excel 2007, click the Home tab, click Format in the Cells group, and then click AutoFit Column Width.
    • In Microsoft Office Excel 2003, point to Column on the Format menu, and then click AutoFit Selection.
Number of Democrats, dP(committee of 7 contains exactly d Democrats)cumulative
0=HYPGEOMDIST(A2, 7, 22, 50)=SUM($B$2:B2)
1=HYPGEOMDIST(A3, 7, 22, 50)=SUM($B$2:B3)
2=HYPGEOMDIST(A4, 7, 22, 50)=SUM($B$2:B4)
3=HYPGEOMDIST(A5, 7, 22, 50)=SUM($B$2:B5)
4=HYPGEOMDIST(A6, 7, 22, 50)=SUM($B$2:B6)
5=HYPGEOMDIST(A7, 7, 22, 50)=SUM($B$2:B7)
6=HYPGEOMDIST(A8, 7, 22, 50)=SUM($B$2:B8)
7=HYPGEOMDIST(A9, 7, 22, 50)=SUM($B$2:B9)


Cells B2:B9 show the probabilities of exactly d Democrats on the committee for d = 0 to 7.

The HYPGEOMDIST Help file provides a formula to compute HYPGEOMDIST in Excel 2003 and later versions of Excel (where COMBIN(n, k) returns the number of combinations of size k in a population of size n):
COMBIN(population_s, sample_s) * COMBIN(number_population - population_s, number_sample - sample_s) / COMBIN(number_population, number_sample)
There is no cumulative version of HYPGEOMDIST. The cumulative probabilities are in column C of the worksheet. Therefore, the answers to the two previous questions are in cells B5 and C5, respectively.

Results in Earlier Versions of Excel

Knusel (see the "References" section) documented instances where HYPGEOMDIST does not return a numeric answer and instead yields #NUM! because of a numeric overflow. When HYPGEOMDIST returns numeric answers, the answers are correct. Therefore, HYPGEOMDIST does not suffer from round off problems. However, in some extreme cases, HYPGEOMDIST returns #NUM! when it should be able to return an answer. The following functions also exhibit this behavior in earlier versions of Excel:
  • BINOMDIST
  • CRITBINOM
  • NEGBINOMDIST
  • POISSON
For more information about the BINOMDIST, CRITBINOM, NEGBINOMDIST, and POISSON functions, click the following article numbers to view the articles in the Microsoft Knowledge Base:
827459 Excel Statistical Functions: BINOMDIST
828117 Excel Statistical Functions: CRITBINOM
828130 Excel Statistical Functions: POISSON
828361 Excel Statistical Functions: NEGBINOMDIST


#NUM! is returned only when COMBIN overflows. COMBIN is called three times in the previous formula for HYPGEOMDIST. This overflow occurs only if the first argument of COMBIN is greater than or equal to 1,030. There are no computational problems as long as this argument is less than 1,030. Because number_population is the largest first argument to COMBIN, HYPGEOMDIST returns correct results if number_population is less than 1,030.

Results in Excel 2003 and later versions of Excel

Because an overflow causes HYPGEOMDIST to return #NUM! and HYPGEOMDIST is well-behaved when overflow does not occur, Microsoft has implemented a conditional algorithm in Excel 2003 and later versions of Excel. The conditional algorithm uses HYPGEOMDIST code from earlier versions of Excel (the computational formula that involves COMBIN that is mentioned previously) when number_population is less than 1,030. In cases where number_population is greater than or equal to 1,030, Microsoft has implemented an alternative plan. Pseudocode is provided in the "Appendix" section in this article. The alternative plan for HYPGEOMDIST is in the same spirit as the alternative plan for BINOMDIST, CRITBINOM, and NEGBINOMDIST. The plan is used to avoid the evaluation of COMBIN with a first argument that is greater than or equal to 1,030.

Conclusions

Inaccuracies in earlier versions of Excel occur only when the HYPGEOMDIST formula results in a call to COMBIN with a first argument that is greater than or equal to 1,030. In these cases, HYPGEOMDIST returns #NUM! in earlier versions of Excel because COMBIN overflows.

The following functions also exhibit similar behavior in earlier versions of Excel:
  • BINOMDIST
  • CRITBINOM
  • NEGBINOMDIST
  • POISSON
These functions also either return correct numeric results, #NUM!, or #DIV/0!. These problems occur because of overflow or underflow conditions. You can easily identify when these conditions occur. An alternative algorithm in the same spirit as those for HYPGEOMDIST and BINOMDIST is implemented in Excel 2003 and later versions of Excel to return correct answers in cases where earlier versions return #NUM!.

Appendix

The following variables are used in the pseudocode:
  • x = sample_s
  • n = number_sample
  • M = population_s
  • N = number_population
If n is less than M, you can interchange the roles of the arguments that call HYPGEOMDIST(x, M, n, N) instead of HYPGEOMDIST(x, n, M, N). In the following pseudocode for HYPGEOMDIST(x, n, M, N), you can assume that n is greater than or equal to M.

Step 0: Initialization. Initialize both TotalUnscaledProbability and UnscaledResult to 0. Initialize the constant EssentiallyZero to a very small number, such as 10^(-12).

Step 1: Find M*n/N and round down to the nearest whole number, m.
TotalUnscaledProbability = TotalUnscaledProbability + 1;
	If (m == x) then UnscaledResult = UnscaledResult + 1;
Step 2: Calculate the unscaled probabilities for I is greater than m:
PreviousValue = 1;
Done = FALSE;
I = m + 1;
While (not Done && I <= M)
	 {
	CurrentValue = PreviousValue * (n � I + 1) * (M � I + 1) /
						(I * (N � n � M + I));
	TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue;
	If (I == x) then UnscaledResult = UnscaledResult + CurrentValue;
	If (CurrentValue <= EssentiallyZero) then Done = TRUE;
	PreviousValue = CurrentValue;
	I = I+1;
	 }
end While;
Step 3: Calculate unscaled probabilities for I is less than m:
PreviousValue = 1;
Done = FALSE;
I = m - 1;
While (not Done && I >= 0)
	 {
	CurrentValue = PreviousValue * (I + 1) * (N � n � M + I + 1) /
							((n � I) * (M � I));
	TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue;
	If (I == x) then UnscaledResult = UnscaledResult + CurrentValue;
	If (CurrentValue <= EssentiallyZero) then Done = TRUE;
	PreviousValue = CurrentValue;
	I = I-1;
	 }
end While;
Step 4: Combine the unscaled results:
Return UnscaledResult/TotalUnscaledProbability;

↑ Back to the top


References

Knusel, L. "On the Accuracy of Statistical Distributions in Microsoft Excel 97." Computational Statistics and Data Analysis, 1998, 26, 375-377.

↑ Back to the top


Keywords: KB828515, kbinfo, kbexpertisebeginner

↑ Back to the top

Article Info
Article ID : 828515
Revision : 5
Created on : 1/18/2007
Published on : 1/18/2007
Exists online : False
Views : 443