The CRITBINOM(n, p, alpha) function examines for various x the cumulative probability of x successes in n independent Bernoulli trials. Each trial has the associated probability p of success. CRITBINOM returns the smallest value of x for which this cumulative probability is greater than or equal to alpha. This cumulative probability is greater than or equal to
alpha.
Syntax
Parameter n is a positive integer, 0 < p < 1, and 0 <=
alpha <= 1 (although the cases where alpha = 0 and alpha = 1 are of limited
interest).
Example of usage
In baseball, you can make the simplifying supposition that a ".300
hitter" receives a hit (success) with probability 0.300 each time he comes to
bat (each trial), and that successive times at bat are independent Bernoulli
trials. In 100 trials, 30 successes is the batter's expected number of
hits.
You can use CRITBINOM to answer the following questions:
- The smallest number x of successes where the batter has at
least a 0.25 chance of getting x or fewer successes (x would be less than
30)
- (Small variation) The smallest number of successes x' where
the batter has at least a 0.25 chance of not getting at least x' successes (x'
would also be less than 30)
- The smallest number y of successes so that the chance of
getting y or more successes is less than 0.10 (y would be greater than
30)
To illustrate the CRITBINOM function, create a blank Excel
worksheet, and copy the following table. Then, select cell A1 in your blank Excel
worksheet and paste the entries so that the table fills cells A1:B21 in your
worksheet.
number of trials, n | 100 |
success probability, p | 0.3 |
alpha | |
0.05 | =CRITBINOM($B$1,$B$2,A4) |
0.1 | =CRITBINOM($B$1,$B$2,A5) |
0.2 | =CRITBINOM($B$1,$B$2,A6) |
0.25 | =CRITBINOM($B$1,$B$2,A7) |
0.3 | =CRITBINOM($B$1,$B$2,A8) |
0.4 | =CRITBINOM($B$1,$B$2,A9) |
0.5 | =CRITBINOM($B$1,$B$2,A10) |
0.6 | =CRITBINOM($B$1,$B$2,A11) |
0.7 | =CRITBINOM($B$1,$B$2,A12) |
0.75 | =CRITBINOM($B$1,$B$2,A13) |
0.8 | =CRITBINOM($B$1,$B$2,A14) |
0.9 | =CRITBINOM($B$1,$B$2,A15) |
0.95 | =CRITBINOM($B$1,$B$2,A16) |
| |
CRITBINOM | correct value |
=CRITBINOM(1030,0.5,0.16704) | 499 |
=CRITBINOM(1030,0.5,0.1831) | 500 |
=CRITBINOM(1030,0.5,0.51242) | 515 |
After you paste this table into your new Excel worksheet, click
the
Paste Options button, and then click
Match
Destination Formatting. With the pasted range still selected, use one of the following procedures, as appropriate for the version of Excel that you are running:
- In Microsoft Office Excel 2007, click the Home tab, click Format in the Cells group, and then click AutoFit Column Width.
- In Excel 2003, point to Column on the
Format menu, and then click
AutoFit Selection.
Cells B4:B16 show values of
CRITBINOM for respective alpha values in A4:A16. For question 1, CRITBINOM(100,
0.3, 0.25) in cell B7 is 27 so the probability of getting 27 or fewer successes
is greater than or equal to 0.25 whereas the probability of getting 26 or fewer
successes is less than 0.25. So, 27 is the smallest number of successes x such
that the batter has at least a 0.25 chance of getting x or fewer
successes.
Question 2 is just a small variation on question 1. Because
there is at least a 0.25 chance of 27 or fewer successes, there is at most a
0.75 chance of getting 28 or more successes, and therefore at least a 0.25
chance of not getting 28 or more successes.
For question 3,
CRITBINOM(100, 0.3, 0.9) is 36. So there is at least a 0.9 chance of getting 36
or fewer successes. Therefore, there is at most a 0.1 chance of getting 37 or
more successes. Also, 37 is the smallest number that satisfies this
condition.
Questions 2 and 3 illustrate the fact that many questions
about the cumulative binomial distribution require finding a CRITBINOM value.
CRITBINOM makes a statement like "the smallest x that gives a cumulative
probability associated with x that is greater than or equal to alpha is x =
27." This statement might then be used indirectly to answer the question. For
example, if P(number of successes <= 27) >= alpha, then P(number of
successes >= 28) < 1 � alpha.
Cells A18:B21 illustrate
performance of CRITBINOM. Correct values were found by using Knusel's software,
ELV. Knusel used this software as the basis for computations in his paper that is cited below. If
you have a version of Excel that is earlier than Excel 2003, the entries in A20
and A21 will be #NUM!. If you have Excel 2003 or a later version of Excel, the entries in A19:A21 and B19:B21
will agree.
Results in earlier versions of Excel
CRITBINOM depends on BINOMDIST. Knusel (see note 1) documented
instances where BINOMDIST does not return a numeric answer and yields #NUM!
instead because of a numeric overflow. When numeric answers are returned by
BINOMDIST, they are correct. #NUM! is returned only when the number of trials is
greater than or equal to 1030. There are no computational problems when n <
1030. The following article about BINOMDIST discusses this more thoroughly.
Note 1: Knusel, L.
On the Accuracy of Statistical Distributions in Microsoft Excel 97, Computational Statistics and Data Analysis, 26, 375-377,
1998.
For more information about BINOMDIST, click the following article number to view the article in the Microsoft Knowledge Base:
827459
Excel statistical functions: BINOMDIST
Knusel did not report on CRITBINOM in his paper. However,
as expected, CRITBINOM gives the same problems, but again only when n >=
1030.
Results in Excel 2003 and in later versions of Excel
The approach to improvements in Excel 2003 and in later versions of Excel is exactly the same as
with BINOMDIST: use existing pre-Excel 2003 code if n < 1030 and switch to
an alternative plan if n >= 1030. The remainder of the discussion in this
section deals with only the case where n >= 1030. The alternative plan is
built in the same way as for BINOMDIST: find the modal value m (m =
approximately n*p), assign an unscaled probability of 1 to m, find unscaled
probabilities of m+1, m+2, m+3, ... stopping when such probabilities become
infinitesimal, find unscaled probabilities of m-1, m-2, m-3, ... stopping when
such probabilities become infinitesimal. Finally, scale the appropriate
probabilities.
CRITBINOM is a kind of inverse function for a discrete
distribution that is similar to NORMSINV for the continuous standard normal
distribution. NORMSINV is computed through a search process that frequently
calls NORMSDIST as it homes in on the result. Because BINOMDIST is
computationally expensive, you want to avoid a similar process for CRITBINOM
that would repeatedly call BINOMDIST. The procedure establishes an initial
guess, executes the code below (similar to a single call to BINOMDIST), and
then adjusts the guess to arrive at a final answer.
First,
find a guess by using a normal approximation to the binomial distribution. You
can assume a normal distribution with the same mean and standard deviation as
the binomial, namely, a mean n*p and a standard deviation SQRT(n*p*(1-p)). This
approximation should be reasonably accurate as long as n*p*(1-p) > 30. This will be the case with n >= 1030 unless p is very close to 0 or very
close to 1. Because you require only an approximate value, use a quick
approximation to NORMSINV instead of calling NORMSINV itself. The approximation
comes from 26.2.23 in Abramowitz, M. and I. A. Stegun,
Handbook of Mathematical Functions, Dover, 1972, p. 933.
If alpha <= 0.5, define t =
SQRT(LN(1/(alpha^2))) then NORMSINV(alpha) is approximated by NApprox = -t +
(2.515517 + 0.802853*t + 0.010328*t*t)/(1 + 1.432788*t + 0.189269*t*t +
0.001308*t*t*t)
If alpha > 0.5, define t = SQRT(LN(1/((1 �
alpha)^2))) then NORMSINV(alpha) is approximated by NApprox = t - (2.515517 +
0.802853*t + 0.010328*t*t)/(1 + 1.432788*t + 0.189269*t*t +
0.001308*t*t*t)
The initial guessed value of CRITBINOM is as follows.
Guess = FLOOR(n*p + NApprox*SQRT(n*p*(1-p)))
You have to make sure that this is not outside the range [0,n]. Therefore, run the following code:
If (Guess < 0) then Guess = 0;
If (Guess > n) then Guess = n;
The pseudocode that follows is adapted from the pseudocode for BINOMDIST (in the
article on BINOMDIST) to find both the cumulative and non-cumulative
probabilities for Guess. In other words, when this pseudocode terminates, you
will have computed BINOMDIST(Guess, n, p, TRUE) and BINOMDIST(Guess, n, p,
FALSE).
Step 0: (Initialization). Initialize TotalUnscaledProbability,
UnscaledPGuess (for non-cumulative probability), and UnscaledCumPGuess (for
cumulative probability) all to 0. Initialize the constant EssentiallyZero to a
very small number, such as 10^(-12).
Step 1: find n*p and round down
to the nearest whole number, m. The most likely number of successes in n trials
is either m or m+1.
TotalUnscaledProbability = TotalUnscaledProbability + 1;
If (m == Guess) then UnscaledPGuess = UnscaledPGuess + 1;
If (m <= Guess) then UnscaledCumPGuess = UnscaledCumPGuess + 1;
Step 2: (Calculate unscaled probabilities for k > m):
PreviousValue = 1;
Done = FALSE;
k = m + 1;
While (not Done && k <= n)
{
CurrentValue = PreviousValue * (n � k + 1) * p / (k * (1 � p));
TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue;
If (k == Guess) then UnscaledPGuess = UnscaledPGuess + CurrentValue;
If (k <= Guess) then UnscaledCumPGuess = UnscaledCumPGuess +
CurrentValue;
If (CurrentValue <= EssentiallyZero) then Done = TRUE;
PreviousValue = CurrentValue;
k = k+1;
}
end While;
Step 3: (Calculate unscaled probabilities for k < m):
PreviousValue = 1;
Done = FALSE;
k = m - 1;
While (not Done && k >= 0)
{
CurrentValue = PreviousValue * k+1 * (1-p) / ((n � k) * p);
TotalUnscaledProbability = TotalUnscaledProbability + CurrentValue;
If (k == Guess) then UnscaledPGuess = UnscaledPGuess + CurrentValue;
If (k <= Guess) then UnscaledCumPGuess = UnscaledCumPGuess +
CurrentValue;
If (CurrentValue <= EssentiallyZero) then Done = TRUE;
PreviousValue = CurrentValue;
k = k-1;
}
end While;
Step 4: (Combine unscaled results):
PGuess = UnscaledPGuess/TotalUnscaledProbability;
CumPGuess = UnscaledCumPGuess/TotalUnscaledProbability;
Step 5: Adjust the result. At this point you have values
for Guess, PGuess, and CumPGuess. Let CumPGuessMinus1 be the cumulative
probability of Guess � 1. Then Guess is the correct answer if CumPGuessMinus1
< alpha but CumPGuess >= alpha. If both of these are < alpha, increase Guess by 1. If both of these are >= alpha, decrease Guess by
1. This is a simple process. First
CumPGuessMinus1 = CumPGuess � PGuess;
If you have to increase Guess then
PGuessPlus1 = PGuess * (n � Guess) * p / Guess / (1 � p);
CumPGuessMinus1 = CumPGuess;
CumPGuess = CumPGuess + PGuessPlus1;
PGuess = PGuessPlus1;
Guess = Guess + 1;
If you have to decrease Guess then
PGuessMinus1 = PGuess * Guess * (1 � p) / (n � Guess + 1) / p;
CumPGuess = CumPGuessMinus1;
CumPGuessMinus1 = CumPGuessMinus1 � PGuess;
PGuess = PGuessMinus1;
Guess = Guess � 1;
You might have to increase Guess more than one time or you might
have to decrease Guess more than one time, but you would never have to both
increase Guess at least one time and decrease Guess at least one
time.
CRITBINOM has been thoroughly tested for accuracy. However, only
casual anecdotal testing has been done to investigate how close the initial
Guess is to the correct answer and how many times the Guess has to be increased
or decreased. The normal approximation generally provides an excellent value of
Guess; in our limited casual tests, we never had to increase or decrease the
initial Guess by more than 2.
Conclusions
Inaccuracies in earlier versions of Excel occur only when the
number of trials is greater than or equal to 1030. In such cases, CRITBINOM
returns #NUM! in earlier versions of Excel. This issue occurs because one term
in a sequence of terms to be multiplied together when you evaluate BINOMDIST
overflows. This issue has been corrected in Excel 2003 and in later versions of Excel by implementing an
alternative procedure when such an overflow would
otherwise occur. This alternative procedure was described earlier in this article.
Five functions that exhibit similar behavior in
earlier versions of Excel are BINOMDIST, CRITBINOM, HYPGEOMDIST, NEGBINOMDIST,
and POISSON. These functions always return correct numeric results or #NUM! or
#DIV/0!. Again, problems occur because of overflow (or underflow).
The
conditions under which these problems will occur are easily identified, and an
alternative plan algorithm that is in the same spirit as the algorithm for BINOMDIST (or
CRITBINOM) is implemented in Excel 2003 and in later versions of Excel. This algorithm is implemented to return correct answers in cases
where earlier versions of Excel return #NUM! or #DIV/0!.