
St. Paul's Statistics Introduction: Chapter 2

Probability Density Function Shapes and Significance

Section Goals:

  1. Familiarize students with the shape of several Probability Density Function (PDF) graphs.
  2. Help students grasp the significance of PDF shape.
  3. Make students aware of features common to all PDF graphs.

There are many statistical distributions. In this chapter, we will examine three industrially important distributions, but our first example is purely educational and you have already seen it: the 2 dice PDF example from Chapter 1. Each of the industrially important distributions has at least one Probability Density Function (PDF) similar to Ch 1 Figure 1. Some distributions have an entire family of similar looking PDFs. In this chapter, we will examine the following PDFs of four Statistical Distributions.

Please remember that a PDF is a pictorial representation of a statistical distribution and provides clues on how to get useful answers. A PDF is thus a kind of map. Actual answers are cranked out using math methods presented later (Chapters 4, 5 and 6), but understanding the PDF will help you do the math and understand the answers.

Characteristics of a PDF:
A Probability Density Function (or PDF) is a graph that describes the likelihood of different outcomes for a chance event. Consider flipping 20 coins. You would expect about 10 heads (and 10 tails), but you would not be surprised if the result were 12 heads on Tuesday or 9 on Friday. The PDF for coin flipping describes the likelihood of both these results, and it also shows a very small likelihood (almost zero) of getting only 2 or 3 heads. The PDF is presented below:

probabilities of different results from throw of 20 coins
Ch 2 Figure 1
Probabilities of 'Heads' Out of 20 Coin Trials

The coin flipping PDF shows all possible outcomes, which of course total 100%. For a PDF, the area under the curve represents probability (don't look at the Y value). From Ch 2 Figure 1, we can surmise that the probability of getting 9, 10 or 11 heads is about 49.6% (add the three center bar areas). The probability of getting 0 or 1 heads is remote (almost zero percent).
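The bar areas in the coin example can be checked with a few lines of Python. This is only a minimal sketch: it assumes fair coins (a 50% chance of heads on each coin), and the function name `heads_probability` is made up for this illustration.

```python
from math import comb

def heads_probability(k, n=20):
    """Probability of exactly k heads when n fair coins are thrown."""
    # comb(n, k) counts the ways to choose which k coins show heads;
    # each of the 2**n equally likely outcomes has probability 1 / 2**n.
    return comb(n, k) / 2 ** n

# Add the three center bars (9, 10 and 11 heads).
center = sum(heads_probability(k) for k in (9, 10, 11))
print(f"P(9, 10 or 11 heads) = {center:.2%}")

# All possible outcomes taken together account for 100%.
total = sum(heads_probability(k) for k in range(21))
print(f"Total over all outcomes = {total:.0%}")
```

The exact sum for the three center bars is a little under 49.7%, which agrees with the value read from the figure to within rounding.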

PDFs for all distributions are interpreted in a similar manner. Please note that on any given day, the observed results of a random event may be different, but on most days you will observe results where the large areas indicate a high probability of occurrence.

Homework: Ch 2 Problem 1:
Part a: Using the PDF in Ch 2 Figure 1, what is the probability of getting 8, 9, 10, 11 or 12 heads?
Part b: What about 10 or fewer heads?
Part c: If you throw 20 coins, what is the probability that 'something' will happen?
Part d: Suppose you used only 1 coin, shook it around in a box before throwing it, and repeated this 20 times. Should you treat the total result as a single throw of 20 coins? Would that kind of process also obey the PDF in Ch 2 Figure 1?

The Binomial Distribution PDF

The binomial distribution is widely used by industry to solve many different kinds of problems. The following are real industrial problems similar to ones I dealt with as an engineer that can be solved via the binomial distribution. I have also included examples showing how statistics can be used to gain historical insight or to facilitate computer game modeling.

Criteria for Using the Binomial Distribution
When the following four conditions are ALL true, the Binomial Distribution can be confidently used.

Homework: Ch 2 Problem 2: Using the four criteria above:

  1. Decide whether or not the binomial distribution can be used to analyze the toss of 20 coins that was discussed above (yes or no). Look at Ch 2 Figure 1, which statistically describes this situation.
  2. Write a brief list of steps that justifies your answer.

  3. Compare your analysis to St. Paul's. Click here for Paul's solution.

Homework: Ch 2 Problem 3: Using the four criteria above:

Homework problem 3 is a very realistic business problem. If you take out a loan on your farm, you want to be very confident you will be able to make your payments. Assuming you know the percentage of 'sick/old cows' that produce bad milk, a very realistic evaluation can be made using the binomial analysis just discussed.

How Many PDFs does the Binomial Have?
The following paragraph will begin to familiarize the student with the trends of the binomial distribution. Please read the material, try to make sense of it and move on. Don't waste time memorizing the details of this figure. Just try to understand the general idea.

The Binomial is actually a family of PDFs. All 8 PDFs below (see Ch 2 Figure 2) are members of the Binomial Statistical Distribution. A binomial PDF is mathematically generated from the probability p and the number of samples n. Thus, each different value of the (n, p) number pair will generate a different PDF, but all the PDFs look rather similar, as shown below:

effect of different mean and standard deviation on binomial distribution probabilities
Binomial PDFs - Variation of Parameters n,p
Ch 2 Figure 2

The top row of PDF graphs all represent 4 objects, or four trials of the same object (see Homework Ch 2 Problem 1, part d). The bottom row represents 8 objects (or 8 trials of the same object). As we move from left to right, we move from small probabilities of success to larger probabilities. For example, at the left, the PDF shows the probabilities of 4 cannon shells each with a 20% chance of hitting the target (n=4 and p=20%). At the right of the top row, we are modeling far more skilled gunners. For these men of skill, each cannon shell's chance of hitting is increased to 65% (n=4 and p=65%).

The top row of the figure above shows how the PDF changes when n=4 and p (probability of success) varies from 20% to 65%. The second row presents the same data for n=8 (8 cannon shells) and p varying through the same range. The following general observations can be made:

  1. For the binomial family of PDFs, the general shape is similar. The PDF in all cases is much like the cross section of a bell, but it is sometimes distorted.
  2. The binomial above is discrete, i.e. the x axis represents only counting numbers (0, 1, 2, 3 ...) and no fractions.
  3. When n is larger, the PDF keeps the same general shape but has more bars and becomes smoother.
  4. The value of p determines the distortion of the "bell" shape: p determines whether the PDF is skewed left, skewed right or symmetric.
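The observations above can be explored by generating a few members of the binomial family directly. The sketch below is illustrative only: it uses the two end values of p named in the text (20% and 65%) with n=4 and n=8, the function name `binomial_pmf` is made up for this example, and the crude text bars are not a reproduction of Ch 2 Figure 2.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(0 hits), ..., P(n hits)] for n shells, each hitting with probability p."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

# n = number of cannon shells, p = chance that each shell hits the target.
for n in (4, 8):
    for p in (0.20, 0.65):
        bars = binomial_pmf(n, p)
        mean_hits = sum(k * prob for k, prob in enumerate(bars))
        print(f"n={n}, p={p:.2f}, average number of hits={mean_hits:.2f}")
        for k, prob in enumerate(bars):
            # Crude text bar chart: one '#' per 2% of probability.
            print(f"  {k} hits: {prob:6.1%} " + "#" * round(prob * 50))
```

Running it shows the bars piling up at low hit counts when p is small and shifting toward high hit counts when p is large, with more (and smoother) bars for n=8 than for n=4.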

The Gaussian Distribution PDF

The Gaussian distribution is the most widely used statistical distribution in existence. It is typically the first statistical analysis attempted by engineers and scientists. When it doesn't work satisfactorily, they look at other distributions. The Gaussian distribution is elegant and has well defined methods that will solve a variety of problems.

Very often, the Gaussian is simply "presumed" to be appropriate and is applied to data. Then, engineers and scientists do a reasonableness check before proceeding further. Below are three examples of phenomena where the Gaussian distribution applies and two examples where it does not work well.

We just looked at some populations which are Gaussian and some that are not. Below are two clearly stated examples of industrial problems that can be solved using the Gaussian Distribution and methods that will be presented in Chapter 4. Chapter 4 will also explain the technical terms mean (μ) and standard deviation (σ), and will show you how to compute both for a sample of parts coming off a production line.

Comment: The mean is another way of saying the average value. The standard deviation will be explained in Chapter 4. The pair of values (mean μ, standard deviation σ) is the key to defining the Gaussian distribution for a particular product (i.e. for a population of produced items such as bolts, ink pens or fruit cakes).

How Many PDFs does the Gaussian Have?
The following paragraph will begin to familiarize the student with the behavior of the Gaussian distribution. Please read the material and try to make sense of it. The student should read and re-read the material until the general trends of mean and standard deviation are committed to memory. You want to concentrate on understanding the trends. Do study the fact that increases in mean (μ) from 25 to 50 to 75 slide the "bell shaped curve" to the right. Do study the fact that in the second row, increasing the Standard Deviation from 2.5 to 5 to 7 stretches the curve in the horizontal direction.

The Gaussian Distribution is actually a family of PDFs. All six of the curves (Fig 3 below) are Gaussian. A particular Gaussian PDF is mathematically generated from the average value (mean μ) and the standard deviation σ (which will be mathematically explained in Chapter 4). Thus, each unique pair of parameters (mean μ, StdDev σ) can be used to generate a corresponding Gaussian PDF, and yet all the PDFs look rather similar, as shown below, and are handled by the same math:

effect of different mean and standard deviation of Gaussian distribution
Ch 2 Figure 3
Effect of Changing Mean & Standard Deviation of a Gauss PDF

The Gaussian Distribution has a family of PDF curves, each being defined by its mean and standard deviation. The figure above (Ch 2 Figure 3) provides the student with a clear illustration of how the PDF changes as the mean increases. Basically, the Gauss PDF shifts off to the right so its center aligns with larger and larger values of the mean. If the mean value is increased, the entire PDF slides to the right and all the bolts have higher strengths. This concept proves very useful because on Tuesday our bolt factory may be making 36 KSI strength foundation bolts; but, on Friday we may be making high strength 150 KSI aircraft bolts. We can use the Gauss PDF model for both by simply adjusting the value of the mean!

The Gaussian Distribution has a second parameter: the so-called standard deviation (σ), which we will study in Chapter 4. Looking at the lower part of Ch 2 Figure 3, we can see the effect of varying the Standard Deviation. The lower left graph has a standard deviation of 2.5, and is narrow. The total area under the curve is still 100%, but all the area is fairly close to the central mean (μ) value. In practice, this means that when such bolts are sent to build foundations, the strength values of all will be pretty close to the mean value (because all the PDF area is close to the mean value).

Examining the lower middle graph, the area is still 100%, but the graph is wider and shorter because the Standard Deviation has increased. This means that a process described by this kind of Gaussian PDF will send bolts with considerable strength variation into the buildings and airplanes they are intended for (not so good). The lower right graph is wider still, indicating large variation in the properties of the bolt or other product it describes. In general, industrial processes with a large standard deviation are not a good thing.
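The two trends described above (the mean slides the curve, the standard deviation widens and flattens it) can be seen numerically. The sketch below is only illustrative: it uses the standard Gaussian PDF formula, the means 25, 50 and 75 and standard deviations 2.5, 5 and 7 mentioned in the text, and it assumes a standard deviation of 5 for the mean comparison and a mean of 50 for the standard deviation comparison, since the exact pairings used in Ch 2 Figure 3 are not stated. The function names are made up for this example.

```python
from math import erf, exp, pi, sqrt

def gauss_pdf(x, mean, std_dev):
    """Height of the Gaussian PDF at x."""
    z = (x - mean) / std_dev
    return exp(-0.5 * z * z) / (std_dev * sqrt(2 * pi))

def area_within(distance, std_dev):
    """Area under the Gaussian PDF within +/- distance of the mean."""
    return erf(distance / (std_dev * sqrt(2)))

# Changing the mean moves the peak but leaves its height unchanged.
# (Assumed standard deviation of 5 for this comparison.)
for mean in (25, 50, 75):
    print(f"mean={mean}: peak sits at x={mean}, height={gauss_pdf(mean, mean, 5):.4f}")

# A larger standard deviation gives a shorter, wider curve, so less of
# the area lies close to the mean. (Assumed mean of 50.)
for std_dev in (2.5, 5, 7):
    print(f"std dev={std_dev}: peak height={gauss_pdf(50, 50, std_dev):.4f}, "
          f"area within 5 units of the mean={area_within(5, std_dev):.1%}")
```

In bolt terms, the second loop shows what fraction of bolts would fall within 5 strength units of the mean for each standard deviation: the narrow curve keeps most of its area near the mean, and the wide curves keep much less.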

We will study the significance of varying mean and/or standard deviation in Chapter 4; but, for now, the student is to understand that the Gaussian PDF has the ability to describe items with high values of strength, ductility, or penguin population. The Gaussian PDF is also capable of describing situations where there is little to much variation in the observed quality be it strength or Penguin population.

OPTIONAL: Some might wonder, "Why can't I just make it the way I want and to heck with the mean and standard deviation!"

Observations about the Gaussian PDF

The Weibull Distribution PDF

The Weibull distribution is a general purpose Probability Distribution Function that can be applied to a wide range of problems that includes:

The Weibull Distribution was discovered in 1927 by Fréchet but was not widely known until 1951, when Swedish engineer Waloddi Weibull wrote a short paper explaining its usefulness. Like the Binomial and the Gaussian distributions, the Weibull distribution has well established methods of use. It is a bit more difficult to use than either the Binomial or the Gaussian distributions, but it is extremely powerful because of the wide range of problems it can solve.

Like the Gaussian Distribution, the Weibull PDF is generated by an equation. The x axis is continuous and as a result, areas under the PDF are equivalent to probability. The Y axis PDF values have little or no meaning.

How Many PDFs does the Weibull Distribution Have?

The Weibull has a family of PDFs, just like the Gaussian. The Weibull is defined by an equation, and there are two common forms:
  1. Two parameter Weibull that has slope parameter Beta β and (Characteristic Life) parameter eta η
  2. Three parameter Weibull that has slope parameter Beta β, characteristic life parameter eta η and initial event time tzero
This course will present the two parameter Weibull. The 3 parameter Weibull is similar but with a few added complications.

What Does the Two Parameter Weibull PDF Look Like?
Ch 2 Fig 4 below presents 4 Weibull PDF graphs with different Beta β values (0.5, 1, 1.5 and 2). In some way, we can say that these curves look similar, but as Beta β changes, the Weibull PDF takes on different shapes. All 4 curves are drawn with eta (η)=35. Increasing the eta (η) value stretches the curve horizontally.

effect of slope parameter Beta variation of shape of Weibull distribution
Ch 2 Fig 4

Definition of the Weibull PDF: The Weibull Probability Distribution is defined by the following PDF equation:

f(t) = (β/η) · (t/η)^(β−1) · exp(−(t/η)^β),   for t ≥ 0, with β > 0 and η > 0

Ch 2 Fig 5

Of course, this equation can be used to create a graph of the PDF that is much easier to look at. See Ch 2 Figure 4 which was created using the equation in a spreadsheet.
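The same kind of table a spreadsheet would produce can be generated in Python. This is a minimal sketch, not the author's spreadsheet: it assumes the standard two parameter Weibull PDF, f(t) = (β/η)(t/η)^(β−1) exp(−(t/η)^β), uses η=35 and the four β values from Ch 2 Fig 4, and the function names and the sample times (5, 20, 35, 70) are chosen only for illustration.

```python
from math import exp

def weibull_pdf(t, beta, eta):
    """Two parameter Weibull PDF height at time t (t must be greater than zero)."""
    return (beta / eta) * (t / eta) ** (beta - 1) * exp(-((t / eta) ** beta))

def weibull_cdf(t, beta, eta):
    """Area under the Weibull PDF from 0 to t."""
    return 1 - exp(-((t / eta) ** beta))

ETA = 35  # characteristic life used for all four curves in Ch 2 Fig 4

for beta in (0.5, 1, 1.5, 2):
    heights = ", ".join(f"{weibull_pdf(t, beta, ETA):.4f}" for t in (5, 20, 35, 70))
    print(f"beta={beta}: PDF height at t=5, 20, 35, 70 -> {heights}")
    # Area under the curve (probability) accumulated by the time t reaches eta.
    print(f"  area from 0 to eta = {weibull_cdf(ETA, beta, ETA):.1%}")
```

The heights show how the shape changes with β: for β of 1 or less the curve only falls as t grows, while for β above 1 it rises to a peak and then falls. The area from 0 to η comes out to about 63.2% for every β, which is why η is called the characteristic life.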

Chapter 2 Summary: In chapter 2, we discussed three industrially useful Probability Density Functions.

  1. Binomial
  2. Gaussian (or Normal)
  3. Weibull
Other common Probability Density Functions a student is likely to encounter in his/her reading are:

Chapters 4, 5 and 6 will present methods for performing computations using the distributions we have covered in detail.

Note for the Poisson Probability Density Function: Quote from the Wikipedia Poisson Distribution article: The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant (probability) rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume. End Quote.

The Poisson ("fish" in French) Distribution is a fun and simple distribution. It handles problems like this: "Loop 1 around Austin has many accidents. In fact, there is a 1% chance of an accident every hour (I made that up)." Using the Poisson Distribution (or equation), you can compute the likelihood of having 3 accidents in an hour. I haven't thought about it for a while, but I think we could extend that to finding the likelihood of 5 accidents in 2 hours if we really wanted to. Can you figure out how to do that? In statistics, you can play around with the ideas and figure out how to do things. This kind of math can be used to figure out how many ambulances should be on duty, etc.
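If you would like to check your thinking on the accident example, the sketch below shows one possible reading of it. It is not a definitive treatment: it assumes the made-up "1% chance per hour" can be read as an average rate of 0.01 accidents per hour, and that the expected number of accidents grows in proportion to the length of the interval. The function name `poisson_pmf` is made up for this illustration.

```python
from math import exp, factorial

def poisson_pmf(k, expected_events):
    """Probability of exactly k events when expected_events occur on average."""
    return exp(-expected_events) * expected_events ** k / factorial(k)

# Assumption: the made-up "1% chance per hour" read as an average rate.
RATE_PER_HOUR = 0.01

# Three accidents in a single hour.
print(f"P(3 accidents in 1 hour)  = {poisson_pmf(3, RATE_PER_HOUR * 1):.3g}")

# Five accidents in two hours: the expected count doubles with the interval.
print(f"P(5 accidents in 2 hours) = {poisson_pmf(5, RATE_PER_HOUR * 2):.3g}")
```

The only change between the two questions is the expected number of events: the hourly rate multiplied by the number of hours.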

I am writing a book on Weibull Statistics. It is called the Weibull Bible. It has lots of graphs that show earthquakes and how the data spreads out over time. In that book, I explain how Weibull Analysis can pull all those ideas together and make really good predictions about the future. The book also has examples about tsunamis, flu epidemics and lots of other things (mostly bad things I am afraid). You might be interested in The Weibull Bible some time.

The Wikipedia quote above is referenced to the original source: Frank A. Haight (1967), Handbook of the Poisson Distribution. New York: John Wiley & Sons. Wikipedia also provides a nice PDF graph for the Poisson Distribution showing how the PDF changes as Lambda (one of the input parameters) changes.

End of Chapter 2


Contact the author paul-watson@sbcglobal.net by e-mail.
© 2022 All Rights Reserved
Paul F. Watson
