Statistics Introduction: Chapter 2

Probability Density Functions

Introduction to Probability Density Functions

There are many statistical distributions. In this chapter, we will examine several industrially important distributions, but our first example is purely educational and has already been presented: the two-dice example of Chapter 1. Each of the industrially important distributions has at least one Probability Density Function (PDF) similar to Ch 1 Figure 1, and some distributions have an entire family of PDFs.

Please remember that a PDF is a pictorial representation of a statistical distribution and provides clues on how to get useful answers. A PDF is thus a kind of map. Actual answers are cranked out using math methods presented later, but knowing how to do the math and what the answers mean is difficult if you don't understand the PDF.

Characteristics of a PDF:
A Probability Density Function (or PDF) is a graph that describes the likelihood of different outcomes for a chance event. Consider flipping 20 coins. You would expect about 10 heads (and 10 tails), but you would not be surprised if the result were 12 heads on Tuesday or 9 on Friday. The PDF for coin flipping describes the likelihood of both these results AND also shows a very small (almost zero) likelihood of getting only 2 or 3 heads. The PDF is presented below:

Ch 2 Figure 1
Probabilities of 'Heads' Out of 20 Coin Trials

The coin flipping PDF shows all possible outcomes, whose probabilities of course total 100%. For a PDF, the area under the curve represents probability (don't look at the Y value). From Ch 2 Figure 1, we can surmise that the probability of getting 9, 10 or 11 heads is about 49.6% (add the three center bar areas). The probability of getting 0 or 1 heads is remote (almost zero percent).
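The percentages read from Ch 2 Figure 1 can be reproduced with the binomial formula P(k) = C(n,k) p^k (1-p)^(n-k). The sketch below is a minimal Python illustration, assuming a fair coin (p = 0.5) and n = 20 flips; the function name `binomial_pmf` is our own choice, not part of any library. It checks that all outcomes total 100%, sums the three center bars, and sums the two leftmost bars.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes (heads) in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.5  # 20 fair coins, as in Ch 2 Figure 1

# All possible outcomes (0 to 20 heads) together total 100%.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

# The three center bars: 9, 10 or 11 heads.
center = sum(binomial_pmf(k, n, p) for k in (9, 10, 11))

# The two leftmost bars: 0 or 1 heads.
remote = binomial_pmf(0, n, p) + binomial_pmf(1, n, p)

print(f"total       = {total:.4f}")
print(f"9-11 heads  = {center:.2%}")
print(f"0 or 1 head = {remote:.6%}")
```

The center bars come to about 49.7% before rounding, in line with the reading from the figure, while 0 or 1 heads occurs only about twice in every 100,000 tries.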

PDFs for all distributions are interpreted in the same manner. Please note that on any given day the observed result of a random event may differ, but on most days you will observe results that fall where the large areas indicate a high probability of occurrence.
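To see what "on most days" means in practice, the sketch below simulates many days of flipping 20 fair coins with Python's `random` module. The number of simulated days (10,000) and the fixed seed are arbitrary assumptions chosen only so the run is repeatable. Individual days still scatter, but the share of days landing on the three tallest bars should come out close to the area those bars cover in the PDF.

```python
import random
from collections import Counter

def simulate_days(days: int, coins: int = 20, seed: int = 1) -> Counter:
    """Flip `coins` fair coins once per day and tally the number of heads each day."""
    rng = random.Random(seed)  # fixed seed so reruns give the same tallies
    tally = Counter()
    for _ in range(days):
        heads = sum(rng.random() < 0.5 for _ in range(coins))
        tally[heads] += 1
    return tally

days = 10_000
tally = simulate_days(days)

# Fraction of days that landed on the three tallest bars (9, 10 or 11 heads).
center_share = sum(tally[k] for k in (9, 10, 11)) / days
print(f"days with 9-11 heads: {center_share:.1%}")

# Individual days still vary: show the spread of observed results.
print("fewest heads seen:", min(tally), " most heads seen:", max(tally))
```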

The Binomial Distribution PDF

The binomial distribution is widely used by industry to solve a range of problems. The following are real industrial problems similar to ones I dealt with as an engineer that can be solved via the binomial distribution. I have also included an example showing how statistics can be used to gain historical insight or to facilitate computer game modeling.

Criteria for Using the Binomial Distribution
When the following four conditions are ALL true, the Binomial Distribution can be confidently used.

Homework: Using the four criteria above:

  1. Decide whether or not the binomial distribution can be used to analyze the toss of 20 coins discussed above (yes or no).
  2. Write a brief list of steps that justify your answer.
  3. Check your answer.

How Many PDFs does the Binomial Have?
The following paragraph will begin to familiarize the student with the trends of the binomial distribution. Please read the material, try to make sense of it and move on. Memorization of trends is not required.

The Binomial is actually a family of PDFs. A binomial PDF is generated mathematically from two parameters: the probability of success p and the number of trials n. Thus, each different n,p pair generates a different PDF, but all the PDFs look rather similar, as shown below:

Ch 2 Figure 2
Binomial PDFs - Variation of Parameters n,p

The top row of the figure above shows how the PDF changes when n=4 and p (probability of success) varies from 20% to 50% and finally to 65%. The second row presents the same comparison for n=8 with p varying through the same range of values. The following general observations can be made:
  1. For the binomial family of PDFs, the general shape is similar. The PDF in all cases is much like the cross section of a bell, but it is sometimes distorted.
  2. When n is larger, the PDF keeps the same general shape but has more bars and becomes smoother.
  3. The value of p determines the distortion of the "bell" shape, that is, whether the PDF is skewed left, skewed right or symmetric. When p is below 50% the long tail stretches to the right; when p is above 50% it stretches to the left; at exactly 50% the PDF is symmetric.
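To make the three observations concrete, the sketch below evaluates the binomial formula for the same n,p pairs used in Ch 2 Figure 2. It also uses the standard skewness formula for the binomial, (1 - 2p)/sqrt(n p (1 - p)), which is positive (tail to the right) when p is below 50%, zero at 50%, and negative (tail to the left) above 50%. The function names are illustrative.

```python
from math import comb, sqrt

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_skewness(n: int, p: float) -> float:
    """Textbook binomial skewness: > 0 means the long tail is on the right."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

for n in (4, 8):                    # the two rows of Ch 2 Figure 2
    for p in (0.20, 0.50, 0.65):    # the three values of p
        bars = [binomial_pmf(k, n, p) for k in range(n + 1)]
        print(f"n={n}  p={p:.2f}  bars={len(bars)}  "
              f"center={n * p:.1f}  total={sum(bars):.3f}  "
              f"skew={binomial_skewness(n, p):+.2f}")
```

Going from n=4 to n=8 adds bars and shrinks the size of the skew for the same p, which matches the "more bars, smoother" observation, while the sign of the skew depends only on whether p is below, at, or above 50%.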

The Gaussian Distribution PDF

The Gaussian distribution is the most widely used statistical distribution in existence. It is typically the first statistical analysis attempted by engineers and scientists; when it doesn't work satisfactorily, they look at other distributions. The Gaussian distribution is elegant and has well-defined procedures that can solve a wide range of problems.

Very often, the Gaussian is simply "presumed" to be appropriate and is applied to data; engineers and scientists then do a reasonableness check before proceeding further. Below are three examples of phenomena where the Gaussian distribution applies and two examples where it does not work well.

We just looked at some populations which are Gaussian and some that are not. Below are two clearly stated examples of industrial problems that can be solved using the Gaussian Distribution and methods that will be presented in Chapter 4. Chapter 4 will also explain the technical terms "mean" and "standard deviation" and will show you how to compute them for a sample of parts coming off a production line.

How Many PDFs does the Gaussian Have?
The following paragraph will begin to familiarize the student with the trends of the Gaussian distribution. Please read the material, try to make sense of it. The student should read and re-read the material until the general trends of mean and standard deviation are committed to memory.

The Gaussian Distribution is actually a family of PDFs. A particular Gaussian PDF is mathematically generated, based on the average value (mean), and the standard deviation (which we will explain mathematically in Chapter 4). Thus, each unique pair of parameters (mean,StdDev) can be used to generate a corresponding Gaussian PDF; and yet, all the PDFs do look rather similar as shown below:

Ch 2 Figure 3
Effect of Changing Mean & Standard Deviation of a Gauss PDF

The Gaussian Distribution has a family of PDF curves, each defined by its mean and standard deviation. The figure above (Ch 2 Figure 3) gives the student a clear concept of how the PDF changes as the mean increases: the Gauss PDF simply shifts to the right so that its center aligns with larger and larger values of the mean. This concept proves very useful because on Tuesday our bolt factory may be making 36 KSI strength foundation bolts, but on Friday we may be making high-strength 150 KSI aircraft bolts. We can use the Gauss PDF model for both by simply adjusting the value of the mean!
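For readers who want to see where the shift comes from, the standard Gaussian formula may help. Here μ stands for the mean and σ for the standard deviation (symbols introduced for this sketch; Chapter 4 treats both quantities properly):

```latex
f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\qquad
f(x;\mu+c,\sigma) = f(x-c;\mu,\sigma)
```

Because x enters only through (x - μ), the peak always sits at x = μ, and raising the mean by c slides the whole curve right by c without changing its shape. Moving from 36 KSI foundation bolts to 150 KSI aircraft bolts is then a slide of c = 114 KSI, assuming the standard deviation stays the same.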

The Gaussian Distribution has a second parameter, the so-called standard deviation, which we will study in Chapter 4. Looking at the lower part of Ch 2 Figure 3, we can see the effect of varying the standard deviation. The lower left graph has a standard deviation of 2.5 and is narrow. The total area under the curve is still 100%, but nearly all of the area lies fairly close to the central (mean) value. In practice, this means that when such bolts are sent to build foundations, their strength values will all be pretty close to the mean value (because nearly all the PDF area is close to the mean value).

Examining the lower middle graph, the area is still 100%, but the graph is wider and shorter because the standard deviation has increased. This means that a process with this kind of Gaussian PDF will put bolts with considerable strength variation into the buildings and airplanes they are intended for (not so good). The lower right graph is wider still, indicating large variation in the properties of the bolt or other product it describes. In general, a large standard deviation in an industrial process is not a good thing.
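The trade-off between width and height can be checked numerically. The sketch assumes a mean of 36 KSI (the foundation-bolt example) and standard deviations of 2.5, 5 and 10 KSI; only 2.5 is stated for Ch 2 Figure 3, so 5 and 10 are illustrative guesses, not values read from the figure. For each standard deviation it reports the peak height, the total area (by a simple midpoint-rule integration), and the probability that a bolt falls within an arbitrarily chosen band of ±5 KSI around the mean, using the normal cumulative distribution written with `math.erf`.

```python
from math import erf, exp, pi, sqrt

def gauss_pdf(x: float, mean: float, sd: float) -> float:
    """Gaussian probability density at x."""
    return exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * sqrt(2 * pi))

def gauss_cdf(x: float, mean: float, sd: float) -> float:
    """Probability that a value is <= x (the area to the left of x)."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

def area(mean: float, sd: float, steps: int = 20_000) -> float:
    """Midpoint-rule area under the PDF over mean +/- 10 standard deviations."""
    lo, hi = mean - 10 * sd, mean + 10 * sd
    width = (hi - lo) / steps
    return sum(gauss_pdf(lo + (i + 0.5) * width, mean, sd) for i in range(steps)) * width

MEAN = 36.0  # KSI, foundation bolts (from the text)
BAND = 5.0   # KSI either side of the mean (illustrative)

for sd in (2.5, 5.0, 10.0):  # only 2.5 comes from the text; 5 and 10 are assumed
    peak = gauss_pdf(MEAN, MEAN, sd)
    within = gauss_cdf(MEAN + BAND, MEAN, sd) - gauss_cdf(MEAN - BAND, MEAN, sd)
    print(f"sd={sd:4.1f}  peak={peak:.3f}  area={area(MEAN, sd):.4f}  "
          f"within +/-{BAND:g} KSI: {within:.1%}")
```

The area stays at 100% in every case, while the peak drops and the share of bolts within 5 KSI of the mean falls from about 95% to about 68% to about 38% as the standard deviation grows.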

We will study the significance of varying the mean and/or standard deviation in Chapter 4; for now, the student should understand that the Gaussian PDF can describe items with high or low values of strength, ductility, or penguin population. The Gaussian PDF can also describe situations where there is little or much variation in the observed quantity, be it strength or penguin population.

OPTIONAL: Some might wonder, "Why can't I just make it the way I want, and to heck with the mean and standard deviation?"

Observations about the Gaussian PDF

The Weibull Distribution PDF

The Weibull distribution is a general-purpose set of math that can be applied to a wide range of problems, including:

The Weibull Distribution was first described in the 1920s but "lived like a recluse" until 1951, when Swedish engineer Waloddi Weibull wrote a short paper explaining its usefulness and the wide variety of problems it can handle. Like the Binomial and the Gaussian distributions, the Weibull distribution has well-established procedures for use and can handle a wide variety of problems, but it is generally more difficult to use than either the Binomial or the Gaussian distribution. The Weibull has become the "darling" of the reliability world because it handles failure and wear-out problems very well (i.e. its mathematical form corresponds closely to a wide range of equipment failure data).
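One way to see why the Weibull suits failure data is its hazard (instantaneous failure) rate. The sketch below assumes the common two-parameter form with a shape parameter k (`shape`) and a scale parameter λ (`scale`), whose hazard rate is h(t) = (k/λ)(t/λ)^(k-1). A shape below 1 gives a falling failure rate (early "infant mortality" failures), a shape of exactly 1 gives a constant rate, and a shape above 1 gives a rising rate (wear-out). The scale of 1000 hours and the three shape values are illustrative only.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate at age t for a two-parameter Weibull."""
    return (shape / scale) * (t / scale) ** (shape - 1)

SCALE = 1000.0                        # hours (illustrative)
ages = (100.0, 500.0, 1000.0, 2000.0)  # ages at which to evaluate the rate

for shape, label in ((0.5, "early failures"), (1.0, "random failures"), (3.0, "wear-out")):
    rates = [weibull_hazard(t, shape, SCALE) for t in ages]
    if rates[-1] < rates[0]:
        trend = "falling"
    elif rates[-1] > rates[0]:
        trend = "rising"
    else:
        trend = "constant"
    print(f"shape={shape:3.1f} ({label:15s}) hazard {trend}: "
          + ", ".join(f"{r:.2e}" for r in rates))
```

A single equation covering all three failure patterns, just by changing the shape value, is a large part of the Weibull's appeal in reliability work.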

Like the Gaussian Distribution, the Weibull PDF is generated by an equation which accepts any kind of decimal number (3.2, 5.431, 176, Pi etc.). As a result, it is a continuous distribution (not discrete), and areas under the PDF are equivalent to probability. The Y axis values have little or no meaning.
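To illustrate that areas under a Weibull PDF are probabilities, the sketch below uses the standard two-parameter form f(x) = (k/λ)(x/λ)^(k-1) exp(-(x/λ)^k) for x >= 0, with illustrative values shape k = 2 and scale λ = 1000 hours (assumptions, not values from the text). It compares a numerically integrated area with the closed-form cumulative distribution F(x) = 1 - exp(-(x/λ)^k). It also shows that a PDF height can exceed 1 when the scale is small, which is why a Y-axis value by itself is not a probability.

```python
from math import exp

def weibull_pdf(x: float, shape: float, scale: float) -> float:
    """Two-parameter Weibull density; zero for negative x."""
    if x < 0:
        return 0.0
    return (shape / scale) * (x / scale) ** (shape - 1) * exp(-((x / scale) ** shape))

def weibull_cdf(x: float, shape: float, scale: float) -> float:
    """Probability of a value <= x (the area under the PDF from 0 to x)."""
    return 0.0 if x < 0 else 1 - exp(-((x / scale) ** shape))

def area(a: float, b: float, shape: float, scale: float, steps: int = 10_000) -> float:
    """Midpoint-rule area under the PDF between a and b."""
    width = (b - a) / steps
    return sum(weibull_pdf(a + (i + 0.5) * width, shape, scale) for i in range(steps)) * width

SHAPE, SCALE = 2.0, 1000.0  # illustrative: equipment life in hours

# Probability a unit fails between 500 and 1500 hours, computed two ways.
numeric = area(500.0, 1500.0, SHAPE, SCALE)
exact = weibull_cdf(1500.0, SHAPE, SCALE) - weibull_cdf(500.0, SHAPE, SCALE)
print(f"area under PDF: {numeric:.5f}   CDF difference: {exact:.5f}")

# A tall, narrow Weibull: the PDF height exceeds 1, so a Y value is not a probability.
print(f"height near the peak with scale=0.1: {weibull_pdf(0.07, SHAPE, 0.1):.2f}")
```

Both methods give a probability of about 0.673, while the narrow curve reaches a height of more than 8, which only makes sense as a density, never as a probability.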

The Weibull distribution is particularly interesting because WORK IN PROGRESS ---

End of Chapter 2


© 2016 All Rights Reserved
Paul F. Watson Home Page