Figure: illustration of the mean distribution of bolt strength

St. Paul's Statistics Introduction: Definitions

For the Salvation of Many
by Paul F. Watson

The definitions page will be made available throughout this course. The words are in alphabetical order, so please scan down the page to find the word you are looking for.

Event = something that happens. In statistics, events are happenings where the result can be measured or counted. Statistics deals with numbers, not value judgements.

Example 1: Coin Throws: If we throw 15 coins in the air, we can count the number of heads after they hit the ground. Something happened, the result can be counted.
Example 2: Monitoring Infection Rates: If we entered a doctor's office and found 12 patients, we could test each of them for flu and write down the results. Each patient would either "have the flu", or "not". Once again, something happened and we measured it. This is the second example of an event.
Example 3: Monitoring Milk Bacterial Counts: If we had a herd of milk cows, we would routinely test the milk for bacterial content. We would test a cow and write down the results. This would continue until the entire herd was tested. The previous two examples had "yes/no" or binary results. This example is a little different. In this example, we are actually collecting numbers from each cow tested. Some of the cows have a higher bacterial count, and some lower. These numbers could be used in two different ways.
- Individual cows could be identified that exceed allowable bacteria counts. This process reduces their bacterial count to a binary result (too high, or ok).
- Alternatively, we could analyse the herd as a group. We could divide the bacteria counts into ranges (e.g. everything between 25 and 50 might be one range) and then create a PDF similar to the one for the two coin problem of Chapter 1. This kind of analysis considers the health of the herd as a group, and it involves number values (not binary).
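The two uses of the herd numbers can be sketched in a few lines of Python. This is only an illustration: the bacterial counts, the allowable limit and the range width of 25 are all made-up values chosen for the example, not real herd data.

```python
from collections import Counter

# Hypothetical bacterial counts, one per cow (made-up numbers for illustration only).
counts = [12, 31, 47, 55, 28, 64, 39, 72, 44, 18, 51, 36]

ALLOWED_MAX = 60  # hypothetical allowable limit
BIN_WIDTH = 25    # assumed range width: 0-24, 25-49, 50-74, ...

# Use 1: reduce each number to a binary result (too high, or ok).
too_high = [c for c in counts if c > ALLOWED_MAX]

# Use 2: analyse the herd as a group by sorting the counts into ranges.
def bin_fractions(values, width):
    """Return {range_start: fraction of values falling in [start, start + width)}."""
    tally = Counter((v // width) * width for v in values)
    total = len(values)
    return {start: tally[start] / total for start in sorted(tally)}

fractions = bin_fractions(counts, BIN_WIDTH)

print("Counts above the limit:", too_high)
for start, frac in fractions.items():
    print(f"{start:3d}-{start + BIN_WIDTH - 1:3d}: {frac:.2f} of the herd")
```

The fractions add up to 1, which is what lets them be drawn as a PDF-style graph for the herd.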

Gaussian Probability Distribution: First, what is a probability distribution? A probability distribution is a blueprint or pattern of how probabilities are matched to the outcomes of some event. Let's clarify that idea of a probability distribution before we explain a particular type, the Gaussian Probability Distribution.

The example that follows is not Gaussian, but its shape is very similar. Consider the probability distribution that describes throwing 50 coins in the air and counting the number of heads after they land. The distribution would tell you the percentage of throws that produce 25 heads. It would tell you the percentage of throws with 26 heads. It would tell you the percentage of throws that result in 27 heads. In fact, the distribution would match a probability to every possible head count (0 through 50). This is usually done by either a PDF graph or an equation (note that an equation can be used to draw a graph).
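For readers who like to see the numbers, the 50-coin distribution can be computed directly. The sketch below assumes fair coins (heads 50% likely) and uses the standard counting formula for the number of ways k heads can occur; the function name is just a label chosen for this example.

```python
from math import comb

N_COINS = 50

def heads_probability(k, n=N_COINS):
    """Probability of exactly k heads when n fair coins are thrown."""
    # comb(n, k) counts the ways to pick which k coins are heads;
    # 2 ** n counts every possible heads/tails arrangement of n coins.
    return comb(n, k) / 2 ** n

# One probability for every possible head count, 0 through 50.
distribution = [heads_probability(k) for k in range(N_COINS + 1)]

for k in (25, 26, 27):
    print(f"P({k} heads) = {distribution[k]:.4f}")
```

Plotting `distribution` against the head count gives the bell-like shape described above, with its peak at 25 heads (a little over 11% of throws).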

Next, let's define the Gaussian Probability Distribution. It is the very common "bell-shaped" PDF curve. The Gaussian PDF is actually defined by the equation shown below:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Some of the symbols in the Gaussian equation may not look familiar to you.

e ≈ 2.7182818 (the base of the natural logarithm)
Pi π ≈ 3.14159
Mu μ (pronounced 'mew') is the mean, or average value, of the PDF curve. It is literally the left-right balance point of the PDF. The computation of the mean will be taught in Chapter 4.
Sigma σ (pronounced 'sig ma') is the standard deviation. It is the width indicator (or spread indicator) that mathematically describes how stretched the PDF is in the horizontal direction. The computation of the standard deviation (σ) will be covered in Chapter 4.
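To show how these symbols fit together, here is a minimal Python sketch of the usual textbook form of the Gaussian PDF: one over σ times the square root of 2π, multiplied by e raised to the power of -(x - μ)² / (2σ²). The mean of 100 and standard deviation of 15 are hypothetical values picked only to have something to print.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Height of the Gaussian PDF at x for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Hypothetical example values: mean 100, standard deviation 15.
for x in (70, 85, 100, 115, 130):
    print(x, round(gaussian_pdf(x, mu=100, sigma=15), 5))
```

The printed heights are largest at the mean and equal at equal distances to either side of it, which is the "balance point" idea described for μ.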

Fortunately, we will not be working with the Gaussian equation, but we will see graphs and discuss them in Chapter 2. Actual analysis by using math and tables will be explained in Chapter 4. There are difficulties working directly with the Gaussian equation; therefore, tables and simple equations have been developed which will get you to commonly needed business, industrial and scientific answers.

PDF graph (or probability density function graph) is a picture representing all of the possible state outcomes that can result from some event. The possible outcomes are listed across the x axis, and the area above the x axis indicates the probability of occurrence: the probability that the outcome falls within some range of x is the area under the curve above that range. Unlike the usual math graph, AREA is important, not the y value. It is difficult to comprehend from words, but examples provided in Chapter 2 will clarify this concept.
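The "area, not y value" idea can be checked numerically. The sketch below uses a Gaussian PDF with mean 0 and standard deviation 1 purely as a convenient example curve (an assumption for illustration), and adds up thin rectangles under the curve to approximate the area.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Height of the Gaussian PDF at x for mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_pdf(lo, hi, mu, sigma, steps=10_000):
    """Approximate the area under the PDF between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    heights = (gaussian_pdf(lo + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

# Probability that the outcome lands within one sigma of the mean.
print(round(area_under_pdf(-1.0, 1.0, mu=0.0, sigma=1.0), 4))

# Area under (practically) the whole curve: all outcomes together.
print(round(area_under_pdf(-8.0, 8.0, mu=0.0, sigma=1.0), 4))
```

The first area comes out near 0.68 and the second near 1, so about 68% of outcomes fall within one σ of the mean, and all possible outcomes together account for 100%.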

Population = the entire family of similar objects that have ever been made, or will ever be made. If we are talking about 1/4 inch bolts, then the population is made of either:

Every 1/4 inch bolt that has been or ever will be made or
Every 1/4 inch bolt (ever) made by a common manufacturing method

So the words "similar objects" are causing a bit of confusion. Usually, the context of the problem will make it clear whether we are talking about all bolts, or only the ones coming out of a particular factory, with a common manufacturing process.

Example 1: If we are talking about persons in Oregon, then population would typically refer to everybody living in Oregon now; but it might mean everyone who lived here in the past, present or future. Again, the context of the problem generally clarifies which interpretation of population is intended.

Example 2: For milk cows, the uncertainty of POPULATION meaning is much like persons. If you are doing statistics on milk cows, you probably work for a dairy association and are concerned about milk cows now living in Oregon, which are somewhat different from historical cows in terms of breed, herd management methods, testing, etc.

Population always refers to a great many of the objects being analysed. A population is typically tens of thousands, if not more.

Probability = the likelihood of a specific outcome from an event. The probability is usually expressed as a percentage (e.g. 25% likely). The percentage indicates the result which is expected after a very, very large number of events.

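For example, a coin toss is 50% likely to end up heads. This means that if we throw a million coins, we should expect pretty close to 50% of a million, or 500,000 heads. If we throw 100 coins, we should expect 50% of 100, or about 50 heads. As the number of coins thrown gets smaller, the prediction becomes less accurate in percentage terms: 100 throws will quite often give 45 or 55 heads (5 percentage points away from 50%), while a million throws will almost never be off by even 1 percentage point.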
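A quick simulation makes the point about large and small numbers of throws. This is a sketch, assuming a fair coin and using Python's built-in pseudo-random generator with a fixed seed so the run can be repeated; the exact figures printed depend on that seed.

```python
import random

def heads_fraction(n_coins, rng):
    """Throw n_coins fair coins and return the fraction that land heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_coins))
    return heads / n_coins

rng = random.Random(2024)  # fixed seed so the run is repeatable
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} coins: {heads_fraction(n, rng):.4f} heads")
```

With more coins, the fraction of heads tends to settle closer and closer to 0.5.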

State = the overall situation with regard to a group of objects. State implies little or no concern for the individual identity of the objects; it refers only to the impersonal description of how many are in what condition.

Example 1: Consider 5 coins, named Albert, Beth, Charlie, Dianna and Ed. One possible state is 2 heads and 3 tails. When we talk about state, we are not concerned with the names of the 2 that are heads. Our level of concern stops with the impersonal observance of 2 heads and 3 tails.

Statistics tends to be confusing. One of the reasons is: As we approach a problem, we may have to think in very small detail about what can happen (e.g. how many combinations of Albert, Beth, Charlie, Dianna and Ed are there that result in 2 heads?). It is hard to know unless you write them down - by name. But the final answer rarely is concerned with that level of detail. Thus in the early stages of a problem, we must often think in small detail, but the final answer requires thinking at a much more general level. This switching back and forth combined with the large number of "specialty words" used in Statistics makes for a lot of confusion. In this course, we will try to be as clear as possible and will use as few "specialty terms" as possible.
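The "write them down by name" step is something a computer does happily. The sketch below lists every way two of the five named coins can be the ones showing heads, using Python's standard `itertools.combinations`.

```python
from itertools import combinations

coins = ["Albert", "Beth", "Charlie", "Dianna", "Ed"]

# Every way of choosing which 2 of the 5 named coins show heads.
two_heads = list(combinations(coins, 2))

for pair in two_heads:
    print(pair)
print(len(two_heads), "named arrangements make up the single state '2 heads, 3 tails'")
```

The detailed list has 10 entries, but the state only records the summary: 2 heads and 3 tails.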

Example 2: If we consider 10,000 molecules in a cubic inch of air, the state might be the number of molecules that travel at different speeds (10-20 mph, 21-30 mph ... 91-100 mph). All 10,000 molecules may have names and personalities; sorry to offend, but we really do not care. Description of the state stops with the impersonal description of how many are in each speed bracket.

It is a bit like the political process. Every voter has a name and personality, but the election result is a raw count of votes for each candidate (and ignores who voted Republican vs. who voted Democratic).

Statistical Distribution: a "blueprint" or pattern that relates states to probability of occurrence. The pattern can be defined by a graph like the ones I am showing you in the course. The pattern for some Statistical Distributions can also be defined by an equation.

The equation form is often used because calculus is very good at computing the area underneath a curve which is defined by an equation. This is primarily true for Statistical Functions where the x axis is a continuous number line which includes decimal fractions (e.g. 1, 1.1, 1.15, 1.3, 1.567, 2, 2.1, 2.34 ... and everything in between).

So we might conclude by saying that if we have 5 equations, each representing a statistical distribution, then we have five different blueprints and hence five different statistical distributions. When they are graphed, they usually look different from one another. By simply looking at a PDF, it is often obvious what distribution it is.

Success vs. Failure: An event often results in "success" or "failure". If a boy gave a ride to his girlfriend and ran out of gas, he might call that success because he would enjoy the long walk back to town with his friend. The girl might call it failure because she secretly had a date with someone else, and needed time to get ready.

So the result tagged as success is a matter of viewpoint. During statistical analysis, it does not really matter which result you call success (running out of gas, or not), but once you start the problem you mustn't change your definition. Stick to it until the final answer is computed. If you follow that approach, you will get the right answer.

Usually, the probability of success is represented by variable p, and the probability of failure by variable q. When success and failure are the only two possible results, p + q = 1 (so q = 1 - p). That is simply the way most teachers and writers do things. We will stick to this convention, as it will make understanding the work of others a lot easier if we use the same terminology.

Trial: When we talk about statistical distributions, we are talking about the percentage occurrence of each outcome were we to repeat an event (or an experiment) many, many times.

Each time we do an event (or experiment), we only get one result. So it is necessary to repeat the experiment thousands of times to build up a record that matches event to percentage occurrence. Each time we repeat the event of interest, that is called a trial.

For the dice problem in Chapter 1, it was not necessary to do thousands of trials in order to create a Probability Density Function. Why? The probabilities of different dice rolls are simple and well understood. It is possible to use math and compute the probability of various outcomes without performing any experiments. Some problems are like that, but other problems require hundreds if not thousands of trials to develop a good estimate of the PDF.

Conclusion: A trial is a single experiment that meets some description of an event (event being the idea or happening you are exploring.) Often, hundreds of repetitions of an experiment are needed to make good statistical conclusions. In the language of statistics, hundreds or thousands of trials may be needed to characterise the probability density function (PDF) of an event.
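The difference between computing probabilities by math and estimating them by trials can be sketched as follows. The example assumes the event is "roll two fair six-sided dice and add the spots", which may or may not match the Chapter 1 problem exactly; the number of trials and the random seed are arbitrary choices.

```python
import random
from collections import Counter
from fractions import Fraction

# By math: list all 36 equally likely (die 1, die 2) results and add up their shares.
exact = Counter()
for a in range(1, 7):
    for b in range(1, 7):
        exact[a + b] += Fraction(1, 36)

def estimate_by_trials(n_trials, rng):
    """Estimate the same probabilities by repeating the two-dice roll n_trials times."""
    tally = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_trials))
    return {total: tally[total] / n_trials for total in range(2, 13)}

rng = random.Random(7)  # fixed seed so the run is repeatable
estimated = estimate_by_trials(10_000, rng)

for total in range(2, 13):
    print(f"{total:2d}  by math {float(exact[total]):.4f}  from trials {estimated[total]:.4f}")
```

Each pass through the roll is one trial. With 10,000 trials the estimates should land close to the computed values, and with only a handful of trials they would usually be much rougher.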

Weibull Probability Distribution Function: the Weibull Probability Distribution Function is defined by the following PDF equation:

$$f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} e^{-\left(t/\eta\right)^{\beta}}$$

The two-parameter Weibull equation has variable t (for time) on the x axis. Parameters beta (β) and eta (η) control the shape and horizontal scale of the PDF. Beta β is called the "slope parameter", and eta η is called the "characteristic life" parameter because it is the value of t by which about 63% of the samples have failed; that is, about 63% of the area under the PDF lies to the left of t = η.
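The 63% figure can be checked with a short sketch. It assumes the standard two-parameter Weibull form, with PDF (β/η)(t/η)^(β-1) e^(-(t/η)^β) and accumulated area (fraction failed by time t) equal to 1 - e^(-(t/η)^β). The characteristic life of 1000 hours is a hypothetical value chosen for the example.

```python
import math

def weibull_pdf(t, beta, eta):
    """Two-parameter Weibull PDF at time t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))

def weibull_cdf(t, beta, eta):
    """Fraction of samples expected to have failed by time t (area under the PDF up to t)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

ETA = 1000.0  # hypothetical characteristic life, in hours
for beta in (0.5, 1.0, 3.0):
    print(f"beta = {beta}: fraction failed by t = eta is {weibull_cdf(ETA, beta, ETA):.4f}")
```

Whatever value β takes, the fraction failed at t = η works out to 1 - 1/e, or roughly 63%.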

The Weibull PDF is capable of modelling many different physical behaviours - especially ones connected with reliability and failure. Three types of reliability modelling that can be done with the Weibull are:

Infant Mortality (where newly delivered products tend to fail)
Constant Rate Failure (where products fail at random and without respect to their age)
Wear-out Failure (where products begin failing only after an extended period of use)
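These three behaviours correspond to different values of β. A common way to see this is through the Weibull failure rate (often called the hazard rate), (β/η)(t/η)^(β-1), which is not defined in this course and is introduced here only for illustration. The sketch below compares the rate early and late in life for three hypothetical β values, with a hypothetical characteristic life of 1000 hours.

```python
def weibull_failure_rate(t, beta, eta):
    """Instantaneous failure (hazard) rate of the two-parameter Weibull at time t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)

ETA = 1000.0  # hypothetical characteristic life, in hours
cases = (("infant mortality", 0.5), ("constant rate", 1.0), ("wear-out", 3.0))

for label, beta in cases:
    early = weibull_failure_rate(100.0, beta, ETA)
    late = weibull_failure_rate(2000.0, beta, ETA)
    print(f"{label:16s} beta = {beta}: rate at 100 h = {early:.6f}, at 2000 h = {late:.6f}")
```

With β below 1 the failure rate falls over time (infant mortality), with β equal to 1 it stays constant, and with β above 1 it rises (wear-out).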

The Weibull distribution can also be used in a wide variety of physical situations such as describing the percent distribution of smoke particle size, strength of steel and many others.
