How to Design a Weighing Scale?
Suppose we had to design a bathroom weighing scale. How would we decide what the range of the weighing machine should be? Would we take the highest recorded human weight in history and use that as the upper limit? This may not be a great idea, as the sensitivity of the scale would be reduced if the range is too large. At the same time, if we keep the upper limit too low, the scale may not be usable for a large percentage of the population!
So how do we decide what should be a reasonable upper limit, so that, say, 95% of the population can use the scale? Suppose, further, that we managed to conduct a survey and found some statistical measures about the population who are the intended users of the weighing scale, would that be of use? Let us assume that the survey determined that the mean weight of the population is 74 kg and the standard deviation is 28 kg. Can we now determine what should be the upper limit of the weighing scale?
This is where statistics and normal distributions come to our rescue! Given the above information, and assuming that the weights are normally distributed, we can conclude that if we set the upper limit to be about 120 kg, then we can expect it to be good enough for 95% of the population!
What is Normal Distribution?
The normal distribution is one of the most important continuous probability distributions of a random variable. Many real-world phenomena such as heights & weights of a population of organisms, errors in measurement, variations in stock prices, etc. can be modeled using normal distributions. The normal distribution is also known as a Gaussian distribution, named after the famous mathematician Carl Friedrich Gauss, who worked extensively on this probability distribution.
If we plot the data of any of these phenomena as a relative frequency distribution, the resulting graph will be a very distinct bell-shaped curve. This bell curve is characteristic of the normal distribution. A typical bell curve (the dark blue curve in the figure below) is centered around the mean and spreads symmetrically on either side.
We will learn more about the various aspects of the normal distribution in the next few sections.
Characteristics of a Normal Distribution
The normal distribution can be effectively described using two parameters – the mean and the standard deviation.
The mean represents the center of the bell curve and the graph is perfectly symmetric about the center. Also, the mean, the median, and the mode are all equal for a normal distribution.
The standard deviation gives a measure of how much the data is spread from the center. The higher the standard deviation, the more the data is spread out, and the flatter the bell curve looks. So, two different random variables that are both normally distributed and that have the same mean but different standard deviations will have bell curves that may be flatter or taller depending on the standard deviation.
The variance is another commonly used measure of the spread of the distribution and is equal to the square of the standard deviation.
The mean of the distribution is typically denoted by µ and the standard deviation is denoted by σ.
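To illustrate how the standard deviation changes the shape of the curve, here is a minimal Python sketch. It assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the two distributions and the variable names are made up for illustration. It compares the height of the bell curve at the mean for two distributions with the same mean but different standard deviations.

```python
from statistics import NormalDist

# Two hypothetical normal distributions: same mean, different spread.
narrow = NormalDist(mu=0.0, sigma=1.0)
wide = NormalDist(mu=0.0, sigma=2.0)

# The curve is tallest at the mean; a larger sigma gives a lower peak.
peak_narrow = narrow.pdf(0.0)
peak_wide = wide.pdf(0.0)

print(round(peak_narrow, 4))  # 0.3989
print(round(peak_wide, 4))    # 0.1995

# The variance is the square of the standard deviation.
print(wide.variance)          # 4.0
```

Doubling the standard deviation halves the peak height, which is why the wider curve looks flatter.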
Technical Definition of a Normal Distribution
Formally, a normally distributed random variable X with mean µ and variance σ² is written as $X \sim N(\mu, \sigma^2)$. It can be expressed in terms of the probability density function, p(x), which is given by the following equation:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

The area under the probability density function gives the probability that X is less than a value, say a. This can be calculated using the formula for the cumulative distribution function, $F(a)$:

$$F(a) = P(X < a) = \int_{-\infty}^{a} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx$$
That is a scary-looking integral and unfortunately, it cannot be written in simpler terms using other common functions that we know! Then how do we calculate the probability?
Since it is a definite integral, there are methods to calculate the area under the curve for various values of a. However, this is a very tedious task and to avoid having to do the calculations repeatedly, we make use of a set of pre-computed tables of values for the probability. In the next section, we will look at how to use those tables by calculating Z-scores.
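As a rough sketch of what "calculating the area under the curve" involves, the Python snippet below approximates the integral with the trapezoidal rule. It then compares the result with the value obtained from the error function `math.erf`, a special function that is itself defined through an integral of this kind. The function names, the cut-off of 10 standard deviations below the mean, and the number of steps are illustrative assumptions, not part of the original text.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf_numeric(a, mu=0.0, sigma=1.0, steps=20_000):
    """Approximate P(X < a) with the trapezoidal rule.

    Minus infinity is replaced by mu - 10*sigma, below which the
    density is negligibly small.
    """
    lo = mu - 10.0 * sigma
    if a <= lo:
        return 0.0
    h = (a - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(a, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

def normal_cdf_erf(a, mu=0.0, sigma=1.0):
    """The same probability expressed through the error function."""
    return 0.5 * (1.0 + math.erf((a - mu) / (sigma * math.sqrt(2.0))))

numeric = normal_cdf_numeric(2.0)
exact = normal_cdf_erf(2.0)
print(round(numeric, 4), round(exact, 4))  # 0.9772 0.9772
```

Repeating such a computation by hand for every value of a is exactly the tedium that the pre-computed tables avoid.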
Standard Normal Distribution
The standard normal distribution is a special normal distribution whose mean is 0 and whose standard deviation is 1. Let Z be a random variable such that $Z \sim N(0, 1)$.
Suppose we had to find the probability that the value of Z is less than 2. It would be the area under the curve (region shaded green) as shown below:
Instead of evaluating the integral, we can look up the value in standard Z-tables such as the one shown below to directly find the probability to be 0.9772.
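The same lookup can be reproduced in software. The sketch below assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist` in place of a printed Z-table; the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)

# P(Z < 2): the area under the curve to the left of 2.
p_less_than_2 = Z.cdf(2.0)
print(round(p_less_than_2, 4))  # 0.9772
```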
Z-Scores
The above tables can be used for any normal distribution after applying a suitable transformation. Given a random variable $X \sim N(\mu, \sigma^2)$, we can transform the distribution to the standard form by calculating the z-score, which is defined as follows:

$$z = \frac{x - \mu}{\sigma}$$

The resulting transformed variable follows the standard normal distribution, for which the same tables are applicable. We can now look up the table for the corresponding z-score to find the probability.
For example, if a random variable X is normally distributed and has a mean of 20 and a standard deviation of 2, then the probability that X is less than 18 can be calculated as follows:
- Calculate the z-score corresponding to the value of 18: $z = \frac{18 - 20}{2} = -1$.
- Look up the table to find the value of probability corresponding to z = -1. We find that the probability is given by $P(Z < -1) \approx 0.1587$.
- So the probability that X is less than 18 is also roughly 0.16.
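The steps above can be sketched in Python as follows. This is a minimal illustration that assumes Python 3.8 or later and substitutes `statistics.NormalDist` for the printed table; the variable names are made up for the example.

```python
from statistics import NormalDist

mu, sigma = 20.0, 2.0   # mean and standard deviation of X
x = 18.0                # value of interest

# Step 1: z-score of x.
z = (x - mu) / sigma
print(z)  # -1.0

# Step 2: "table lookup" on the standard normal distribution.
p_from_z = NormalDist().cdf(z)
print(round(p_from_z, 4))  # 0.1587

# Cross-check: asking the original distribution directly gives the same value.
p_direct = NormalDist(mu, sigma).cdf(x)
print(round(p_direct, 4))  # 0.1587
```

The agreement between the two probabilities is what justifies using one standard table for every normal distribution.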
Back to Our Weighing Scale Design
Since the weights of the population are normally distributed with a mean of 74 kg and a standard deviation of 28 kg, we can represent the distribution as $X \sim N(74, 28^2)$. To find the upper limit of the range of our weighing scale, we want to find the value of the weight, $w$, that 95% of the population falls below, i.e., $P(X < w) = 0.95$.
By reverse looking up in the tables, we see that 0.95 corresponds to a z-score of approximately 1.65. We can calculate the value of $w$ by substituting it in the equation for the transformation:

$$w = \mu + z\sigma = 74 + 1.65 \times 28 = 120.2 \text{ kg}$$
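The reverse lookup can also be sketched in Python. This assumes Python 3.8 or later, where `statistics.NormalDist` offers an `inv_cdf` method; the variable names are illustrative. The exact z-score is about 1.645 rather than the rounded table value of 1.65, so the result differs slightly from a hand calculation.

```python
from statistics import NormalDist

mu, sigma = 74.0, 28.0   # survey results, in kg
coverage = 0.95          # fraction of the population the scale should serve

# Reverse lookup: z-score below which 95% of a standard normal lies.
z = NormalDist().inv_cdf(coverage)
print(round(z, 4))  # 1.6449

# Undo the z-score transformation to get back to kilograms.
upper_limit = mu + z * sigma
print(round(upper_limit, 2))  # 120.06

# Cross-check directly on the weight distribution.
check = NormalDist(mu, sigma).cdf(upper_limit)
print(round(check, 2))  # 0.95
```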
Empirical Rule
Since it may not always be possible to have access to these tables, it is useful to remember some standard values by heart. In the graph below, the dotted lines represent the values that are in steps of one standard deviation away from the mean on either side of the mean:
Here are some points to remember:
- The region shaded green lies within one standard deviation of the mean and has an area of about 0.68, i.e. 68% of the values lie within one standard deviation of the mean.
- The regions shaded orange lie between 1 and 2 standard deviations away from the mean on either side and have an area of about 0.135 each.
- So, the green and orange regions combined represent the values that are within 2 standard deviations of the mean and have a total area of about 0.95, i.e. 95% of the values lie within 2 standard deviations of the mean.
- The regions shaded blue lie between 2 and 3 standard deviations away from the mean on either side and have an area of about 0.0235 each.
- So, the green, orange, and blue regions combined represent the values that are within 3 standard deviations of the mean and have a total area of about 0.997, i.e. 99.7% of the values lie within 3 standard deviations of the mean.
- Only 0.3% of the values are more than 3 standard deviations from the mean.
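The figures in this list can be checked with a few lines of Python. The sketch assumes Python 3.8 or later and `statistics.NormalDist`; the helper name `within` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k):
    """Probability of lying within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

coverage = {k: within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sd: {p:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973

# Area of one band between 1 and 2 standard deviations from the mean.
band_1_to_2 = Z.cdf(2) - Z.cdf(1)
print(f"{band_1_to_2:.4f}")  # 0.1359
```

The rounded values 68%, 95%, and 99.7% are therefore convenient approximations of the exact areas.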
Why is the Normal Distribution so Important?
Besides the fact that many natural processes can be modeled using normal distributions, normal distributions play a central role in statistics because of the central limit theorem and its many applications.
In our earlier weighing scale example, we assumed values for the population mean and standard deviation – but how do we do that in reality? It is almost impossible to find the descriptive statistical measures of a population. All that we can do is to collect data from a few samples and find the statistical measures of those samples.
Suppose we take random samples of size n from a population (not necessarily normally distributed) with mean $\mu$ and standard deviation $\sigma$. Then, according to the central limit theorem, for sufficiently large n the means of the samples will tend to be normally distributed with mean $\mu$ and a standard error of $\sigma / \sqrt{n}$.
The central limit theorem lays the foundation for a branch of statistics known as inferential statistics that deals with inferring properties about the population based on measurements from random samples.
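A small simulation can make the central limit theorem more concrete. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (clearly not bell-shaped), and the sample size, number of samples, and random seed are arbitrary choices. It needs Python 3.8 or later for `statistics.fmean`.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

n = 50               # size of each sample
num_samples = 5000   # number of samples drawn

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)
standard_error = sigma / math.sqrt(n)

print(f"mean of sample means:   {mean_of_means:.3f} (theory: {mu:.3f})")
print(f"spread of sample means: {spread_of_means:.3f} (theory: {standard_error:.3f})")
```

The mean of the sample means should come out close to the population mean, and their spread close to the standard error, even though the underlying population is skewed.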
Formulas
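- The probability density function of a normal distribution is given by: $p(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
- The probability that a normal random variable is less than a, $P(X < a)$, is calculated using the formula for the cumulative distribution function: $F(a) = \int_{-\infty}^{a} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx$
- Given a random variable $X \sim N(\mu, \sigma^2)$, the distribution is transformed to the standard form by calculating the z-score: $z = \frac{x - \mu}{\sigma}$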
Context and Applications
This concept is relevant to pre-graduate, graduate, and post-graduate students of mathematics and statistics, as well as of many engineering branches.