A Comprehensive Guide to the Normal Distribution

Sachinsoni · 13 min read · Oct 28, 2023

Hey there! Ever noticed how lots of things in life follow a certain pattern, like when most people are a similar height or test scores tend to gather around a middle number? That’s what we call the ‘bell curve.’ It’s like a helpful shape that tells us how common or rare something is. In this blog, we’ll explore why this ‘bell curve’ is super important. We’ll see how it helps us guess things and why it’s used in many different areas, from science to everyday life. Whether you love or hate numbers, we’ll uncover how this ‘bell curve’ helps us understand the world around us better.


What is the Normal Distribution?

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is commonly used in statistical analysis. It is symmetrical around its mean and has a bell-shaped curve.


The normal distribution is characterized by two parameters: the mean (μ) and the standard deviation (σ). The mean represents the centre of the distribution, while the standard deviation represents the spread of the distribution.

Why is the Normal Distribution important to study?

Understanding the normal distribution is paramount for data scientists as it forms the backbone of many statistical methods. For instance, imagine a scenario where a data scientist is analyzing customer purchase amounts in e-commerce. Knowing that these amounts often follow a bell-shaped pattern allows for the use of statistical tools that assume a normal distribution, enabling accurate predictions, identification of outliers, and better decision-making regarding marketing strategies or inventory management. This foundational knowledge is indispensable in data analysis, ensuring robust and reliable interpretations of data trends and patterns.

Normal Distribution PDF Equation :

f(x) = (1 / (σ√(2π))) · e^( −(x − μ)² / (2σ²) )

where μ is the mean and σ is the standard deviation.

Effect of changing mean and standard deviation on the normal distribution graph :

Effect of Changing the Mean (μ): The mean determines the center of the distribution. When the mean changes, the entire distribution shifts along the x-axis. Increasing the mean shifts the curve to the right, while decreasing it shifts the curve to the left, keeping the shape of the curve unchanged.

Effect of Changing the Standard Deviation (σ): The standard deviation influences the spread of the distribution. A larger standard deviation results in a wider, more spread-out curve, while a smaller standard deviation leads to a narrower, more concentrated curve. When the standard deviation increases, the graph becomes flatter and more dispersed around the mean. Conversely, decreasing the standard deviation makes the graph taller and more concentrated around the mean.

For hands-on practice, there is an interactive Streamlit app for exploring the effect of changing the mean and standard deviation on the normal distribution PDF. Click Here!
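If you want to reproduce the same experiment locally, here is a minimal sketch (an assumed example, not the Streamlit app itself) that uses scipy and matplotlib to overlay PDFs with different parameters:

# Sketch: overlay normal PDFs with different means and standard deviations
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-10, 10, 500)

# Same sigma, different means: the curve shifts along the x-axis
for mu in [-2, 0, 2]:
    plt.plot(x, norm.pdf(x, loc=mu, scale=1), label=f'mu={mu}, sigma=1')

# Same mean, different sigmas: larger sigma gives a wider, flatter curve
for sigma in [0.5, 2]:
    plt.plot(x, norm.pdf(x, loc=0, scale=sigma), linestyle='--', label=f'mu=0, sigma={sigma}')

plt.legend()
plt.title('Effect of changing mean and standard deviation')
plt.show()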

Standard Normal Variate : (z-score)

A Standard Normal Variate (Z) is a standardized form of the normal distribution with mean = 0 and standard deviation = 1.


Standardizing a normal distribution allows us to compare different distributions with each other, and to calculate probabilities using standardized tables or software.

Converting normal distribution into standard normal distribution :

The formula for standardizing a normal distribution into a standard normal distribution is:

Z-score = (X − μ) / σ

Where:

  • Z-score is the standardized value.
  • X is the original value in the normal distribution.
  • μ is the mean of the normal distribution.
  • σ is the standard deviation of the normal distribution.

The resulting Z-score will follow a standard normal distribution with a mean of 0 and a standard deviation of 1.

Benefits of Standard Normal Distribution :

Suppose you’re analyzing heights in a population. If the mean height is 65 inches with a standard deviation of 3 inches, and an individual’s height is 71 inches, you can calculate the Z-score to understand how many standard deviations that height is from the mean:

Z-score = (Individual’s height − Mean height) / Standard deviation

Z-score = (71 − 65) / 3 = 2

To convert this Z-score into a probability, we need a Z-table. A Z-table tells you the area underneath a normal distribution curve to the left of the z-score.

When looking at the Z-table, a Z-score of 2 corresponds to a cumulative probability of approximately 0.9772.

This means that approximately 97.72% of the population in this standard normal distribution has a height less than the individual’s height of 71 inches.

The Z-table enables us to interpret Z-scores in terms of probabilities, allowing us to understand the relative position of a value within a standard normal distribution.
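As a quick sketch, the same lookup can be done in code: scipy.stats.norm plays the role of the Z-table here (the numbers reuse the height example above):

# Reproduce the height example with scipy instead of a printed Z-table
from scipy.stats import norm

mean_height = 65
std_height = 3
individual_height = 71

# Standardize: how many standard deviations above the mean?
z = (individual_height - mean_height) / std_height   # 2.0

# Cumulative probability to the left of z (what the Z-table lookup gives)
print(z, norm.cdf(z))                                 # 2.0 0.9772...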

Click here to see the Z-table.

Properties of Normal Distribution :

  1. Symmetry : The normal distribution is symmetric about its mean, which means that the probability of observing a value above the mean is the same as the probability of observing a value below the mean. The bell-shaped curve of the normal distribution reflects this symmetry.

2. Measures of central tendency are equal, i.e., mean = median = mode.

3. Empirical Rule : The normal distribution has a well-known empirical rule, also called the 68–95–99.7 rule, which states that approximately 68% of the data falls within one standard deviation of the mean, about 95% of the data falls within two standard deviations of the mean, and about 99.7% of the data falls within three standard deviations of the mean (a quick numerical check of this rule appears after this list).


4. The area under the curve is 1.

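Here is the quick numerical check of the empirical rule mentioned above (a sketch using a simulated standard normal sample; any large normal sample would do):

# Check the 68-95-99.7 rule on a simulated standard normal sample
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0, scale=1, size=1_000_000)

for k in [1, 2, 3]:
    share = np.mean(np.abs(data) <= k)
    print(f'within {k} standard deviation(s): {share:.4f}')

# Expected output is close to 0.6827, 0.9545 and 0.9973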

Skewness :

A normal distribution is a bell-shaped, symmetrical distribution with a specific mathematical formula that describes how the data is spread out. Nonzero skewness indicates that the data is not symmetrical, which means it is not normally distributed.


Skewness is a measure of the asymmetry of a probability distribution. It is a statistical measure that describes the degree to which a dataset deviates from the normal distribution.

In a symmetrical distribution, the mean, median, and mode are all equal. In contrast, in a skewed distribution, the mean, median, and mode are not equal, and the distribution tends to have a longer tail on one side than the other.


Skewness can be positive, negative, or zero. A positive skewness means that the tail of the distribution is longer on the right side, while a negative skewness means that the tail is longer on the left side. A zero skewness indicates a perfectly symmetrical distribution.

The greater the skew, the greater the distance between the mode, median, and mean.
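As a small illustration (an assumed example using scipy.stats.skew), the sign of the skew can be measured directly from data:

# Compare the sample skewness of a right-skewed and a symmetric sample
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
right_skewed = rng.exponential(scale=1.0, size=10_000)  # long right tail
symmetric = rng.normal(loc=0, scale=1.0, size=10_000)

print(skew(right_skewed))  # clearly positive
print(skew(symmetric))     # close to 0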

CDF of Normal Distribution : -

The Cumulative Distribution Function (CDF) of a normal distribution provides the probability that a random variable from that distribution will be less than or equal to a certain value.

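In practice this probability can be evaluated directly for any mean and standard deviation, without standardizing first; here is a minimal sketch reusing the height example from above:

# P(height <= 71) for a normal distribution with mean 65 and sd 3
from scipy.stats import norm

print(norm.cdf(71, loc=65, scale=3))   # ~0.9772, the same as looking up z = 2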

Use of Normal Distribution in Data Science Field :

The normal distribution is instrumental in outlier detection due to its predictable spread of data around the mean. Methods like the z-score and standard deviation rely on the assumptions of a normal distribution to identify outliers. By calculating how far a data point deviates from the mean in terms of standard deviations, these methods flag values that fall significantly beyond the expected range as potential outliers. Although not all datasets perfectly conform to a normal distribution, leveraging the properties of this distribution assists in recognizing extreme values that might signify irregularities or anomalies within the data.
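A minimal sketch of z-score based outlier flagging (the 3-standard-deviation threshold and the helper name below are illustrative conventions, not something fixed by the article):

# Flag values that lie more than `threshold` standard deviations from the mean
import numpy as np

def zscore_outliers(values, threshold=3.0):
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return values[np.abs(z) > threshold]

# Mostly normal data with two injected extreme values
data = np.append(np.random.default_rng(1).normal(50, 5, size=1000), [120, -10])
print(zscore_outliers(data))   # the injected extremes are flagged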

Statistical Moments :

Statistical moments are a way to describe the characteristics and distribution of data in statistics. They provide quantitative measures about the shape, central tendency, and variability of a probability distribution of a dataset.

There are several moments used in statistics, with the most common being:

  1. First Moment: Mean
  2. Second Moment: Variance
  3. Third Moment: Skewness. Skewness measures the asymmetry of the distribution. A symmetric distribution has a skewness of 0. Positive skewness indicates a longer tail on the right side of the distribution, while negative skewness indicates a longer tail on the left side.
  4. Fourth Moment: Kurtosis. Kurtosis is a measure of the “tailedness” of the probability distribution of a real-valued random variable. (A short computational sketch follows this list.)
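Here is the short computational sketch referenced above (an assumed example; note that scipy's kurtosis reports excess kurtosis by default):

# Compute the four moment-based summaries for a sample
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(7)
runs = rng.normal(loc=40, scale=12, size=200)   # hypothetical scores

print('mean     :', np.mean(runs))
print('variance :', np.var(runs))
print('skewness :', skew(runs))
print('kurtosis :', kurtosis(runs))   # excess kurtosis; about 0 for normal data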

Let’s take an example to understand these four moments:

Suppose Virat Kohli played 100 matches in both 2009 and 2010, maintaining a consistent batting average of 40 in both years. When examining the probability distribution graphs of runs scored, the blue distribution represents the runs in 2009, while the red one represents the runs in 2010. Notably, although the mean for both distributions is identical, the spread of data in 2010 appears narrower around the mean, suggesting increased consistency in making runs that year.

[Figure: run distributions for 2009 (blue) and 2010 (red): same mean, narrower spread in 2010]

Moreover, the observation that the red distribution’s tail is positioned below the blue one implies that in 2010, Virat Kohli experienced fewer instances of scoring centuries or being dismissed for low scores compared to 2009. This observation indicates a more stable and reliable performance in 2010, as there were fewer extreme instances of both very high and very low scores compared to the spread seen in 2009.

Now, in the given scenario where the black distribution represents the runs in 2009 and the red one symbolizes the runs in 2010, it’s evident that both years exhibit identical mean values and similar spreads (standard deviations). However, the key disparity lies in their skewness.

[Figure: run distributions for 2009 (black) and 2010 (red): same mean and spread, opposite skewness]

In 2009, Virat Kohli’s performance showcases a higher frequency of lower scores, leading to a positively skewed distribution. Conversely, in 2010, there’s a higher frequency of higher runs, resulting in a negatively skewed distribution. This difference in skewness indicates that, in 2009, Kohli had more occurrences of scoring lower runs, while in 2010, there were more instances of achieving higher runs.

Consider this alternative scenario where the red distribution signifies the runs in 2009, while the black one represents the runs in 2010. Notably, the mean, standard deviation, and skewness of both distributions are identical. However, the distinguishing factor lies in the kurtosis and the flatness of the tails.

[Figure: run distributions for 2009 (red) and 2010 (black): same mean, spread, and skewness, different kurtosis]

The 2009 runs distribution displays flatter tails, indicating a greater number of outliers compared to the 2010 distribution. This suggests that in 2009, Virat Kohli encountered multiple occurrences of both low and high scores. There were more instances of both unusually low and high scores, contributing to the flatter tails. Conversely, in 2010, the distribution demonstrates less variability in extreme scores. This implies that Virat Kohli experienced fewer occurrences of both very low and very high scores in 2010, resulting in a distribution with less tail flatness, as indicated by the kurtosis.

Use-Case of Kurtosis :

In finance, understanding the kurtosis of stock returns or asset prices is crucial. It helps in analyzing the risk associated with extreme movements in the market. A high kurtosis might indicate a higher likelihood of extreme positive or negative returns, highlighting the potential risks associated with an investment.

In finance, kurtosis risk is important to consider because it indicates that there is a greater probability of large losses or gains occurring, which can have significant implications for investors. As a result, investors may want to adjust their investment strategies to account for kurtosis risk.

Excess Kurtosis :

Excess kurtosis is a measure of how much more peaked or heavy-tailed a distribution is compared to a normal distribution. A normal distribution has a kurtosis of 3, so excess kurtosis is calculated by subtracting 3 from the sample kurtosis coefficient; the normal distribution itself therefore has an excess kurtosis of 0.
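In Python, for example, scipy.stats.kurtosis returns excess kurtosis by default (fisher=True), while fisher=False returns the raw kurtosis, which is about 3 for normal data:

# Excess kurtosis vs. raw kurtosis for a simulated normal sample
import numpy as np
from scipy.stats import kurtosis

sample = np.random.default_rng(3).normal(size=100_000)

print(kurtosis(sample, fisher=True))   # excess kurtosis, close to 0
print(kurtosis(sample, fisher=False))  # raw kurtosis, close to 3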

Types of Kurtosis :

  1. Leptokurtic
    A distribution with positive excess kurtosis is called leptokurtic. In terms of shape, a leptokurtic distribution has fatter tails. This indicates that there are more extreme values or outliers in the distribution.
    Example — Assets with positive excess kurtosis are riskier and more volatile than those with a normal distribution, and they may experience sudden price movements that can result in significant gains or losses.

2. Platykurtic :
A distribution with negative excess kurtosis is called platykurtic. In terms of shape, a platykurtic distribution has thinner tails. This indicates that there are fewer extreme values or outliers in the distribution.
Assets with negative excess kurtosis are less risky and less volatile than those with a normal distribution, and they may experience more gradual price movements that are less likely to result in large gains or losses.

3. Mesokurtic :
Distributions with zero excess kurtosis are called mesokurtic. The most prominent example of a mesokurtic distribution is the normal distribution family, regardless of the values of its parameters.
Mesokurtic is a term used to describe a distribution with an excess kurtosis of 0, indicating that it has the same degree of “peakedness” or “flatness” as a normal distribution.
Example — In finance, a mesokurtic distribution is considered to be the ideal distribution for assets or portfolios, as it represents a balance between risk and return.

How to find if a given distribution is normal or not?

  1. Visual inspection: One of the easiest ways to check for normality is to visually inspect a histogram or a density plot of the data. A normal distribution has a bell-shaped curve, which means that the majority of the data falls in the middle, and the tails taper off symmetrically. If the distribution looks approximately bell-shaped, it is likely to be normal.
  2. QQ Plot: Another way to check for normality is to create a normal probability plot (also known as a Q-Q plot) of the data. A normal probability plot plots the observed data against the expected values of a normal distribution. If the data points fall along a straight line, the distribution is likely to be normal.
  3. Statistical tests: There are several statistical tests that can be used to test for normality, such as the Shapiro-Wilk test, the Anderson-Darling test, and the Kolmogorov-Smirnov test. These tests compare the observed data to the expected values of a normal distribution and provide a p-value that indicates whether the data is likely to be normal or not. A p-value less than the significance level (usually 0.05) suggests that the data is not normal; a short sketch using the Shapiro-Wilk test follows this list.
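A minimal sketch of the statistical-test approach, using scipy.stats.shapiro (the two samples below are simulated purely for illustration):

# Shapiro-Wilk normality test on a normal sample and a uniform sample
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)

stat, p = shapiro(rng.normal(size=500))
print('normal sample  p =', p)    # typically > 0.05: no evidence against normality

stat, p = shapiro(rng.uniform(size=500))
print('uniform sample p =', p)    # typically < 0.05: normality is rejected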

What is a QQ Plot and how is it plotted?

A QQ plot (quantile-quantile plot) is a graphical tool used to assess the similarity of the distribution of two sets of data. It is particularly useful for determining whether a set of data follows a normal distribution.


Here’s a general procedure to create a Q-Q plot:

  1. Sort the Data: Arrange the dataset in ascending order to work with the quantiles properly.
  2. Calculate Theoretical Quantiles: Determine the theoretical quantiles for the specific distribution (e.g., normal distribution). These quantiles are calculated based on the number of observations in your dataset and the expected distribution.
  3. Plotting: Once you have both sets of quantiles (observed data and theoretical distribution), plot them against each other. The x-axis represents the theoretical quantiles, while the y-axis represents the observed data quantiles.

Here’s an example using Python and the statsmodels library to generate a Q-Q plot for a sample dataset:

# using statsmodels

import seaborn as sns
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Load the Iris dataset (one common way to obtain it; the article assumes df already holds it)
df = sns.load_dataset('iris')

# Create a QQ plot of the data against a normal distribution
# line='45' draws the 45-degree reference line; fit=True standardizes the data first
fig = sm.qqplot(df['sepal_length'], line='45', fit=True)

# Add a title and labels to the plot
plt.title('QQ Plot')
plt.xlabel('Theoretical Quantiles')
plt.ylabel('Sample Quantiles')

# Show the plot
plt.show()

Here I am using the ‘sepal_length’ column of the Iris dataset to check whether it follows a normal distribution. Running the code produces the following plot:

[Figure: Q-Q plot of sepal_length against theoretical normal quantiles]

In a QQ plot, the quantiles of the two sets of data are plotted against each other. The quantiles of one set of data are plotted on the x-axis, while the quantiles of the other set of data are plotted on the y-axis. If the two sets of data have the same distribution, the points on the QQ plot will fall on a straight line. If the two sets of data do not have the same distribution, the points will deviate from the straight line.

How to interpret QQ plots ?

[Figure: examples of Q-Q plot shapes and how to interpret them]

An important point about Q-Q plots is that they are not limited to comparisons against the normal distribution; you can compare against other distributions as well. Here is an example comparing data against a uniform distribution:

# First, generate 1000 samples from a uniform distribution and store them in the variable x.

import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Generate a set of random data
x = np.random.uniform(low=0, high=1, size=1000)

# Fit a uniform distribution to the data
params = stats.uniform.fit(x)
dist = stats.uniform(loc=params[0], scale=params[1])
# Create a QQ plot of the data using the uniform distribution
fig = sm.qqplot(x, dist=dist, line='45')

# Add a title and labels to the plot
plt.title('QQ Plot of Uniform Distribution with Uniform Fit')
plt.xlabel('Theoretical Quantiles')
plt.ylabel('Sample Quantiles')

# Show the plot
plt.show()

The resulting graph shows the points lying close to the 45-degree line, which confirms that the data follows a uniform distribution.

[Figure: Q-Q plot of the sample against the fitted uniform distribution]

I trust this article has provided a comprehensive understanding of the normal distribution, statistical moments, and Q-Q plots. Thank you for taking the time to read it! Follow me for more content like this and like the article if you found it helpful. Have a nice day!
