Normal Distribution


Last updated 5 years ago

Discrete probability distributions can’t handle every situation.

Why the Normal Distribution?

Many quantities in nature are approximately normally distributed (height and weight are commonly cited examples). That makes the normal distribution a reasonable first approximation when we are not aware of the underlying distribution pattern.

Most often the goal in ML/AI is to make the data linearly separable, even if that means projecting the data into a higher-dimensional space in order to find a fitting "hyperplane" (for example, SVM kernels, neural net layers, softmax). The reason is that linear boundaries tend to reduce variance and are the simplest, most natural and most interpretable choice, besides reducing mathematical and computational complexity.

When we aim for linear separability, it is good to reduce the effect of outliers, influential points and leverage points, because the hyperplane is very sensitive to them.

To understand this, let's shift to a 2D space where we have one predictor (X) and one target (y), and assume there is a good positive correlation between X and y. If X and y are both normally distributed, the fitted straight line will most likely have many points concentrated around its middle and few at its ends (the outliers and leverage or influential points). So the fitted regression line will most likely suffer little variance when predicting on unseen data.
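The sensitivity of a fitted line to a single leverage point can be seen in a minimal sketch. The data below is hypothetical (chosen only for illustration), and the helper `fit_line` is a plain ordinary least squares fit written for this example, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]
clean_slope, clean_intercept = fit_line(xs, ys)

# Add one point far from the other X values and far off the line.
xs_out = xs + [20.0]
ys_out = ys + [5.0]
out_slope, out_intercept = fit_line(xs_out, ys_out)

print(f"without leverage point: slope = {clean_slope:.2f}")
print(f"with leverage point:    slope = {out_slope:.2f}")
```

With these made-up numbers, one extreme point is enough to pull the slope from 2 down to close to zero, which is why data concentrated around the center gives a more stable fit.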

The normal distribution is important because of the central limit theorem. In simple terms, if you have many independent variables that may be generated by all kinds of distributions, then, assuming that nothing too crazy happens, the aggregate of those variables will tend toward a normal distribution. This universality across different domains makes the normal distribution one of the centerpieces of applied mathematics and statistics.
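A small simulation can make the central limit theorem concrete. This is a sketch under assumptions: the choice of Uniform(0, 1) draws, 30 draws per mean, and 5000 repetitions are all arbitrary illustration values, not from the text.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n independent draws from Uniform(0, 1), which is not bell-shaped."""
    return sum(random.random() for _ in range(n)) / n

n_per_mean = 30
means = [sample_mean(n_per_mean) for _ in range(5000)]

observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)

# Theory: Uniform(0, 1) has mean 1/2 and variance 1/12,
# so the mean of n draws has standard deviation sqrt(1 / (12 * n)).
expected_sd = (1 / (12 * n_per_mean)) ** 0.5

# Share of sample means within one standard deviation of 0.5.
# For a normal distribution this share is about 68%.
within_one_sd = sum(abs(m - 0.5) <= expected_sd for m in means) / len(means)

print(f"mean of means = {observed_mean:.4f}")
print(f"sd of means   = {observed_sd:.4f} (theory {expected_sd:.4f})")
print(f"within 1 sd   = {within_one_sd:.3f}")
```

Although each individual draw is uniform, the averages cluster around 0.5 with roughly the spread and one-standard-deviation coverage a normal distribution would predict.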

Another advantage is that the normal distribution makes the math easy: calculating moments, correlations between variables, and other domain-specific quantities. For that reason, even if a distribution isn't actually normal, it is often useful to assume that it is in order to get a good first-order understanding of a set of data.

Definition from Wikipedia

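The normal distribution is a continuous probability distribution with probability density

f(x | μ, σ²) = (1 / √(2πσ²)) · exp(−(x − μ)² / (2σ²))

A random variable with a Gaussian distribution is said to be normally distributed and is called a normal deviate.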

Let's see why the normal distribution is such a beautiful phenomenon.

Nice, huh?

Let's see now how we can find X for a particular probability.

From the standard normal table you get the Z for that probability, and then X follows from X = μ + Zσ. Simple as that :)
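The lookup from probability to Z to X can be sketched with Python's standard library, where `NormalDist().inv_cdf` plays the role of the table. The values of `mu`, `sigma` and `p` are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical example values (not from the text).
mu, sigma = 100.0, 15.0
p = 0.95  # target cumulative probability P(X <= x)

# Step 1: the Z value whose cumulative probability is p (the table lookup).
z = NormalDist().inv_cdf(p)

# Step 2: convert Z back to the original scale.
x = mu + z * sigma

# Cross-check: asking the N(mu, sigma) distribution directly gives the same X.
x_direct = NormalDist(mu, sigma).inv_cdf(p)

print(f"z = {z:.3f}, x = {x:.2f}, direct = {x_direct:.2f}")
```

Going through Z first mirrors the table-based workflow; in practice the direct call on the scaled distribution is equivalent.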

The normal distribution is an “ideal” model for continuous data

The normal distribution is called normal because it’s seen as an ideal. It’s what you’d “normally” expect to see in real life for a lot of continuous data such as measurements.

The normal distribution is in the shape of a bell curve.

The curve is symmetrical, with the highest probability density in the center of the curve. The probability density decreases the further away you get from the mean. Both the mean and median are at the center and have the highest probability density.

The normal distribution is defined by two parameters, μ and σ². μ tells you where the center of the curve is, and σ gives you the spread. If a continuous random variable X follows a normal distribution with mean μ and standard deviation σ, this is generally written X ~ N(μ, σ²).
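The shape properties and the role of the two parameters can be checked with a short sketch using the standard library's `statistics.NormalDist`. The parameter values are hypothetical.

```python
from statistics import NormalDist

# Hypothetical parameters for illustration.
mu, sigma = 10.0, 2.0

# Note: NormalDist takes the standard deviation sigma,
# while the notation N(mu, sigma^2) lists the variance.
dist = NormalDist(mu, sigma)

peak = dist.pdf(mu)          # density at the center
left = dist.pdf(mu - 3.0)    # density 3 units below the mean
right = dist.pdf(mu + 3.0)   # density 3 units above the mean

print(f"density at mean: {peak:.4f}")
print(f"density at mean -/+ 3: {left:.4f}, {right:.4f}")
print(f"median = {dist.median}, variance = {dist.variance}")
```

The density is highest at μ, equal at equal distances on either side of it, and the median coincides with the mean. The mismatch between the N(μ, σ²) notation and the constructor argument is a common source of mistakes, so it is worth checking which one a given library expects.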

What are some real world examples of normally distributed quantities?

Although height and weight are often cited as examples, they are not exactly normally distributed. Weight, in particular, is somewhat right skewed. The average American man weighs about 190 pounds. There are some men who weigh well over 380 but none who weigh even close to 0.

IQ is sometimes cited as an example, but it has fatter tails than the normal.

No physical variable is exactly normally distributed.

In the normal density formula, the parameter μ is the mean or expectation of the distribution (and also its median and mode). The parameter σ is its standard deviation; its variance is therefore σ².

The relation between probability and Z is fixed and can be looked up in a standard normal probability table.
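As a sketch of that fixed relation, the standard library's `NormalDist().cdf` can stand in for the table. The helper `z_score` and the example numbers are assumptions introduced for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def z_score(x, mu, sigma):
    """Number of standard deviations x lies above the mean."""
    return (x - mu) / sigma

# Hypothetical example: a value of 130 from a distribution with mean 100, sd 15.
z = z_score(130.0, mu=100.0, sigma=15.0)

# Probability of observing a value at or below this Z.
p_below = standard_normal.cdf(z)

print(f"z = {z:.2f}, P(Z <= z) = {p_below:.4f}")
```

Because the mapping depends only on Z, the same lookup works for any normal distribution once the value has been standardized.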

https://www.quora.com/Why-is-the-normal-distribution-important