Machine Learning
  • Introduction
  • Self LEarning
  • Why Statstics in ML Or Data Science
  • How important is interpretability for a model in Machine Learning?
  • What are the most important machine learning techniques to master at this time?
  • Learning
    • Supervised Learning
      • Evaluating supervised learning
        • K-fold cross validation
        • Using train/test to prevent overfitting of a polynomial regression
      • Regression
        • Linear regression
          • The ordinary least squares technique
          • The gradient descent technique
          • The co-efficient of determination or r-squared
            • Computing r-squared
            • Interpreting r-squared
          • Assumptions of linear regression
          • Steps applied in linear regression modeling
          • Evaluation Metrics Linear Regression
          • p-value
        • Ridge regression
        • Least absolute shrinkage and selection operator (lasso) Regression
        • Polynomial regression
        • Performance Metrics
        • Regularization parameters in linear regression and ridge/lasso regression
        • Comments
      • Classification
        • test
        • Logistic Regression
        • naïve Bayes
        • support vector machines (SVM)
        • decision trees
          • Split Candidates
          • Stopping conditions
          • Parameters
            • Non Tunable Or Specificable
            • Tunable
            • Stopping Parameters
        • Evaluation Metrics
      • Random Forest
        • Logistic Regression Versus Random Forest
        • Paramters
          • Non Tunable Parameters
          • Tunable
          • Stopping Param
        • Parameter Comparison of Decision Trees and Random Forests
        • Classification and Regression Trees (CART)
        • How random forest works
        • Terminologies related to random forest algorithm
        • Out-of-Bag Error
      • Decision Trees
        • Gini Index
    • Unsupervised learning
      • Clustering
        • test
        • KMeans Clustering
          • Params
          • Functions
        • Gaussian Mixture
          • Parameters
          • functions
    • Semi-supervised learning
    • Reinforcement learning
    • Learning Means What
    • Goal
    • evaluation metrics
      • Regression
        • MSE And Root Mean Squared Error (RMSE)
        • Mean Absolute Error (MAE)
      • Model Validation
        • test
      • The bias, variance, and regularization properties
        • Regularization
          • Ridge regression
        • Bias And Variance
      • The key metrics to focus
    • hyperparameters
  • Steps in machine learning model development and deployment
  • Statistical fundamentals and terminology
  • Statistics
    • Measuring Central Tendency
    • Probability
    • Standard Deviation , Variance
    • root mean squared error (RMSE)
    • mean Absolute Error
    • explained Variance
    • Coefficient of determination R2
    • Standard Error
    • Random Variable
      • Discrete
      • Continuous
    • Sample vs Population
    • Normal Distribution
    • Z Score
    • Percentile
    • Skewness and Kurtosis
    • Co-variance vs Correlation
    • Confusion matrix
    • References
    • Types of data
      • Numerical data
        • Discrete data
        • Continuous data
      • Categorical data
      • Ordinal data
    • Bias versus variance trade-off
  • Spark MLib
    • Data Types
      • Vector
      • LabeledPoint
      • Rating
      • Matrices
        • Local Matrix
        • Distributed matrix
          • RowMatrix
          • IndexedRowMatrix
          • CoordinateMatrix
          • BlockMatrix
    • Comparing algorithms supported by MLlib
      • Classification
    • When and why should you use MLlib (versus scikit-learn versus TensorFlow versus foo package)
    • Pipeline
    • References
    • Linear algebra in Spark
  • Terminology
  • Machine Learning Steps
    • test
  • Preprocessing and Feature selection techniues
  • The importance of variables feature selection/attribute selection
    • Feature Selection
      • forward selection
      • mixed selection or bidirectional elimination
      • backward selection or backward elimination
      • The key metrics to focus on
  • Feature engineering
  • Hyperplanes
  • cross-validation
  • Machine learning losses
  • When to stop tuning machine learning models
  • Train, validation, and test data
  • input data structure
  • Why are matrices/vectors used in machine learning/data analysis?
    • Linear Algebra
  • OverView
  • Data scaling and normalization
  • Questions
  • Which machine learning algorithm should I use?
Powered by GitBook
On this page

Was this helpful?

Statistics

Statistics are numbers that summarize raw facts and figures in some meaningful way. They present key ideas that may not be immediately apparent by just looking at the raw data, and by data, we mean facts or figures from which we can draw conclusions.

As an example, you don’t have to wade through lots of football scores when all you want to know is the league position of your favorite team. You need a statistic to quickly give you the information you need.

But why learn statistics?

Understanding what’s really going on with statistics empowers you. If you really get statistics, you’ll be able to make objective decisions, make accurate predictions that seem inspired, and convey the message you want in the most effective way possible.

Stastics of Two Types.

Descriptive statistics - providing you with a description that illuminates some characteristic of sample data or subset

Inferential Staistics - inferential statistics provide information about a small data selection, so you can use this information to infer something about the larger dataset from which it was taken .

Explain with an example for age 18 to 34 that live in Philadelphia, Pennsylvania.

Descriptive statistics would allow you to understand the characteristics of the woman population that comprises this subset .

use inferential statistics with this dataset to make inferences about the larger population of women, who are 18 to 34 years old, but that are living in all cities in the state of Pennsylvania (and not just in Philadelphia).

PreviousStatistical fundamentals and terminologyNextMeasuring Central Tendency

Last updated 5 years ago

Was this helpful?