Machine Learning

Supervised Learning

Last updated 5 years ago

Supervised learning entails learning a mapping between a set of input variables (typically a vector) and an output variable (also called the supervisory signal), and then applying this mapping to predict the outputs for unseen data. Supervised methods attempt to discover the relationship between the input variables and the target variable; the discovered relationship is represented in a structure called a model. Models usually describe and explain phenomena hidden in the dataset, and they can be used to predict the value of the target attribute from the values of the input attributes.

Supervised learning is the machine learning task of inferring a function from supervised training data (a set of training examples). In supervised learning, each training example is a pair consisting of an input object and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function.
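To make "inferring a function from training examples" concrete, here is a minimal Python sketch under deliberately simple assumptions: one numeric input, one numeric output, and a straight-line model fitted by ordinary least squares. The names `fit_line` and `predict` are hypothetical, and the four training pairs are toy values generated from y = 1 + 2x.

```python
from typing import Callable, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Infer f(x) = intercept + slope * x from (input, output) pairs by least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (input, output) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the shared 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("inputs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(x: float) -> float:
        """The inferred function: maps a new input to a predicted output."""
        return intercept + slope * x

    return predict


# Toy training examples (hypothetical): each is an (input, desired output) pair.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]  # generated by y = 1 + 2x

f = fit_line(train_x, train_y)
print(f(10.0))  # apply the inferred function to an unseen input → 21.0
```

The algorithm only ever sees the labeled pairs; what it hands back is a function that can be evaluated on inputs it was never trained on, which is the "inferred function" described above.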

From Statistics for Machine Learning:

Many supervised machine learning methods fall into one of these two categories:

  • Classification problems
    • Logistic regression
    • Lasso and ridge regression
    • Decision trees (classification trees)
    • Bagging classifier
    • Random forest classifier
    • Boosting classifier (AdaBoost, gradient boosting, and XGBoost)
    • SVM classifier
    • Recommendation engine
  • Regression problems
    • Linear regression (lasso and ridge regression)
    • Decision trees (regression trees)
    • Bagging regressor
    • Random forest regressor
    • Boosting regressor (AdaBoost, gradient boosting, and XGBoost)
    • SVM regressor
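As a concrete instance of the first classification method listed, the sketch below fits a one-feature logistic regression with batch gradient descent on the mean log-loss. It is an illustration, not a library implementation: the learning rate, epoch count, the function names (`sigmoid`, `train_logistic`, `predict_class`) and the six toy points are all assumptions chosen for readability.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs: List[float], ys: List[int],
                   lr: float = 0.1, epochs: int = 1000) -> Tuple[float, float]:
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # For log-loss, the gradient w.r.t. the score is (predicted probability - label).
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def predict_class(w: float, b: float, x: float) -> int:
    """Assign class 1 when the predicted probability is at least 0.5, else class 0."""
    return int(sigmoid(w * x + b) >= 0.5)


# Hypothetical toy data: one feature, binary labels.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(xs, ys)
print([predict_class(w, b, x) for x in xs])  # → [0, 0, 0, 1, 1, 1]
```

The model outputs a probability and the 0.5 threshold turns it into a class label, which is what places logistic regression under classification despite its name.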

Some of the issues to consider in supervised learning are as follows:

  • Bias-variance trade-off

  • Function complexity and amount of training data

  • Dimensionality of the input space

  • Noise in the output values

  • Heterogeneity of the data

  • Redundancy in the data

  • Presence of interactions and non-linearity
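Several of these issues (the bias-variance trade-off, function complexity, and noise in the output values) show up in a small experiment. The sketch below assumes hypothetical toy data drawn from y = 2x plus Gaussian noise and compares a very flexible model (1-nearest-neighbour, which memorizes every training point) with a rigid one-parameter line. The helper names (`make_data`, `one_nn`, `line_through_origin`, `mse`) are illustrative, not from any library.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[float, float]
Model = Callable[[float], float]


def make_data(n: int, noise: float, rng: random.Random) -> List[Pair]:
    """Hypothetical toy data: y = 2x plus Gaussian noise in the output values."""
    pairs = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        pairs.append((x, 2.0 * x + rng.gauss(0.0, noise)))
    return pairs


def one_nn(train: List[Pair]) -> Model:
    """Very flexible model: return the output of the closest training input."""
    def predict(x: float) -> float:
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict


def line_through_origin(train: List[Pair]) -> Model:
    """Rigid model with one parameter: least-squares slope b for y = b*x."""
    b = sum(x * y for x, y in train) / sum(x * x for x, _ in train)
    return lambda x: b * x


def mse(model: Model, data: List[Pair]) -> float:
    """Mean squared error of a model on (input, output) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)                      # fixed seed so the run is repeatable
train = make_data(30, noise=3.0, rng=rng)
test = make_data(200, noise=3.0, rng=rng)   # unseen data from the same process

for name, model in [("1-NN", one_nn(train)), ("line", line_through_origin(train))]:
    print(f"{name:>4}: train MSE = {mse(model, train):6.2f}   test MSE = {mse(model, test):6.2f}")
```

The 1-NN model reaches zero training error because it reproduces the noise it was trained on; on the unseen test data its error is typically higher than the line's, which is the variance side of the trade-off.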

Typical supervised problems ask questions such as: Did the Titanic passenger survive? Did the customer churn? What is the car's fuel economy (MPG)? Each problem contains a target of interest and a set of training data with known values of that target. Most machine learning problems encountered in practice are supervised, and most ML techniques are designed to solve supervised problems.

Supervised learning:

  • is like learning with a teacher

  • treats the training dataset as the teacher

  • uses the training dataset to train the machine

Examples:

Classification: the machine is trained to assign each input to one of a set of classes.

  • classifying whether or not a patient has a disease

  • classifying whether or not an email is spam

Regression: the machine is trained to predict a continuous value such as price, weight, or height.

  • predicting house/property prices

  • predicting stock market prices
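The practical difference between the two tasks lies in the type of output. As a sketch, k-nearest neighbours (a technique swapped in here only because the same code handles both tasks) predicts a class by majority vote and a value by averaging. The temperature and house-price figures are hypothetical toy data, not real medical or market data, and the function names are illustrative.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_nearest(train: Sequence[Tuple[float, T]], x: float, k: int) -> List[Tuple[float, T]]:
    """Return the k training pairs whose input is closest to x."""
    return sorted(train, key=lambda p: abs(p[0] - x))[:k]


def knn_classify(train: Sequence[Tuple[float, str]], x: float, k: int = 3) -> str:
    """Classification: predict a discrete class by majority vote among the neighbours."""
    labels = [label for _, label in k_nearest(train, x, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train: Sequence[Tuple[float, float]], x: float, k: int = 3) -> float:
    """Regression: predict a continuous value as the mean of the neighbours' targets."""
    return mean(value for _, value in k_nearest(train, x, k))


# Hypothetical classification data: body temperature (deg C) -> "disease" / "healthy".
temps = [(36.5, "healthy"), (36.8, "healthy"), (37.0, "healthy"),
         (38.6, "disease"), (39.0, "disease"), (39.4, "disease")]
# Hypothetical regression data: floor area (m^2) -> house price (thousands).
houses = [(50.0, 150.0), (60.0, 180.0), (70.0, 210.0), (80.0, 240.0), (90.0, 270.0)]

print(knn_classify(temps, 38.9))   # → disease
print(knn_regress(houses, 72.0))   # mean of 210, 240, 180 → 210.0
```

Only the final aggregation step changes: a vote over labels yields a class, while an average over numbers yields a value on a continuous scale.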

Source: Statistics for Machine Learning (https://www.safaribooksonline.com/library/view/statistics-for-machine/9781788295758/94fde0ee-fc9e-4cbc-aac4-d8dc1d9d94f4.xhtml)