forward selection


Last updated 5 years ago


One such method is forward selection, an example of stepwise regression that performs feature selection in a series of steps. We start with an empty model that has no features selected. We then fit k simple linear regressions (one for each available feature) and pick the best one. Because these models all have the same number of features, we can use the R² statistic to guide our choice, although metrics such as AIC work as well. Once we have chosen the first feature, we pick a second one from the remaining k-1 features. To do so, we fit k-1 multiple regressions, one for each pair of features that includes the feature picked in the first step. We continue adding features in this way until we have evaluated the model that includes all the features, and then stop. Note that every step makes a hard choice: a feature, once included, stays in the model for all future steps.

For example, models that have more than one feature and do not include the feature chosen in the first step are never considered. Therefore, we do not search the space of models exhaustively. In fact, if we also count the null model, the total number of models on which we perform a linear regression is:

1 + k + (k-1) + ... + 1 = 1 + k(k+1)/2

This count grows on the order of k², which even for small values of k is already considerably less than the 2^k models of an exhaustive search. At the end of the forward selection process, we have to choose between k+1 models, corresponding to the subsets obtained at the end of each step (including the null model). Because this final choice compares models with different numbers of features, we usually use a criterion such as the AIC or the adjusted R² to make it. We can demonstrate this process for our CPU dataset by running the following commands:
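The commands for the CPU dataset are not reproduced on this page. As a stand-in, here is a minimal Python sketch of forward selection on synthetic data, using only NumPy. The helper names (`fit_rss`, `aic`, `forward_selection`), the synthetic data, and the use of a Gaussian AIC (up to an additive constant) for the final choice are assumptions made for illustration, not the original commands.

```python
import numpy as np

def fit_rss(X, y, features):
    """Fit least squares with an intercept on the given feature columns
    and return the residual sum of squares (RSS)."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in features])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def aic(rss, n, n_params):
    # Gaussian AIC up to an additive constant (assumed form for this sketch).
    return n * np.log(rss / n) + 2 * n_params

def forward_selection(X, y):
    n, k = X.shape
    selected, remaining = [], list(range(k))
    path = [([], fit_rss(X, y, []))]  # start from the null model
    n_fits = 1
    while remaining:
        # Candidates at this step all have the same size, so the lowest
        # RSS is also the highest R-squared.
        scores = [(fit_rss(X, y, selected + [j]), j) for j in remaining]
        n_fits += len(scores)
        best_rss, best_j = min(scores)
        selected = selected + [best_j]  # hard choice: never revisited
        remaining.remove(best_j)
        path.append((list(selected), best_rss))
    # The k+1 models on the path differ in size, so compare them by AIC.
    best, _ = min(path, key=lambda m: aic(m[1], n, len(m[0]) + 1))
    return best, path, n_fits

# Synthetic data: only features 0 and 3 influence the response.
rng = np.random.default_rng(0)
n, k = 200, 6
X = rng.normal(size=(n, k))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=n)

best, path, n_fits = forward_selection(X, y)
print("order in which features were added:", path[-1][0])
print("subset chosen by AIC:", sorted(best))
print("models fitted:", n_fits, "versus exhaustive search:", 2 ** k)
```

The counter `n_fits` tallies the null model plus every candidate fitted along the way, so it can be compared directly with 1 + k(k+1)/2 and with the 2^k fits that an exhaustive search would need.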

There are various methods to add or remove variables to determine the best possible model.

In the forward method, we start with no variables and keep adding significant variables for as long as the overall model's fit improves.

In the backward method, we start with all the variables and remove them one at a time until all the prescribed criteria are met (for example, no insignificant variables and no multicollinearity). Finally, we check an overall statistic: as a rule of thumb, an R-squared value greater than 0.7 indicates a good model; otherwise the model is rejected. In industry, practitioners mostly prefer backward methods.
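The backward method can be sketched in Python along the same lines. This is a rough illustration under stated assumptions, not a prescribed procedure: the thresholds (`t_min=2.0`, used as a large-sample stand-in for a 5% significance test, and `vif_max=5.0` for the variance inflation factor), the helper names, and the synthetic data are all chosen here for the example.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of a least-squares fit of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def ols_t_stats(X, y):
    """t-statistics of the non-intercept coefficients of an OLS fit."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = (resid @ resid) / (n - p - 1)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return (coef / se)[1:]

def vif(X, j):
    """Variance inflation factor: regress column j on the other columns."""
    others = np.delete(X, j, axis=1)
    return 1.0 / (1.0 - r_squared(others, X[:, j]))

def backward_elimination(X, y, t_min=2.0, vif_max=5.0):
    kept = list(range(X.shape[1]))  # start with all the variables
    while kept:
        Xk = X[:, kept]
        t = np.abs(ols_t_stats(Xk, y))
        vifs = (np.array([vif(Xk, i) for i in range(len(kept))])
                if len(kept) > 1 else np.zeros(1))
        if vifs.max() > vif_max:      # multicollinearity: drop worst VIF
            kept.pop(int(vifs.argmax()))
        elif t.min() < t_min:         # insignificance: drop weakest variable
            kept.pop(int(t.argmin()))
        else:
            break                     # all criteria are met
    return kept

# Synthetic data: feature 4 nearly duplicates feature 0; 1 and 3 are noise.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 5))
X[:, 4] = X[:, 0] + 0.05 * rng.normal(size=n)
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

kept = backward_elimination(X, y)
r2 = r_squared(X[:, kept], y)
print("variables kept:", kept)
print("R-squared of the final model:", round(r2, 3))
```

Only one variable is removed per iteration, and the model is refitted before the next check, because dropping a collinear variable can change the significance of the ones that remain.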

https://www.safaribooksonline.com/library/view/statistics-for-machine/9781788295758/bdd4260c-5a10-4db7-b195-ca13d97b9d0a.xhtml
Assumptions of linear regression
https://www.safaribooksonline.com/library/view/mastering-predictive-analytics/9781787121393/ch03s06.html