We are living in an age where machine learning takes part in almost every activity of daily life. From email filtering to process automation, machines are “trained” to become smarter and more personal.
First of all, machine learning algorithms can be grouped into three types: supervised, unsupervised, and reinforcement learning. Let’s break machine learning algorithms into these subparts and see what each of them is, how they work, and how each of them is used in real life.
If you are a data scientist or machine learning enthusiast, you have almost certainly used at least one of the algorithms below. Each one helps you build a model that makes predictions or decisions without being explicitly programmed to do so. Here are five of the most commonly used machine learning algorithms.
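1. Linear Regression
This is a supervised machine learning algorithm where the predicted output is continuous and has a constant slope. It’s used to predict values within a continuous range rather than to classify them into categories. The algorithm fits the best straight line through the data; this best-fit line is known as the regression line and is represented by the linear equation Y = aX + b, where Y is the dependent variable, X is the independent variable, a is the slope, and b is the intercept.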
```r
# Load train and test datasets
# Identify feature and response variable(s); values must be numeric
x_train <- input_variables_values_training_datasets
y_train <- target_variables_values_training_datasets
x_test <- input_variables_values_test_datasets
x <- cbind(x_train, y_train)
# Train the model using the training set and check the fit
linear <- lm(y_train ~ ., data = x)
summary(linear)
# Predict output
predicted <- predict(linear, newdata = x_test)
```
By using the above equation, you can, for example, estimate a person’s weight from their height without asking them.
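To make the equation concrete, here is a minimal sketch in base R that computes the slope a and intercept b of Y = aX + b with the closed-form least-squares formulas, a = cov(X, Y) / var(X) and b = mean(Y) - a * mean(X), and checks them against lm(). The height and weight values are made-up toy numbers, not real measurements.

```r
# Toy data (made up for illustration): heights in cm, weights in kg
height <- c(150, 160, 170, 180, 190)
weight <- c(50, 56, 65, 72, 80)

# Closed-form least-squares estimates for Y = aX + b
a <- cov(height, weight) / var(height)   # slope
b <- mean(weight) - a * mean(height)     # intercept

# lm() should recover the same intercept and slope
fit <- lm(weight ~ height)
print(coef(fit))
print(c(intercept = b, slope = a))

# Use the fitted line to estimate the weight of a 175 cm person
estimate_175 <- a * 175 + b
print(estimate_175)
```

With a single predictor, the closed form and lm() agree; lm() generalizes the same least-squares idea to many predictors.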
2. Logistic Regression
Despite its name, it is a classification algorithm, not a regression algorithm. It is used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variables. In simple words, it predicts the probability of occurrence of an event by fitting data to a logistic (sigmoid) function, which amounts to modeling the log-odds, or logit, as a linear function of the inputs. Hence, it is also known as logit regression. Since it predicts a probability, its output values lie between 0 and 1.
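```r
x <- cbind(x_train, y_train)
# Train the model using the training set and check the fit
logistic <- glm(y_train ~ ., data = x, family = "binomial")
summary(logistic)
# Predict output: type = "response" returns probabilities between 0 and 1
# (the default, type = "link", returns log-odds)
predicted <- predict(logistic, newdata = x_test, type = "response")
```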
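The link between the logit and the 0 to 1 output can be checked directly. The sketch below fits glm() on a small made-up pass/fail dataset (the hours and passed values are toy numbers, chosen so the classes are not perfectly separable) and confirms that the probabilities returned with type = "response" are just the sigmoid of the log-odds returned with type = "link". The sigmoid helper is written here for illustration.

```r
# Logistic (sigmoid) function: maps any real number into (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Toy data (made up for illustration): hours studied vs. pass (1) / fail (0)
hours  <- c(1, 2, 3, 4, 5, 6, 7, 8)
passed <- c(0, 0, 1, 0, 1, 0, 1, 1)

model <- glm(passed ~ hours, family = binomial)

# type = "link" gives the linear predictor, i.e. the log-odds
log_odds <- predict(model, type = "link")
# type = "response" gives probabilities
probs <- predict(model, type = "response")

# The probabilities are the sigmoid of the log-odds
print(all.equal(probs, sigmoid(log_odds)))
print(round(probs, 3))
```

To turn probabilities into 0/1 labels you would then apply a cutoff, 0.5 being a common default choice.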
3. Decision Tree
It is a type of supervised learning algorithm that is mostly used for classification problems. Surprisingly, it works for both categorical and continuous dependent variables. In this algorithm, we split the population into two or more homogeneous sets. Each split is made on the most significant attribute (independent variable), chosen so that the resulting groups are as distinct as possible.
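```r
library(rpart)
x <- cbind(x_train, y_train)
# Grow the tree
fit <- rpart(y_train ~ ., data = x, method = "class")
summary(fit)
# Predict output: type = "class" returns labels instead of class probabilities
predicted <- predict(fit, newdata = x_test, type = "class")
```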
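"Most significant attribute" needs a way to score how homogeneous a split is. One common measure is Gini impurity, which is, to my understanding, rpart's default criterion for classification trees. The base-R sketch below, with hypothetical helpers gini() and best_split() written for illustration, tries every midpoint between distinct values of one numeric feature and keeps the threshold whose two child groups have the lowest weighted Gini impurity.

```r
# Gini impurity of a set of labels: 0 means the set is perfectly homogeneous
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Find the best binary split "x <= t" on one numeric feature
best_split <- function(x, y) {
  values <- sort(unique(x))
  # Candidate thresholds: midpoints between consecutive distinct values
  candidates <- (head(values, -1) + tail(values, -1)) / 2
  # Weighted impurity of the two child groups for each candidate
  scores <- sapply(candidates, function(t) {
    left  <- y[x <= t]
    right <- y[x > t]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  list(threshold = candidates[which.min(scores)], impurity = min(scores))
}

# Toy data (made up for illustration)
feature <- c(1, 2, 3, 10, 11, 12)
label   <- c("a", "a", "a", "b", "b", "b")
split <- best_split(feature, label)
print(split)
```

A full tree repeats this search over every feature, picks the best one, and recurses into each child group until a stopping rule is met.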
4. Naive Bayes
This is a classification technique based on Bayes’ theorem with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that, within a given class, the presence of a particular feature is unrelated to the presence of any other feature.
```r
library(e1071)
x <- cbind(x_train, y_train)
# Fit the model (y_train should be a factor for classification)
fit <- naiveBayes(y_train ~ ., data = x)
print(fit)
# Predict output
predicted <- predict(fit, newdata = x_test)
```
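Under the independence assumption, the posterior for a class is proportional to the class prior times the product of the per-feature likelihoods, P(class) × P(feature1 | class) × P(feature2 | class) × …, normalized over all classes. The base-R sketch below computes exactly that from raw frequency counts on a tiny made-up weather dataset; nb_posterior() is a hypothetical helper written for illustration, not part of e1071.

```r
# Toy training data (made up for illustration)
train <- data.frame(
  outlook = c("sunny", "sunny", "rainy", "rainy", "sunny", "rainy", "sunny"),
  windy   = c("no", "yes", "yes", "no", "no", "yes", "no"),
  play    = c("yes", "no", "no", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Posterior P(class | features) from frequency counts, assuming the
# features are independent within each class
nb_posterior <- function(train, target, new_row) {
  classes  <- unique(as.character(train[[target]]))
  features <- setdiff(names(train), target)
  scores <- sapply(classes, function(cl) {
    rows  <- train[train[[target]] == cl, ]
    prior <- nrow(rows) / nrow(train)
    # Product of per-feature likelihoods P(feature value | class)
    lik <- prod(sapply(features, function(f) mean(rows[[f]] == new_row[[f]])))
    prior * lik
  })
  scores / sum(scores)  # normalize so the posteriors sum to 1
}

post <- nb_posterior(train, "play", list(outlook = "sunny", windy = "no"))
print(post)
```

Here the prediction would be "play = yes". With raw counts, a feature value never seen in a class gives that class probability zero, which is why smoothing (for example Laplace smoothing) is often added in practice.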
5. kNN (k-Nearest Neighbors)
It can be used for both classification and regression problems, although in industry it is more widely used for classification. kNN stores all available cases and classifies a new case by a majority vote of its k nearest neighbors, measured by a distance function such as Euclidean distance. kNN maps easily onto real life: if you want to learn about a person you have no information about, you might find out about their close friends and the circles they move in, and learn about them that way!
```r
library(class)
# knn() has no separate fitting step: it classifies each row of x_test
# by a majority vote among its k = 5 nearest rows of x_train
predicted <- knn(train = x_train, test = x_test, cl = y_train, k = 5)
```
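Because kNN has no training step, the whole algorithm fits in a few lines. The base-R sketch below, using a hypothetical helper knn_predict() written for illustration rather than the class package, computes the Euclidean distance from a query point to every training row, takes the k closest, and returns the most common label. The two clusters of points are made-up toy data.

```r
# Classify one query point by majority vote among its k nearest neighbors
knn_predict <- function(train_x, train_y, query, k = 3) {
  # Euclidean distance from the query to every training row
  d <- sqrt(rowSums(sweep(train_x, 2, query)^2))
  nearest <- order(d)[seq_len(k)]
  votes <- table(train_y[nearest])
  # Ties go to the label that table() lists first (alphabetical order)
  names(votes)[which.max(votes)]
}

# Toy data (made up for illustration): two well-separated clusters
train_x <- matrix(c(1, 1,
                    1, 2,
                    2, 1,
                    8, 8,
                    8, 9,
                    9, 8), ncol = 2, byrow = TRUE)
train_y <- c("red", "red", "red", "blue", "blue", "blue")

print(knn_predict(train_x, train_y, c(1.5, 1.5), k = 3))
print(knn_predict(train_x, train_y, c(8.5, 8.5), k = 3))
```

Since distances drive everything, features on very different scales are usually standardized before running kNN.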