Introduction to Logistic Regression
Logistic regression is a statistical method in machine learning for predicting discrete outcomes, and it is mainly used for classification problems. The output of a logistic regression model is a probability. The target variable is categorical rather than continuous, e.g., high or low, 0 or 1, true or false, yes or no.
It is known as logistic regression because it uses the logistic function, also called the sigmoid function.
Example of Logistic Regression: Predicting gender based on some characteristics.
Sigmoid Function:
In logistic regression, the sigmoid (logistic) function squeezes the output of a linear equation into the range 0 to 1. Large positive inputs produce outputs close to 1, and large negative inputs produce outputs close to 0.
The equation of logistic regression, obtained by passing the linear equation through the sigmoid, is:
p = 1 / (1 + e^-(b0 + b1*x))
Threshold value:
The threshold value determines which class is predicted from the model's output probability. With two classes, the default threshold is 0.5: a probability above the threshold is predicted as 1, and a probability below it as 0. The best threshold is problem-dependent, so it is a value you may need to tune.
If the probability is greater than the threshold value then the event is predicted to happen.
If the probability is less than the threshold value then the event is predicted not to happen.
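The threshold rule above can be sketched in a few lines of Python (function and argument names are illustrative):

```python
def predict(probability, threshold=0.5):
    """Map a predicted probability to a class label using a threshold."""
    return 1 if probability > threshold else 0

print(predict(0.82))        # 1: above the default 0.5 threshold
print(predict(0.31))        # 0: below the default threshold
print(predict(0.31, 0.25))  # 1: a lower, tuned threshold changes the prediction
```

The last call shows why the threshold is worth tuning: the same probability can land on either side of the decision depending on the chosen cutoff.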
Difference between Linear regression and Logistic regression
| LINEAR REGRESSION | LOGISTIC REGRESSION |
| --- | --- |
| The predicted variable is continuous. | The predicted variable is categorical. |
| It is used for regression problems. | It is used for classification problems. |
| The graph is a straight line. | The graph is an S-shaped (sigmoid) curve. |
| Example: in weather forecasting, linear regression predicts the temperature. | Example: in weather forecasting, logistic regression predicts whether it will rain or not. |
Types of Logistic Regression:
- Binary Logistic Regression – The target has only two possible values. Example: 0 or 1.
- Ordinal Logistic Regression – The target has more than two categories with a natural order. Example: a movie rating from 0 to 10.
- Multinomial Logistic Regression – The target has more than two categories with no natural order. Example: eye color such as blue, brown, or black.
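To illustrate how multinomial logistic regression generalizes the sigmoid, here is a hedged sketch of the softmax function, which turns raw scores for several unordered classes into probabilities (the scores below are made-up values for illustration):

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the max score before exponentiating, for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three unordered classes: blue, brown, black.
probs = softmax([2.0, 1.0, 0.1])
print(probs)  # three probabilities that sum to 1; the highest score wins
```

With exactly two classes, softmax reduces to the sigmoid, which is why binary logistic regression is the special case.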
Advantages of Logistic Regression:
- Logistic Regression is easy to implement and interpret.
- It gives good accuracy and performs well in linearly separable data.
- Logistic Regression is less prone to overfitting, and regularization can be used to reduce overfitting further.
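As a sketch of how regularization enters training, here is a minimal binary logistic regression fit by gradient descent with an L2 penalty on the weight. The toy data, learning rate, and penalty strength are all assumptions chosen for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000, l2=0.01):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        dw = db = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            dw += (p - y) * x
            db += (p - y)
        # The L2 term shrinks w toward 0, which discourages overfitting.
        dw = dw / n + l2 * w
        db = db / n
        w -= lr * dw
        b -= lr * db
    return w, b

# Toy linearly separable data: small x -> class 0, large x -> class 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(sigmoid(w * 0.0 + b) < 0.5)  # True: low x predicted as class 0
print(sigmoid(w * 5.0 + b) > 0.5)  # True: high x predicted as class 1
```

Without the penalty, the weight on perfectly separable data keeps growing toward infinity; the L2 term keeps it finite, which is the overfitting-control effect mentioned above.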
Disadvantages of Logistic Regression:
- Non-linear problems can’t be solved with logistic regression because its decision boundary is linear.
- Since its outcome is discrete, Logistic Regression can only predict a categorical outcome.
Examples of Logistic Regression in Machine Learning:
- Credit Card Fraud Detection (Predicting whether the credit card transaction is fraud or not)
- Email Spam detection (Predicting whether the email is spam or not)
- Customer Loan Prediction (Predicting whether the customer will take the loan or not)
- Tumor detection (Predicting whether a tumor is malignant or benign)